Analyzing Binary Relationships of Identity Labels Using Distributional Semantic Models

A Critical Queer Linguistic Analysis of an English Language Subreddit


  • Hunter Youngquist



Keywords: queer linguistics, critical discourse analysis, distributional semantic models, binaries, identity labels, gender and sexuality


Following the shift towards quantitative, corpus-based analysis in queer linguistics, I examine the use of identity labels in the online community r/lgbt, a subreddit dedicated to minority identity labels and discussion, to explore the binary relationships between these labels and their predicted normative effects.

I analyze the distribution of the subreddit's most frequent identity labels over a two-year period using distributional semantic models, which represent word distributions numerically as vectors in a matrix. The models show evidence of various binaries whose members co-construct each other within the corpus. I also use concordances and collocations to examine the discourses surrounding gender and sexuality in the comments and submissions subcorpora; the former shows a more queer-aligned perspective and the latter a label-searching perspective.

Finally, the results from these techniques demonstrate the complex relationships among the many types of labels currently in use, and between the subreddit's users and their feelings about adopting specific labels to describe their identities.
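As a rough illustration of the distributional semantic models mentioned in the abstract, the following Python sketch builds raw co-occurrence count vectors from a toy corpus and compares two labels by cosine similarity. The sentences, the window size, the function names, and the use of unweighted counts are all assumptions made for illustration; they are not the study's data, parameters, or implementation.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, invented for this sketch (not drawn from r/lgbt).
toy_corpus = [
    "i think the label bisexual fits me",
    "i think the label pansexual fits me",
    "my cat sleeps on the sofa",
]

vectors = cooccurrence_vectors(toy_corpus, window=2)

# Words used in similar contexts receive similar vectors.
sim_labels = cosine(vectors["bisexual"], vectors["pansexual"])
sim_other = cosine(vectors["bisexual"], vectors["sofa"])
print(sim_labels, sim_other)
```

In this toy setting the two labels occur in identical contexts, so they score higher than the unrelated pair. Published models of this kind typically also apply association weighting and dimensionality reduction, which the sketch omits.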

