Documenting Academic Language as Used in the Theses Submitted to the University of Modena and Reggio Emilia


  • Marina Bondi
  • Matteo Di Cristofaro



corpus linguistics, EAP, academic discourse, academic writing


The article discusses the ongoing creation of the MoReThesisCorpus, outlining its major characteristics and offering an account of the considerations and issues encountered so far. The corpus, composed of the theses submitted to the University of Modena and Reggio Emilia between 2011 and 2020, is being developed as part of the project CAP (‘Comunicazione Accademica e Professionale’; Academic and Professional Communication). It is meant to foster research into academic language from a cross-disciplinary discourse perspective and to facilitate the production of educational materials aimed at university students. Following a data-driven learning approach, it aims to support the acquisition of discipline-related vocabularies and styles, and thereby to improve the learning of academic writing through corpus tools and resources. The article discusses the technical details of acquiring and processing the data, along with a number of issues pertaining to both computer science and linguistics that directly affect the capacity of the corpus to support the investigation of academic discourse across different languages and disciplines.

