My degree thesis was about developing a mathematical and statistical algorithm that can perform, or rather help perform, linguistic analysis on a text in order to discover, identify, and highlight bias in speech.

The algorithm can be used either as a standalone tool or, as I use it daily, in conjunction with Artificial Intelligence tools to broaden its scope and, at the same time, deepen its discovery power.

The idea of using applied linguistics in real-world situations is a direct consequence of the language analysis itself.


Specifically, forensic linguistics, or legal linguistics, refers to the application of linguistic methods and tools to the forensic context of crime investigation, trials, and judicial procedures.

There are principally three areas of application for linguists working in forensic contexts, according to the Centre for Forensic Linguistics of Aston University (today called the Institute for Forensic Linguistics at Aston University):

  • understanding the language of the written law,
  • understanding language use in forensic and judicial processes,
  • providing linguistic evidence.

If we want to understand specifically what forensic linguistics is applied to, we must also consider what linguistic analysis helps investigators and lawyers with:

  • forensic discourse analysis (analysis of meaning of words in contracts, statutes, and laws, the search for “ordinary meaning”)
  • sociolinguistic profiling (secret language and code analysis; slang and jargon, sociolinguistic analysis of code-switching, forms of address, convergence and divergence)
  • authorship analysis (establishment of possible authors of documents through grammatical and stylistic analysis)
  • forensic phonetics (establishment of the possible authors through specific pronunciation of words or the way these words are written)

What my algorithm excels at is identifying specific linguistic features that point to a specific origin or cause (conscious or unconscious, it does not matter) for that use.

To be clearer: every time a person speaks or writes, they use words that are specific and peculiar to that person’s background, socio-cultural environment, cultural and/or political and/or religious affinity, and psychological state at the moment the discourse is created.

The algorithm identifies these peculiarities by analyzing the text from a mathematical and statistical point of view, pointing out when the use of a specific word, verb, rhetorical image, metaphor, or collocation (the specific use of a word, in a discourse, close to another one, as we will see later on) differs from the standard use in that specific language, dialect, or sociolect.
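To make the idea of "differing from the standard use" concrete, here is a minimal sketch in Python. It is not the algorithm from my thesis, only an illustration under simple assumptions: the "standard use" is represented by a reference text, words are compared by smoothed relative frequency, and the function names (`tokenize`, `log_ratios`, `most_distinctive`) and the smoothing value are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_ratios(sample_text, reference_text, smoothing=0.5):
    """Score each word of the sample by how over- or under-used it is.

    The score is log2(p_sample / p_reference), where both probabilities
    are additively smoothed relative frequencies. The smoothing value is
    an assumption of this sketch, not a recommended setting.
    """
    sample = Counter(tokenize(sample_text))
    reference = Counter(tokenize(reference_text))
    vocab = set(sample) | set(reference)

    sample_total = sum(sample.values()) + smoothing * len(vocab)
    reference_total = sum(reference.values()) + smoothing * len(vocab)

    scores = {}
    for word in sample:
        p_sample = (sample[word] + smoothing) / sample_total
        p_reference = (reference[word] + smoothing) / reference_total
        scores[word] = math.log2(p_sample / p_reference)
    return scores


def most_distinctive(sample_text, reference_text, top_n=5):
    """Return the sample words most over-used relative to the reference."""
    scores = log_ratios(sample_text, reference_text)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A positive score means the word is more frequent in the analyzed text than in the reference, a score near zero means the usage is roughly "standard", and a negative score means the word is under-used. A realistic analysis would need a large reference corpus and a significance test rather than a raw ratio.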

The idea is that it’s possible to create a precise association between a speaker and the way s/he speaks or writes (her/his idiolect). Furthermore, if one collects, or has collected, many different samples of both generic and topic-related discourse, it is possible to perform an analysis of the idiolect, today enhanced by the power of Artificial Intelligence, so as to determine how closely it adheres to or differs from the sample used as reference.
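As an illustration of comparing an idiolect with reference samples, the sketch below builds a small frequency profile for each text and compares profiles with cosine similarity. This is a common stylometric technique that I am swapping in for the sake of example, not the method of my thesis. The ten-word `FUNCTION_WORDS` list and the helper names (`profile`, `cosine_similarity`, `closest_author`) are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words.
# A real study would use a much larger, language-specific list.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "that", "but", "i", "it")


def profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_author(questioned_text, reference_samples):
    """Return the name whose reference sample is most similar in profile.

    reference_samples maps a speaker/writer name to a sample of their text.
    """
    questioned = profile(questioned_text)
    return max(
        reference_samples,
        key=lambda name: cosine_similarity(
            questioned, profile(reference_samples[name])
        ),
    )
```

A similarity close to 1 suggests the questioned text uses these words in proportions similar to the reference sample, while a value close to 0 suggests a very different pattern. With short texts the result is unreliable, which is why many samples per speaker are needed.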

Forensic linguistic profiling, from this point of view, may be used to detect or infer specific attributes of the speaker from his or her linguistic characteristics.

Experts using linguistics, today in association with Artificial Intelligence, are able to discover several types of speaker characteristics such as gender, age, education level, ideological bias, cultural adherence and/or pertinence, and region of socialization.

The research fields most commonly used as a source for tools and methods in linguistic criminal profiling are dialect geography, lexicography, sociolinguistics, historical linguistics, psycholinguistics, computational linguistics, and applied linguistics.

The power of linguistic analysis is that texts, whether written or oral, are studied from several points of view at the same time. Usually, a text is analyzed to point out elements that may determine a positive (or negative) identification of the speaker/writer using phonetics, phonology, morphology, syntax/grammar, semantics, and pragmatics, which represent the six language domains.

Of course, no linguist can be an expert in all these domains and tools; this is where a mathematical and statistical approach, carried out with the help of Artificial Intelligence, could easily fill any gap in sectorial knowledge.

Are you interested in learning more? Do you want to ask me how to write algorithms for your linguistic (and non-linguistic) AI needs?

Feel free to ask!


