A model to detect Emotionally Manipulative Language
Abstract
The widespread use of online communication has reshaped interpersonal interactions, introducing vast opportunities but also substantial risks. Scholars have extensively examined many threatening online behaviors, mostly focusing on language, producing work aimed at identifying dangerous individuals, violence and harassment, cyberbullying, hate and toxicity, child sexual exploitation, and criminal organizations. This extensive list primarily consists of direct threats, i.e., content including explicit evidence of danger (for example, incitement to violence).
By contrast, in the digital ecosystem, many risks are subtle and veiled behind indirect attempts to induce behaviors and shift opinions. Existing studies tend to underestimate this category of risks when proposing taxonomies of harmful behaviors, merging them into broad, oversimplified classifications of misconduct. Such implicit and ambiguous content is often cataloged under a generic category of potential emotional or physical risk, without being specifically addressed.
Consequently, the identification of indirect online threats has often been overlooked in research. This lack of attention can be attributed to several factors, including (i) the absence of robust computational models for their detection, (ii) the inherent ambiguity and subtlety of the linguistic cues involved, and (iii) the methodological challenges associated with operationalizing and measuring nuanced threatening behaviors from digital traces. Such oversight risks prioritizing the prevention of explicit and clearly identifiable forms of harm, potentially resulting in the underestimation of subtle online threats and making their detection and counteraction harder.
Among these deceptive and misleading online practices is the use of persuasive communication for malicious purposes. Persuasion in online environments manifests in diverse forms, ranging from propaganda to interpersonal exchanges. Persuasion is also employed for “emotional appeal”, i.e., to override people’s capacity for rational thought through argumentative fallacies that foster flawed reasoning. In this case, persuasive communication becomes problematic and worrisome, as it has the potential to manipulate users and generate harmful consequences. This is especially concerning for young people and older adults with low digital literacy, who often become victims of veiled yet dangerous behaviors, such as predatory grooming and fraud, suffering long-term emotional and financial harm.
In this context, the automated detection of Emotionally Manipulative Language (EML) in online conversations becomes essential for intervention, with the broader goal of effectively mitigating and preventing online threats and protecting vulnerable users. Existing work on ensuring safe online spaces provides a valuable starting point but does not address the problem comprehensively: at present, no specific model exists to identify manipulative language.
In this work we develop a language model to automatically detect EML in conversations, as well as the following linguistic strategies used in EML: emotional minimization, power appeals, guilt-tripping, and shame elicitation. We achieve this through the following contributions:
We first propose a formal, non-ambiguous definition of EML, grounded in foundational psycholinguistic theories. Building on this, and on previous research on persuasive communication, we develop a comprehensive tailored codebook that serves as a guideline for both manual annotation and the subsequent automatic detection of EML.
Based on state-of-the-art models, we adopt a human-centric framework for developing a hybrid AI–human annotation system.
We design and implement a pipeline to automatically collect, transcribe, and diarize a dataset of dialogues in audio format, extracted from TV show episodes openly available on YouTube (a sketch of such a pipeline appears after this list).
We human-annotate a random subset of dialogue turns, including both binary annotations (presence of EML) and multilabel annotations to capture the four distinct EML strategies.
Based on such annotations, we build and validate a model for automatic EML detection in conversations. Specifically, we employ a subset of human-annotated and expert-verified conversations to train a task-specific model for EML detection within a pseudolabeling setting (see the self-training sketch after this list).
Finally, we deploy a Llama-based language model classifier, fine-tuned using LoRA, capable of both identifying the presence of EML and distinguishing among the different manipulation strategies (see the fine-tuning sketch after this list).
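For the data pipeline above, the following is a minimal sketch of one possible implementation. The choice of yt-dlp for audio collection, openai-whisper for transcription, and pyannote.audio for diarization, as well as the URL, model names, and file names, are illustrative assumptions rather than the configuration actually used in this work.

```python
# Illustrative sketch of a collect -> transcribe -> diarize pipeline (assumed tools:
# yt-dlp, openai-whisper, pyannote.audio; URLs and model names are placeholders).
import subprocess
import whisper
from pyannote.audio import Pipeline

VIDEO_URL = "https://www.youtube.com/watch?v=<episode-id>"  # placeholder episode URL
AUDIO_FILE = "episode.wav"

# 1. Collect: download the audio track of a publicly available episode.
subprocess.run(
    ["yt-dlp", "-x", "--audio-format", "wav", "-o", "episode.%(ext)s", VIDEO_URL],
    check=True,
)

# 2. Transcribe: obtain time-stamped text segments.
asr_model = whisper.load_model("base")
transcription = asr_model.transcribe(AUDIO_FILE)

# 3. Diarize: assign speaker labels to time intervals (requires a Hugging Face token
#    with access to the pyannote model).
diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
diarization = diarizer(AUDIO_FILE)

# 4. Merge: attach a speaker label to each transcribed segment to form dialogue turns.
def speaker_at(time_s):
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        if turn.start <= time_s <= turn.end:
            return speaker
    return "UNKNOWN"

dialogue_turns = [
    {"speaker": speaker_at(seg["start"]), "text": seg["text"].strip()}
    for seg in transcription["segments"]
]
```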
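The pseudolabeling step can be illustrated with a generic self-training loop. The sketch below is a simplified stand-in rather than the actual implementation: it uses a TF-IDF plus logistic-regression classifier in place of the task-specific model, and the confidence threshold and number of rounds are arbitrary.

```python
# Illustrative self-training (pseudo-labeling) loop; the classifier, threshold, and
# round count are placeholders, not the task-specific model used in this work.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

CONFIDENCE = 0.90  # keep only pseudo-labels the model is very confident about

def pseudo_label_training(labeled_texts, labels, unlabeled_texts, rounds=3):
    """Iteratively grow the training set with confident predictions on unlabeled turns."""
    texts, y = list(labeled_texts), list(labels)
    for _ in range(rounds):
        clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        clf.fit(texts, y)
        if not unlabeled_texts:
            break
        probs = clf.predict_proba(unlabeled_texts)
        confident = np.max(probs, axis=1) >= CONFIDENCE
        # Move confidently classified turns into the training set with predicted labels.
        texts += [t for t, keep in zip(unlabeled_texts, confident) if keep]
        y += list(clf.classes_[np.argmax(probs[confident], axis=1)])
        unlabeled_texts = [t for t, keep in zip(unlabeled_texts, confident) if not keep]
    return clf
```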
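Finally, the LoRA fine-tuning of a Llama-based classifier could look roughly like the sketch below, using Hugging Face transformers and peft. The checkpoint name, label set, adapter rank, target modules, and the multi-label formulation are illustrative assumptions, not the exact configuration reported here.

```python
# Illustrative sketch of LoRA fine-tuning a Llama-based classifier for EML detection
# (checkpoint name, labels, and hyperparameters are placeholders, not the paper's setup).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

BASE_MODEL = "meta-llama/Llama-3.1-8B"  # placeholder checkpoint
LABELS = ["eml", "emotional_minimization", "power_appeal",
          "guilt_tripping", "shame_elicitation"]

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL,
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # one binary decision per label
)
model.config.pad_token_id = tokenizer.eos_token_id  # Llama has no pad token by default

# Attach low-rank adapters to the attention projections; only these (plus the
# classification head) are trained.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the base model's weights
# (Training would then proceed, e.g., with transformers.Trainer on the annotated turns.)

# Inference on a single dialogue turn: sigmoid over the logits gives per-label scores.
turn = "After everything I have done for you, this is how you repay me?"
inputs = tokenizer(turn, return_tensors="pt", truncation=True)
with torch.no_grad():
    scores = torch.sigmoid(model(**inputs).logits).squeeze()
predicted = {label: float(score) for label, score in zip(LABELS, scores)}
```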
Presentation in peer-reviewed conferences
This work was presented at IC2S2 (International Conference on Computational Social Science) in July 2025 in Norrköping, Sweden.