
In public life, words do more than describe reality. They shape what people notice, what they fear, what they hope for, and what they think is possible. That is why rhetoric audits matter. A speech, a press release, or a campaign ad may sound smooth and polished, yet still rely on framing tricks, emotional nudges, or selective omission. As the sociologist William A. Gamson once argued, “A problem is not a problem unless it is made one.” It follows that a message can also make a problem look smaller, cleaner, or more urgent than it is.

Today, artificial intelligence gives us a new set of tools for studying that process at scale. Natural language processing can scan millions of words in political speeches, earnings calls, activist statements, and corporate PR. Large language models can go further, helping us trace themes, identify persuasive patterns, and compare how different actors speak about the same event. The goal is not to police opinion. The goal is to make persuasion visible, measurable, and open to audit.

As a professor of statistics, I find this shift especially exciting. For years, rhetoric analysis was often slow, manual, and limited to small samples. That is no longer necessary. With the right data, a sound design, and careful interpretation, we can build a quantitative rhetoric audit that is both rigorous and useful. In what follows, I will outline a practical workflow for doing exactly that.

Why Rhetoric Audits Matter in Public Discourse

Rhetoric audits matter because language is not neutral. It can soften blame, sharpen fear, widen trust, or narrow debate. In politics, a phrase like “enhanced interrogation” can mask coercion. In business, “restructuring” may hide layoffs. In public health, “individual responsibility” can shift attention away from structural causes. These are not merely word choices. They are choices about framing, and framing shapes judgment.

One reason this matters now is scale. Public discourse is no longer limited to a few speeches or editorials. It lives in livestreams, newsletters, social media posts, press statements, podcasts, and comments that spread at high speed. A human reader can only review so much. But a machine can review thousands of texts and flag patterns that deserve attention. This is where rhetoric auditing becomes a statistical problem as much as a literary one.

There is also a democratic reason. Citizens often hear a message before they have time to test it. An automated audit can help ask basic but important questions: Who is speaking? What emotions are being activated? What terms recur when the speaker talks about allies versus enemies? Are facts presented plainly, or wrapped in abstractions? These questions do not replace judgment, but they support it. They give analysts a sharper lens.

From Speech to Data: Building the Audit Corpus

Every audit begins with a corpus. That is the dataset of texts you want to study. For a political rhetoric audit, the corpus might include presidential speeches, campaign rallies, party platform documents, and official statements. For corporate rhetoric, it might include annual reports, earnings-call transcripts, investor letters, and crisis communications. The key is to define the unit of analysis clearly. Are you studying whole documents, paragraphs, or sentences? The answer should match your question.
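To make the choice of unit concrete, here is a minimal sketch that splits one document into document-, paragraph-, or sentence-level units while carrying its metadata along. The `Unit` record and `split_units` helper are hypothetical names, and the sentence splitter is a naive regular expression, not a trained segmenter.

```python
import re
from dataclasses import dataclass


@dataclass
class Unit:
    doc_id: str
    speaker: str
    unit_type: str  # "document", "paragraph", or "sentence"
    index: int      # position of the unit within its document
    text: str


# Naive boundary: whitespace that follows ".", "!" or "?".
# Abbreviations such as "Mr." will be split wrongly; a trained
# segmenter (nltk's punkt or spaCy) is the safer choice in practice.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def split_units(doc_id, speaker, text, unit_type):
    """Split one document into units of the requested granularity."""
    if unit_type == "document":
        pieces = [text]
    elif unit_type == "paragraph":
        pieces = PARAGRAPH_BREAK.split(text)
    elif unit_type == "sentence":
        pieces = SENTENCE_BREAK.split(text)
    else:
        raise ValueError(f"unknown unit_type: {unit_type!r}")
    pieces = [p.strip() for p in pieces if p.strip()]
    return [Unit(doc_id, speaker, unit_type, i, p) for i, p in enumerate(pieces)]


speech = "We will act. We will act now!\n\nThe threat is real. Are we ready?"
for unit in split_units("d1", "Speaker A", speech, "sentence"):
    print(unit.index, unit.text)
```

Because every unit keeps its `doc_id` and `speaker`, sentence-level results can always be rolled back up to the document or speaker level.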

Data collection should be systematic. If you only sample texts that are easy to find, you may introduce bias before the analysis begins. A sound corpus should aim for coverage across time, speaker, and context. For instance, if you are studying climate rhetoric, you might compare statements from oil companies before and after major policy shifts. If you are studying election rhetoric, you might sample speeches from different candidates across the same campaign window. Good auditing starts with good sampling.
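One way to make sampling systematic is to stratify: group documents by the dimensions you care about (here speaker and period) and draw the same number from each cell. The sketch below assumes each record is a dict with `speaker` and `period` fields; those field names, the period labels, and the `stratified_sample` helper are illustrative, not a fixed schema.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, per_stratum, seed=0):
    """Draw up to `per_stratum` records from every combination of `strata_keys`.

    A fixed seed makes the draw reproducible, so the sample itself can be audited.
    Cells with fewer records than `per_stratum` are taken whole.
    """
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in strata_keys)].append(rec)
    rng = random.Random(seed)
    sample = []
    for key in sorted(strata):  # fixed cell order keeps the random draws reproducible
        cell = strata[key]
        sample.extend(rng.sample(cell, min(per_stratum, len(cell))))
    return sample


# Hypothetical corpus: speaker A has 5 statements per period, speaker B only 2.
records = [
    {"doc_id": f"{s}-{p}-{i}", "speaker": s, "period": p}
    for s in ("A", "B")
    for p in ("pre-policy", "post-policy")
    for i in range(5 if s == "A" else 2)
]
sample = stratified_sample(records, ["speaker", "period"], per_stratum=3)
print(len(sample))
```

Equal draws per cell over-represent small cells, so if results are later pooled across cells, weight each cell by its share of the full corpus.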

Cleaning the text matters too. Transcripts often contain timestamps, applause markers, speaker labels, and stage directions. These are useful metadata, but they can confuse later models if left untouched. A practical workflow usually strips away noise, standardizes punctuation, and stores metadata in a table. In statistical terms, this is not busywork. It is part of the measurement process. As the old rule goes: garbage in, garbage out.
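A minimal cleaning step might look like the sketch below. It assumes a hypothetical transcript format with bracketed timestamps such as `[00:01:15]`, all-caps speaker labels followed by a colon, and stage directions like `[APPLAUSE]`; real transcripts vary, so the patterns are starting points. Rather than discarding the noise, the function returns it as metadata that can become columns in the metadata table.

```python
import re
from collections import Counter

# Patterns for one hypothetical transcript format; adapt them to your sources.
# Note that TIMESTAMP will also remove clock times spoken in the text ("at 10:30").
TIMESTAMP = re.compile(r"\[?\b\d{1,2}:\d{2}(?::\d{2})?\]?")
STAGE = re.compile(r"[\[(](applause|laughter|cheers|crosstalk|inaudible)[\])]", re.IGNORECASE)
SPEAKER = re.compile(r"^\s*([A-Z][A-Z .'-]*):", re.MULTILINE)


def clean_transcript(raw):
    """Remove timestamps, stage directions and speaker labels; return (text, metadata)."""
    meta = {"timestamps": len(TIMESTAMP.findall(raw))}
    text = TIMESTAMP.sub(" ", raw)
    # Speaker labels are easier to find once leading timestamps are gone.
    meta["speakers"] = sorted({s.strip() for s in SPEAKER.findall(text)})
    meta["stage_directions"] = Counter(m.lower() for m in STAGE.findall(text))
    text = STAGE.sub(" ", text)
    text = SPEAKER.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text, meta


raw = ("[00:01:15] SENATOR SMITH: Thank you all. [APPLAUSE]\n"
       "[00:01:20] SENATOR SMITH: The threat is real. (Laughter)")
text, meta = clean_transcript(raw)
print(text)
print(meta)
```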

NLP Methods for Measuring Tone, Framing, and Bias

Once the corpus is ready, NLP methods can turn language into features. Sentiment analysis is the simplest starting point. It estimates whether text is positive, negative, or neutral. This can be useful in crisis communication, campaign messaging, or CEO letters. But sentiment alone is not enough. A speech can be positive in tone while still being manipulative, evasive, or aggressive in structure.

That is why framing analysis is often more revealing. Topic modeling can show which themes dominate a corpus, while keyword comparison can show how those themes are expressed. For example, one speaker may talk about “security” and “threat,” while another emphasizes “care” and “community.” Both may discuss the same policy, but the frame differs. In rhetoric, the frame is often the message. As George Lakoff has written, “frames are mental structures that shape the way we see the world.” That insight maps naturally onto NLP.
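Keyword comparison can be made more rigorous than raw frequency lists. The sketch below scores each word by a smoothed log-odds ratio between two speakers, divided by its approximate standard error. This is a simplified version, with a uniform prior, of the weighted log-odds approach described by Monroe, Colaresi and Quinn (2008) for political texts. The tokenizer, the pseudo-count `alpha=0.5`, and the toy sentences are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def log_odds_z(texts_a, texts_b, alpha=0.5):
    """z-scored log-odds ratio of each word, group A versus group B.

    Positive scores lean toward A, negative toward B. `alpha` is a uniform
    Dirichlet pseudo-count that keeps rare words from producing infinite
    or wildly unstable ratios.
    """
    ca = Counter(t for txt in texts_a for t in tokens(txt))
    cb = Counter(t for txt in texts_b for t in tokens(txt))
    vocab = set(ca) | set(cb)
    na, nb = sum(ca.values()), sum(cb.values())
    a0 = alpha * len(vocab)
    scores = {}
    for w in vocab:
        ya, yb = ca[w] + alpha, cb[w] + alpha
        delta = math.log(ya / (na + a0 - ya)) - math.log(yb / (nb + a0 - yb))
        scores[w] = delta / math.sqrt(1 / ya + 1 / yb)  # approximate standard error
    return scores


speaker_a = ["Security first. The threat is real, and the threat is growing."]
speaker_b = ["We care for our community. Care builds a stronger community."]
scores = log_odds_z(speaker_a, speaker_b)
ranked = sorted(scores, key=scores.get)
print("leans toward B:", ranked[:3])
print("leans toward A:", ranked[-3:])
```

Dividing by the standard error keeps a word used twice from outranking a word used fifty times merely because its raw ratio is noisier.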

Bias measurement can also be approached statistically. You can compare the frequency of loaded terms, the use of passive voice, the balance of agency words, or the sentiment around named groups. Suppose a corpus repeatedly uses active verbs for one group and passive constructions for another. That pattern may suggest asymmetric blame. Similarly, if a company uses abstract nouns like “realignment” and “optimization” when describing job cuts, the language may be designed to dull the emotional edge. These are testable patterns, not just hunches.
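The passive-voice asymmetry described above can be checked with a rough first pass. This sketch flags a sentence as passive when a form of "be" is followed by a past participle (optionally after an "-ly" adverb), then reports, for each group, the share of sentences mentioning that group that are passive. The group terms and the short list of irregular participles are hypothetical, and surface patterns are crude; a dependency parser such as spaCy, whose English models mark passive subjects with the `nsubjpass` label, would be more accurate.

```python
import re
from collections import defaultdict

BE = r"(?:am|is|are|was|were|be|been|being)"
# Regular "-ed" participles plus a few common irregular ones (an assumed, incomplete list).
PARTICIPLE = r"(?:\w+ed|taken|given|made|done|held|seen|told|found|left|sent|hit|shot|caught)"
# "be" + optional "-ly" adverb + participle, e.g. "were injured", "was quickly removed".
PASSIVE = re.compile(rf"\b{BE}\s+(?:\w+ly\s+)?{PARTICIPLE}\b", re.IGNORECASE)


def passive_share_by_group(sentences, groups):
    """For each group, the share of sentences mentioning it that contain a passive.

    Crude surface heuristic: it misses passives without "be" ("mistakes made"),
    flags some adjectives ("is tired"), and does not check that the group is
    the grammatical subject of the passive verb.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [passive sentences, all sentences]
    for sent in sentences:
        lower = sent.lower()
        is_passive = PASSIVE.search(sent) is not None
        for label, terms in groups.items():
            if any(re.search(rf"\b{re.escape(t)}\b", lower) for t in terms):
                counts[label][0] += int(is_passive)
                counts[label][1] += 1
    return {label: p / n for label, (p, n) in counts.items()}


# Hypothetical group terms and sentences.
groups = {"protesters": ["protesters"], "police": ["police", "officers"]}
sentences = [
    "Protesters threw rocks at the police line.",
    "Protesters blocked the road.",
    "Police were forced to respond.",
    "Officers were injured in the clash.",
]
shares = passive_share_by_group(sentences, groups)
print(shares)
```

In this toy example, two of the three sentences that mention the police are passive while none of the sentences about protesters are, which is the kind of asymmetry that deserves a closer read rather than an automatic verdict.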

Using LLMs to Trace Persuasive Patterns at Scale

Large language models add a more flexible layer to rhetoric audits. They can summarize long texts, classify rhetorical moves, and explain why a passage sounds persuasive. Used carefully, they can help analysts move beyond simple counts toward richer interpretation. For example, an LLM can tag sentences as appeal to authority, fear appeal, moral framing, victimization, or reassurance. It can then apply the same labeling scheme across thousands of documents, though that consistency should be checked against hand-coded samples rather than assumed.

A useful strategy is to pair LLMs with a rubric. Do not ask the model to “analyze rhetoric” in the abstract. Instead, give it a checklist. Ask it to identify claims, emotional language, hedging, absolutes, scapegoating, calls to action, and evidence markers. With a clear prompt, the model becomes a coding assistant rather than a black-box oracle. This makes the audit more transparent and more repeatable.
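Writing the rubric down as data keeps the prompt identical across runs and makes it easy to version alongside the results. The sketch below turns the checklist above into a prompt string and checks that a reply covers exactly the rubric's labels. The label names, definitions, and JSON shape are assumptions, and the call to an actual model is deliberately left out, since it depends on the provider you use.

```python
import json

# Hypothetical rubric mirroring the checklist in the text.
RUBRIC = {
    "claim": "An assertion presented as true.",
    "emotional_language": "Wording chosen mainly to evoke fear, pride, anger or hope.",
    "hedging": "Qualifiers that soften commitment (may, might, arguably).",
    "absolute": "Universal terms that rule out exceptions (always, never, everyone).",
    "scapegoating": "Blame for a broad problem placed on a specific group.",
    "call_to_action": "A direct request that the audience do something.",
    "evidence_marker": "A cited source, statistic, or reference to data.",
}


def build_prompt(passage, rubric=RUBRIC):
    """Render the rubric and one passage into a fixed prompt string."""
    lines = [
        "You are coding rhetoric using the rubric below.",
        "For every label, say whether it is present and quote the span that shows it.",
        "",
    ]
    lines += [f"- {label}: {definition}" for label, definition in rubric.items()]
    lines += [
        "",
        'Return only JSON mapping each label to {"present": true or false, "span": "..."}.',
        "",
        "Passage:",
        passage,
    ]
    return "\n".join(lines)


def check_reply(raw, rubric=RUBRIC):
    """Parse a model reply; reject it if any rubric label is missing or extra."""
    data = json.loads(raw)
    missing, extra = set(rubric) - set(data), set(data) - set(rubric)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    return {label: bool(data[label]["present"]) for label in rubric}


prompt = build_prompt("We must act now, before it is too late.")
# A stand-in reply; in practice this string would come from the model.
reply = json.dumps({label: {"present": label == "call_to_action", "span": ""} for label in RUBRIC})
print(check_reply(reply)["call_to_action"])
```

Rejecting malformed replies outright, instead of silently repairing them, keeps the failure rate visible as part of the audit record.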

LLMs also help with comparison at scale. Suppose you want to compare how a company speaks after a data breach versus after a product launch. You can prompt the model to extract recurring phrases, classify tone, and identify shifts in blame or responsibility. Because every document is coded with the same prompt and output format, the results line up for comparison and can surface patterns a human might miss. Still, the final interpretation should remain human-led. As the statistician George Box said, “All models are wrong, but some are useful.” That includes language models.
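A simple way to keep interpretation human-led is to code a random subset by hand and measure how often the model agrees beyond chance. The sketch below computes Cohen's kappa for two label sequences from scratch; the labels are hypothetical, and scikit-learn offers the same statistic as `sklearn.metrics.cohen_kappa_score`. What counts as acceptable agreement is a judgment call that should be reported, not assumed.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both coders labeled at random with their own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1:
        return 1.0  # both coders used one identical label for every item
    return (observed - expected) / (1 - expected)


human = ["fear", "fear", "reassurance", "authority", "fear", "reassurance"]
model = ["fear", "reassurance", "reassurance", "authority", "fear", "reassurance"]
print(round(cohens_kappa(human, model), 3))
```

Raw agreement here is 5 of 6, but kappa is lower because some of that agreement would occur by chance given how often each coder uses each label.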

A Simple Python Workflow for Rhetoric Analysis

A basic rhetoric audit can be built in Python with a fairly small stack. You might use pandas for data handling, nltk or spaCy for preprocessing, scikit-learn for topic modeling or vectorization, and an API-based LLM for structured coding. Start by loading your texts and metadata into a dataframe. Then clean the text, score sentiment, and extract keywords or themes.

Here is a simple example that scores sentiment with NLTK's VADER analyzer and counts a few rhetorical markers:

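```python
import re

import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # VADER's lexicon is downloaded separately

df = pd.read_csv("corpus.csv")  # columns: doc_id, speaker, date, text

sia = SentimentIntensityAnalyzer()

def clean_text(t):
    """Keep letters, digits and apostrophes, collapse whitespace, lowercase."""
    t = re.sub(r"[^A-Za-z0-9\s']", " ", t)
    t = re.sub(r"\s+", " ", t)
    return t.lower().strip()

df["clean_text"] = df["text"].apply(clean_text)
# VADER uses punctuation and capitalization as intensity cues,
# so sentiment is scored on the raw text rather than the cleaned text.
df["sentiment"] = df["text"].apply(lambda x: sia.polarity_scores(x)["compound"])

def count_markers(text, markers):
    """Count whole-word occurrences of each marker in text."""
    # \b marks a word boundary, so "threat" does not match inside "threaten".
    return sum(len(re.findall(rf"\b{re.escape(m)}\b", text)) for m in markers)

fear_words = ["danger", "threat", "crisis", "attack", "urgent"]
certainty_words = ["always", "never", "clearly", "undeniable", "obvious"]

df["fear_count"] = df["clean_text"].apply(lambda x: count_markers(x, fear_words))
df["certainty_count"] = df["clean_text"].apply(lambda x: count_markers(x, certainty_words))

print(df.groupby("speaker")[["sentiment", "fear_count", "certainty_count"]].mean())
```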

This is only a start, but it already gives you a useful table. You can compare speakers, time periods, or document types. You can also plot changes over time. The real value comes when you combine these features with close reading. Numbers point the way; interpretation does the rest.
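To track change over time, one option is to average each document-level score by speaker and month. The sketch below does this with the standard library so the logic is visible; it assumes rows shaped like the dataframe above, with an ISO-formatted `date` string, and the sample values are made up.

```python
from collections import defaultdict
from datetime import date


def monthly_means(rows, value_key):
    """Average a per-document score by (speaker, "YYYY-MM") month."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, document count]
    for row in rows:
        d = date.fromisoformat(row["date"])
        key = (row["speaker"], f"{d.year:04d}-{d.month:02d}")
        totals[key][0] += row[value_key]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in sorted(totals.items())}


rows = [
    {"speaker": "A", "date": "2024-01-05", "fear_count": 2},
    {"speaker": "A", "date": "2024-01-20", "fear_count": 4},
    {"speaker": "A", "date": "2024-02-03", "fear_count": 1},
    {"speaker": "B", "date": "2024-01-10", "fear_count": 0},
]
for (speaker, month), value in monthly_means(rows, "fear_count").items():
    print(speaker, month, value)
```

With pandas, a close equivalent is `df.groupby(["speaker", pd.to_datetime(df["date"]).dt.to_period("M")])["fear_count"].mean()`. Months with only one or two documents will swing widely, so it is worth reporting the document count alongside each mean.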

If you want to use an LLM, a structured prompt can help with consistency:

```text
You are coding rhetoric in a public statement.

For each paragraph, label:
1. Tone: positive, negative, neutral, mixed
2. Framing: responsibility, threat, unity, progress, blame, reassurance, other
3. Persuasive moves: appeal to authority, emotional appeal, repetition, hedging, absolutes, moral language

Return JSON with one record per paragraph, including a brief justification.
```

That output can then be analyzed like any other dataset.
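As a sketch of that step, the function below flattens one model reply into rows and sets aside any record whose tone or frame falls outside the rubric, counting the rejects so the rate can be reported. It assumes, hypothetically, that the model returns a JSON list of objects with `paragraph`, `tone`, `framing` and `persuasive_moves` keys; adjust the field names to whatever your prompt actually requests.

```python
import json
from collections import Counter

# Allowed labels, copied from the prompt's rubric.
TONES = {"positive", "negative", "neutral", "mixed"}
FRAMES = {"responsibility", "threat", "unity", "progress", "blame", "reassurance", "other"}


def parse_coding(doc_id, raw_json):
    """Flatten one JSON reply into rows; count records with off-rubric labels."""
    rows, rejected = [], 0
    for rec in json.loads(raw_json):
        tone = str(rec.get("tone", "")).strip().lower()
        frame = str(rec.get("framing", "")).strip().lower()
        if tone not in TONES or frame not in FRAMES:
            rejected += 1  # skipped, but counted so the rejection rate can be reported
            continue
        rows.append({
            "doc_id": doc_id,
            "paragraph": rec.get("paragraph"),
            "tone": tone,
            "framing": frame,
            "moves": list(rec.get("persuasive_moves", [])),
        })
    return rows, rejected


reply = json.dumps([
    {"paragraph": 1, "tone": "negative", "framing": "threat",
     "persuasive_moves": ["emotional appeal", "absolutes"], "justification": "Stresses danger."},
    {"paragraph": 2, "tone": "Positive", "framing": "unity",
     "persuasive_moves": ["repetition"], "justification": "Repeats 'together'."},
    {"paragraph": 3, "tone": "negative", "framing": "fearmongering",
     "persuasive_moves": [], "justification": "Off-rubric frame."},
])
rows, rejected = parse_coding("d1", reply)
print(Counter(r["framing"] for r in rows), "rejected:", rejected)
```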

Limits, Ethics, and the Future of Auditing Voice

Even the best rhetoric audit has limits. Language is slippery. Sarcasm, irony, and local context can confuse models. A phrase that sounds manipulative in one setting may be ordinary in another. And a model trained on general data may misunderstand domain-specific speech, especially in law, medicine, or politics. That is why audits should be treated as aids to judgment, not final verdicts.

There are also ethical concerns. If rhetoric audits are used carelessly, they may become tools of surveillance or censorship. That would be a grave mistake. The purpose is not to punish speech but to understand it. Audits should be transparent about their methods, their sample, and their uncertainty. They should also protect privacy when working with sensitive texts. In this sense, rhetoric auditing must follow the same standards as good statistical practice: clarity, restraint, and honest error reporting.
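Honest error reporting can start with something as simple as a confidence interval. The sketch below computes a percentile bootstrap interval for the difference in mean marker counts between two speakers. The counts are invented, the percentile bootstrap is only one of several interval methods, and resampling documents one at a time assumes they are independent; speeches from the same event or week may call for resampling clusters instead.

```python
import random


def bootstrap_diff_ci(a, b, n_boot=5000, level=0.95, seed=0):
    """Point estimate and percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ra = rng.choices(a, k=len(a))  # resample documents with replacement
        rb = rng.choices(b, k=len(b))
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo = diffs[int((1 - level) / 2 * n_boot)]
    hi = diffs[int((1 + level) / 2 * n_boot) - 1]
    return sum(a) / len(a) - sum(b) / len(b), (lo, hi)


# Invented fear-marker counts per document for two speakers.
speaker_a = [3, 4, 2, 5, 4, 3, 6, 4]
speaker_b = [1, 0, 2, 1, 1, 0, 2, 1]
estimate, (lo, hi) = bootstrap_diff_ci(speaker_a, speaker_b)
print(f"difference = {estimate:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval next to the point estimate makes clear how much a difference between speakers could move with a different sample of documents.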

The future is promising, but it will reward humility. Better models will likely improve extraction, classification, and cross-document comparison. Yet the deeper task will remain human. We must ask not only what a text says, but what it tries to do. That question cannot be answered by code alone. As Hannah Arendt warned, “The ideal subject of totalitarian rule is not the convinced Nazi or the convinced Communist, but people for whom the distinction between fact and fiction… no longer exists.” A rhetoric audit, done well, helps defend that distinction.

Automating the rhetoric audit does not mean replacing readers. It means giving them instruments. It means turning a flood of speech into analyzable evidence, while preserving the need for interpretation. In a world where persuasion arrives in endless streams, that is no small gift. The right tools can reveal patterns of framing, tone, and bias that would otherwise remain hidden in plain sight.

For scholars, journalists, policy teams, and civic groups, the practical path is clear. Build a corpus carefully. Use NLP to measure tone and topic. Use LLMs to classify rhetorical moves at scale. Then check the output against close reading and domain knowledge. The result is not just a better analysis of language. It is a better understanding of power, since power often travels on words.

If there is one lesson to keep in mind, it is this: rhetoric is data, but never only data. It is evidence, but also performance. It is measurable, but also moral. A quantitative rhetoric audit can help us see the machinery of persuasion more clearly. And once we can see it, we are in a better position to question it, improve it, and, when needed, resist it.

Discover more from Rhetoric Audit Research and Publications
