Joint Topic and Perspective Model
As part of my PhD thesis, I have been developing statistical models of ideological text and video. By ideology I mean “a set of beliefs commonly shared by a group of people.” For example, “pro-life” and “pro-choice” are the two main ideologies on the abortion issue.
Automatically analyzing ideological text has been considered almost impossible. Abelson, a pioneer in computer modeling of ideological beliefs who first developed a computer simulation of Barry Goldwater’s conservative beliefs, expressed a very pessimistic view of automatically analyzing ideological text in the 1960s:
The simulation of the belief systems of other individuals with very different views is also being contemplated, but this step cannot be undertaken lightly since the paraphrasing procedure is extremely difficult. One might suppose that fully automatic content analysis methods could be applied to the writings and speeches of public figures, but there is an annoying technical problem which renders this possibility a vain hope.
I deeply admire Abelson’s vision of computer modeling of ideological beliefs. Such a system could enable news aggregation websites such as Google News to organize news and blogs by ideological perspective, rather than simply presenting a huge cluster of news stories. It could also identify highly biased news articles and raise awareness of the biases of individual newspapers and television broadcasters.
However, I do not subscribe to his view on automatic analysis of ideological text. I have observed a distinctive pattern of emphatic word choices in many ideological texts, and I have developed a statistical model that simultaneously captures two factors, topical and ideological, that contribute to the word choices made by authors holding contrasting ideological beliefs:
- Topical: Ideology is situated in a specific topic. “Pro-life” ideology is relevant mostly in news articles about pregnancy and much less so in articles about baseball. Some words are chosen simply because they are about the topic.
- Ideological: Authors or speakers holding different ideological beliefs emphasize some words (writing or speaking them more often) and de-emphasize others (writing or speaking them less often) when they express ideological views on an issue.
I call this statistical model of ideological discourse the Joint Topic and Perspective Model (jTP).
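The two factors above do not say exactly how they combine. The sketch below assumes one simple form, purely for illustration: each word w has a topical weight tau[w] shared by both sides and an ideological weight phi[w] for a given perspective, and the probability of choosing w is proportional to exp(tau[w] * phi[w]). Under this assumption, phi above 1 emphasizes a word and phi below 1 de-emphasizes it. The function names `word_distribution` and `generate_document` and all the weights are hypothetical; the paper gives the model’s actual parameterization and priors.

```python
import math
import random


def word_distribution(topical, ideological):
    """Return p(w | perspective) as a dict over the vocabulary.

    ASSUMPTION (illustrative only): the log-probability of word w is
    proportional to topical[w] * ideological[w], so an ideological weight
    above 1 emphasizes a word and a weight below 1 de-emphasizes it.
    """
    scores = {w: topical[w] * ideological[w] for w in topical}
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(unnorm.values())
    return {w: u / total for w, u in unnorm.items()}


def generate_document(topical, ideological, length, rng):
    """Sample `length` words independently from the perspective's word distribution."""
    dist = word_distribution(topical, ideological)
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=length)


if __name__ == "__main__":
    # Hypothetical weights; not estimated from any real corpus.
    tau = {"palestinian": 3.0, "israeli": 3.0, "terrorism": 1.5, "occupation": 1.5}
    phi_israeli = {"palestinian": 1.0, "israeli": 1.0, "terrorism": 1.5, "occupation": 0.5}
    phi_palestinian = {"palestinian": 1.0, "israeli": 1.0, "terrorism": 0.5, "occupation": 1.5}
    for name, phi in [("Israeli", phi_israeli), ("Palestinian", phi_palestinian)]:
        probs = word_distribution(tau, phi)
        print(name, {w: round(p, 3) for w, p in probs.items()})
        print("  sample:", generate_document(tau, phi, 8, random.Random(0)))
```

In this toy setup, the large topical words get the same probability under both perspectives, while “terrorism” and “occupation” shift toward whichever side gives them an ideological weight above 1.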
Here is an example of fitting jTP to the editorials about the Israeli-Palestinian conflict published on bitterlemons.org. I summarize the topical and ideological factors uncovered by jTP in a color text cloud. A word’s size is proportional to its topical factor (i.e., how much its occurrence is attributed to the topic), and a word’s color depth is proportional to its ideological factor (i.e., how much its occurrence is attributed to the ideological perspective of its author). “Neutral” words that are not particularly emphasized by either side are painted light gray. Words chosen more often by the Israeli authors are painted red, and words chosen more often by the Palestinian authors are painted blue.
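The mapping from the two factors to the cloud’s visual encoding can be sketched roughly as follows. Everything here is an assumption for illustration, not the code that produced the figures: the function name `cloud_style`, the linear font-size scale, the gray threshold, the hex color scheme, and the choice to measure a word’s ideological factor as the difference between the two sides’ weights.

```python
def cloud_style(topical, phi_a, phi_b, min_pt=10, max_pt=48, neutral_tol=0.05):
    """Return {word: (font_size, hex_color)} for a two-sided text cloud.

    - Font size grows linearly with the topical weight.
    - Color depth grows with |phi_a - phi_b|: side A is red, side B is blue,
      and words whose weights differ by at most `neutral_tol` are light gray.
    All of these choices are hypothetical.
    """
    lo, hi = min(topical.values()), max(topical.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all weights are equal
    max_diff = max(abs(phi_a[w] - phi_b[w]) for w in topical) or 1.0
    styles = {}
    for w, t in topical.items():
        size = min_pt + (max_pt - min_pt) * (t - lo) / span
        diff = phi_a[w] - phi_b[w]
        if abs(diff) <= neutral_tol:
            color = "#d3d3d3"  # light gray for neutral words
        else:
            depth = abs(diff) / max_diff  # in (0, 1]; 1 = most partisan word
            fade = round(255 * (1 - depth))  # 0 = fully saturated
            if diff > 0:
                color = f"#ff{fade:02x}{fade:02x}"  # red tones for side A
            else:
                color = f"#{fade:02x}{fade:02x}ff"  # blue tones for side B
        styles[w] = (round(size, 1), color)
    return styles


if __name__ == "__main__":
    # Hypothetical weights; side A = Israeli authors, side B = Palestinian authors.
    tau = {"palestinian": 3.0, "israeli": 3.0, "terrorism": 1.5, "occupation": 1.5}
    phi_a = {"palestinian": 1.0, "israeli": 1.0, "terrorism": 1.5, "occupation": 0.5}
    phi_b = {"palestinian": 1.0, "israeli": 1.0, "terrorism": 0.5, "occupation": 1.5}
    for word, style in cloud_style(tau, phi_a, phi_b).items():
        print(word, style)
```

Any monotone scale would do for the sizes; a linear one simply keeps the relative topical weights easy to read off the cloud.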
Unsurprisingly, topical words (in large size) such as “Palestinian” and “Israeli” are chosen very often by both sides. These topical words, however, are not particularly emphasized by either side. The Israeli and Palestinian perspectives are instead clearly reflected in their word choices: the Israeli authors use “terrorism” more often, while the Palestinian authors use “occupation” and “resistance” more often. Interestingly, “Arafat”, a former Palestinian leader, is mentioned more often by the Israeli authors than by the Palestinian authors.
Here is another example, fitting jTP to the speech transcripts of the 2000 and 2004 United States presidential debates. Words emphasized by the Democratic presidential candidates are painted red, and words emphasized by the Republican presidential candidates are painted blue.
The Democratic candidates use “families” and “kids” more often, while the Republican candidates use “freedom” and “Washington” more often.
These examples show that ideological beliefs are strongly reflected in the word choices an author or speaker makes. By modeling these statistical patterns over a large collection of documents, computers can “learn” how ideological perspectives are reflected in word choices.
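jTP itself is fit with the inference procedure described in the paper, which this post does not cover. As a much cruder stand-in that captures only the “ideological” half of the picture, one can compare smoothed word frequencies between two groups of documents. The sketch below computes a log-ratio with add-alpha smoothing. It is not jTP: there is no topical factor, and the function name `emphasis_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def emphasis_scores(docs_a, docs_b, alpha=1.0):
    """Smoothed log-ratio of word frequencies between two groups of documents.

    Each document is a list of tokens. A positive score means group A uses the
    word relatively more often; a negative score means group B does. `alpha` is
    an add-alpha smoothing constant so that unseen words never produce log(0).
    """
    counts_a = Counter(w for doc in docs_a for w in doc)
    counts_b = Counter(w for doc in docs_b for w in doc)
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    return {
        w: math.log((counts_a[w] + alpha) / total_a)
        - math.log((counts_b[w] + alpha) / total_b)
        for w in vocab
    }


if __name__ == "__main__":
    # Toy, made-up token lists; not taken from bitterlemons.org.
    docs_a = [["israeli", "palestinian", "terrorism"], ["terrorism", "security"]]
    docs_b = [["israeli", "palestinian", "occupation"], ["occupation", "resistance"]]
    for word, score in sorted(emphasis_scores(docs_a, docs_b).items(), key=lambda kv: kv[1]):
        print(f"{word:12s} {score:+.3f}")
```

Words shared equally by both groups score near zero, which mirrors the light-gray “neutral” words in the clouds, but unlike jTP this score cannot separate how topical a word is from how partisan it is.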
You can find more details about the Joint Topic and Perspective Model in our paper, accepted at the upcoming 2008 European Conference on Machine Learning.