What’s the Scoop, Wei-Hao?




One application of automatic ideology analysis in my PhD thesis work is predicting the ideological perspective from which an article is written.  I made a web-based demo, Ideology-O-Meter, that takes an input text on the Israeli-Palestinian conflict and analyzes how likely it is that the text was written from the Israeli or the Palestinian perspective.



There are three panels in the Ideology-O-Meter demo.  In the left panel, you can type any text whose ideological perspective on the Israeli-Palestinian conflict you would like to identify.  I have prepared example texts written by real Israeli and Palestinian authors on bitterlemons.org.  You can use these examples by clicking one of the two buttons above the text box.  After you press the Identify button at the bottom, the input text is sent to the automatic ideology analysis program running in the background.  The program parses the text and uses the Joint Topic and Perspective Model to infer how likely the text is to express each ideological perspective.

The results are shown in the middle and right panels.  The middle panel is the Ideology-O-Meter itself: the position of the arrow indicates how strongly the input text conveys one of the two ideological perspectives.  The more strongly a text expresses the Israeli view, the farther the arrow moves to the right.  Similarly, the more strongly a text expresses the Palestinian view, the farther the arrow moves to the left.  In the example above, the text appears to be written very much from the Palestinian perspective.
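As a rough illustration of how a score could drive the arrow, the sketch below compares the log-likelihood of the input words under two per-perspective word distributions and squashes the difference into the range [-1, 1]. This is a plain unigram comparison standing in for the Joint Topic and Perspective Model, not the demo's actual computation. The names `arrow_position`, `israeli_probs`, `palestinian_probs`, and the `floor` probability are all hypothetical.

```python
import math
import re


def arrow_position(text, israeli_probs, palestinian_probs, floor=1e-6):
    """Return a value in [-1, 1]: negative leans Palestinian (left),
    positive leans Israeli (right).

    israeli_probs / palestinian_probs are hypothetical word -> probability
    dicts; words missing from a dict get the small `floor` probability.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Sum of per-word log-likelihood ratios (Israeli vs. Palestinian).
    log_ratio = sum(
        math.log(israeli_probs.get(w, floor))
        - math.log(palestinian_probs.get(w, floor))
        for w in words
    )
    # tanh(x / 2) equals 2 * sigmoid(x) - 1, so this is the posterior
    # probability of the Israeli perspective (assuming equal priors),
    # rescaled from [0, 1] to [-1, 1].
    return math.tanh(log_ratio / 2.0)
```

A text whose words are equally likely under both distributions lands at 0, the middle of the meter.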

The third panel lists the 10 most frequent words in the input text and their frequencies (in dark yellow).  The longer the bar, the more frequently the word appears in the input text.  The light yellow bar is the word’s expected frequency in articles written from the Palestinian perspective, as learned by the Joint Topic and Perspective Model from the bitterlemons corpus.  The closer the two bars are in proportion, the more likely it is that the input text was written from the Palestinian perspective.
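The bar chart can be approximated with a few lines of Python. This is only a sketch: the tokenizer is a simple regular expression, no stop words are removed, and `expected_probs` is a hypothetical word-to-probability dictionary standing in for what the model learns from the corpus. The demo itself may process text differently.

```python
import re
from collections import Counter


def top_word_frequencies(text, k=10):
    """Return the k most frequent words as (word, relative_frequency) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    return [(word, count / total) for word, count in counts.most_common(k)]


def compare_with_expected(text, expected_probs, k=10):
    """Pair each top word's observed frequency with its expected frequency.

    expected_probs is a hypothetical word -> probability dict for one
    perspective; words absent from it are given an expected frequency of 0.
    """
    return [
        (word, observed, expected_probs.get(word, 0.0))
        for word, observed in top_word_frequencies(text, k)
    ]
```

Each returned triple corresponds to one row of the panel: the word, the length of its dark yellow bar, and the length of its light yellow bar.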

You can read more about the statistical model behind the scenes in our upcoming ECML paper.


Written by Max

July 25, 2008 at 1:53 pm

Posted in Uncategorized

Joint Topic and Perspective Model


As part of my PhD thesis, I have been working on developing statistical models for ideological text and video. By ideology I mean “a set of beliefs commonly shared by a group of people.” For example, “pro-life” and “pro-choice” are two main ideologies on the abortion issue.

Automatically analyzing ideological text has long been considered almost impossible. Abelson, a pioneer in computer modeling of ideological beliefs who first developed a computer simulation of the conservative beliefs of Barry Goldwater, expressed a very pessimistic view of automatically analyzing ideological text in the 1960s:

The simulation of the belief systems of other individuals with very different views is also being contemplated, but this step cannot be undertaken lightly since the paraphrasing procedure is extremely difficult. One might suppose that fully automatic content analysis methods could be applied to the writings and speeches of public figures, but there is an annoying technical problem which renders this possibility a vain hope.

I deeply admire Abelson’s vision of computer modeling of ideological beliefs. Such a computer system would enable news aggregation web sites such as Google News to organize news and blogs by ideological perspective rather than simply presenting a huge cluster of news stories. Such a system could also identify highly biased news articles and raise awareness of the biases of individual newspapers and television broadcasters.

However, I do not subscribe to his view on automatic analysis of ideological text. I have observed unique emphatic patterns of word choice in many ideological texts, and have developed a statistical model that simultaneously captures two factors, topical and ideological, that contribute to the word choices made by authors holding contrasting ideological beliefs.

  • Topical: Ideology is situated in a specific topic. “Pro-life” ideology is relevant mostly in news articles about pregnancy and much less so in articles about baseball. Some words are chosen simply because they are about the topic.
  • Ideological: Authors or speakers holding different ideological beliefs emphasize some words (write or speak them more) and de-emphasize others (write or speak them less) when they express ideological views on an issue.

I thus call this statistical model for ideological discourse a Joint Topic and Perspective Model (jTP).
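To make the interplay of the two factors concrete, here is a minimal sketch of one way a topical weight and an ideological weight could combine into a word distribution for a single perspective. The functional form (a softmax over the product of the two weights), the function name `word_distribution`, and the neutral default of 1.0 are assumptions made for illustration; the paper is the place to look for the model's exact specification.

```python
import math


def word_distribution(topical, ideological):
    """Turn per-word weights into a probability distribution over words.

    topical:     word -> topical weight (shared by both perspectives)
    ideological: word -> ideological weight for ONE perspective;
                 missing words default to 1.0 (neither emphasized
                 nor de-emphasized)
    Assumed form (illustrative only): p(word) is proportional to
    exp(topical[word] * ideological[word]).
    """
    scores = {w: topical[w] * ideological.get(w, 1.0) for w in topical}
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    unnormalized = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(unnormalized.values())
    return {w: u / total for w, u in unnormalized.items()}
```

Calling the function twice with the same `topical` weights but different `ideological` weights yields two distributions that agree on what the topic is about but differ in which words each side emphasizes.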

Here is an example of fitting jTP to the editorials about the Israeli-Palestinian conflict published on bitterlemons.org. I summarize the topical and ideological factors uncovered by jTP in a color text cloud. A word’s size is proportional to its topical factor (i.e., how much its occurrence is attributed to the topic), and a word’s color depth is proportional to its ideological factor (i.e., how much its occurrence is attributed to the ideological perspective of an author). The “neutral” words that are not particularly emphasized by either side are painted light gray. Words chosen more often by the Israeli authors are painted red, and words used more often by the Palestinian authors are painted blue.
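The visual encoding can be sketched as a small mapping from a word's two factors to a font size and a color. Everything numeric here is an assumption for illustration: the signed `lean` value in [-1, 1], the point sizes, the width of the neutral band, and the specific hex colors are hypothetical, and the font size is a linear rescaling of the topical factor rather than a strict proportion.

```python
def cloud_style(topical, lean, max_topical, min_pt=10, max_pt=48,
                neutral_band=0.1):
    """Map one word's factors to a (font_size_pt, hex_color) pair.

    topical:     non-negative topical factor of the word
    lean:        hypothetical signed ideological factor in [-1, 1];
                 positive = Israeli emphasis, negative = Palestinian emphasis
    max_topical: largest topical factor in the vocabulary (must be > 0)
    """
    # Font size grows linearly with the topical factor.
    size = round(min_pt + (max_pt - min_pt) * (topical / max_topical))
    strength = min(abs(lean), 1.0)
    if strength < neutral_band:
        return size, "#d3d3d3"  # light gray for neutral words
    # Fade the other two channels from 200 (pale) down to 0 (saturated).
    fade = round(200 * (1.0 - strength))
    if lean > 0:
        return size, f"#ff{fade:02x}{fade:02x}"  # red: Israeli emphasis
    return size, f"#{fade:02x}{fade:02x}ff"      # blue: Palestinian emphasis
```

A highly topical but neutral word therefore comes out large and gray, while a less topical but strongly emphasized word comes out small and deeply colored.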

Topical words (in large size) such as “Palestinian” and “Israeli” are, not surprisingly, chosen very often by both sides. These topical words, however, are not particularly emphasized by either side. The Israeli and Palestinian perspectives are clearly reflected in their word choices. The Israeli authors use “terrorism” more, while the Palestinian authors use “occupation” and “resistance” more. Interestingly, “Arafat”, a former Palestinian leader, is mentioned more often by the Israeli authors than by the Palestinian authors.

Here is another example of fitting jTP to the speech transcripts of the 2000 and 2004 United States presidential debates. Words emphasized by the Democratic presidential candidates are painted red, and words emphasized by the Republican presidential candidates are painted blue.

The Democratic presidential candidates use “families” and “kids” more, while the Republican presidential candidates use “freedom” and “Washington” more.

These examples show that ideological beliefs are very much reflected in word choices made by an author or a speaker. By modeling these statistical patterns, computers can “learn” how ideological perspectives are reflected in word choices from a large collection of documents.

You can find more details of the Joint Topic and Perspective Model in our paper accepted to the upcoming 2008 European Conference on Machine Learning.

Written by Max

June 26, 2008 at 6:40 pm

Posted in Uncategorized

Hello, World


One of my favorite constants is π. I still don’t know how to tell my mom the π in the normal distribution actually has something to do with a circle.
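For the curious, one standard way to see the connection: the normalizing constant of the normal density comes from the Gaussian integral. Squaring that integral turns it into an integral over the whole plane, which is easiest to evaluate in polar coordinates, and that sweep around the origin is where the circle comes in.

```latex
I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx
\quad\Longrightarrow\quad
I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy
    = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2/2}\, r\,dr\,d\theta
    = 2\pi \left[-e^{-r^2/2}\right]_{0}^{\infty}
    = 2\pi
```

So I = \sqrt{2\pi}, which is exactly the 1/\sqrt{2\pi} factor that makes the standard normal density integrate to one.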

Written by Max

June 15, 2008 at 5:17 pm

Posted in Uncategorized