Tracking viruses on Twitter

Marcel Salathe, a biologist and computer scientist.

A study shows how social media can be used to find at-risk areas.

Posted: November 23, 2011

As a biologist and computer scientist, Pennsylvania State University's Marcel Salathe studies the viral spread of information and the spread of real viruses.

Now he has found a link between the two: When a viral idea helps create resistance to vaccines, it leaves a path for real viruses to follow.

Using Twitter, he identified regional clusters where people were likely to forgo immunizations. Those could be hot spots of potential outbreaks.

The results, published last month, show how social media can be harnessed to identify at-risk areas and to help focus public health messages. "I definitely think it's interesting and a quick way to get relevant information," said Paul Offit, who directs the Vaccine Education Center at the Children's Hospital of Philadelphia.

Misinformation about vaccines is continuing to allow outbreaks of vaccine-preventable diseases, he said. In the last few years there have been outbreaks of measles, mumps, bacterial meningitis, and whooping cough. A few children died or were left with deafness or other permanent disabilities.

While the fear of autism associated with vaccines has diminished, he said, a more general distrust remains.

Salathe says he understands the unease people feel about a doctor sticking a needle into their healthy 2-year-old child. That's why it's important to help people get good information.

In 2009, two events occurred that gave him the perfect opportunity to study the contagion of germs and ideas. The H1N1 "swine flu" broke out, and the rapid-fire social-networking site Twitter started to take off. Twitter, he said, "is a relatively inexpensive way you can look at hundreds of thousands of people in real time."

When the deadly flu first broke out in the spring of that year, there was no vaccine, but one was developed by the fall, when H1N1 went through a second wave. Some members of the public welcomed the vaccine, while others were highly suspicious.

During the intervening months, Salathe and colleague Shashank Khandelwal collected half a million tweets with words tied to immunization or vaccination. For a subset of 75,000 of those, he asked Penn State students to rate them as positive, negative, neutral, or irrelevant.

An example of a positive tweet was: "Off to get swine flu vaccinated before work," and a negative one was, "What Can You Do To Resist The U.S. H1N1 'vaccination' program? Help Get World Out. The H1N1 'vaccine' is DIRTY.dontgetit."

He estimated how consistently students could rate the tweets by giving 700 of them to all the students and checking if they made similar assessments. Tweets, he noted, come in varying degrees of coherence and are often laden with obscure jargon.
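
One simple way to picture that consistency check is average pairwise agreement: for every pair of raters, count how often they gave the same tweet the same label. The sketch below is only an illustration. The article does not say which statistic the researchers used, and the function name and sample ratings are hypothetical.

```python
from itertools import combinations

def pairwise_agreement(ratings_by_rater):
    """Fraction of (rater pair, tweet) comparisons where both labels match.

    ratings_by_rater: one list of labels per rater, all covering the same
    tweets in the same order. Hypothetical helper, not the study's method.
    """
    agree = 0
    total = 0
    for a, b in combinations(ratings_by_rater, 2):
        for label_a, label_b in zip(a, b):
            agree += (label_a == label_b)
            total += 1
    return agree / total

# Made-up ratings from three raters for the same four tweets.
rater_1 = ["positive", "negative", "neutral", "irrelevant"]
rater_2 = ["positive", "negative", "negative", "irrelevant"]
rater_3 = ["positive", "neutral", "neutral", "irrelevant"]

score = pairwise_agreement([rater_1, rater_2, rater_3])
```

A score near 1 would mean the raters almost always agreed; a low score would suggest the tweets were too ambiguous to label reliably.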

It would have been too cumbersome to analyze half a million tweets by hand, he said, so the next step was to hand the job to computers by employing a technique known as machine learning.

In machine learning, a computer can be "trained" to duplicate some results of human judgment. Machine learning has been used for everything from detecting credit card fraud to grading the results of essay exams. When Salathe fed the computer some of the tweets the students had already evaluated, it identified patterns that eventually enabled it to match humans about 85 percent of the time. Machine-learning experts told Salathe that was about as accurate as it was likely to get.
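
To make the idea of "training" concrete, here is a minimal sketch of a word-counting classifier (naive Bayes) that learns from human-labeled tweets and is then scored by how often it matches human labels on tweets it has not seen. It is a stand-in under stated assumptions: the article does not name the algorithm Salathe used, and apart from the two tweets quoted above, the sample tweets, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; real tweets would need more cleanup.
    return text.lower().split()

def train(labeled_tweets):
    """Count words per label from (text, label) pairs rated by humans."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Pick the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        # Add-one smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def agreement(model, held_out):
    """Share of held-out tweets where the computer matches the human label."""
    matches = sum(classify(text, model) == label for text, label in held_out)
    return matches / len(held_out)

# Tiny illustrative training set; the study used thousands of rated tweets.
training = [
    ("off to get swine flu vaccinated before work", "positive"),
    ("got my flu shot today feeling good", "positive"),
    ("the h1n1 vaccine is dirty dont get it", "negative"),
    ("resist the vaccination program", "negative"),
]
held_out = [
    ("getting vaccinated before work today", "positive"),
    ("the vaccine is dirty", "negative"),
]

model = train(training)
match_rate = agreement(model, held_out)
```

The `match_rate` value plays the role of the roughly 85 percent figure in the study: it measures agreement with human raters on tweets the computer was not trained on.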

When the computer finished analyzing all the tweets, he found that negative attitudes about vaccines clustered in the same geographic regions where people were less likely to get vaccinated. The results were published last month in the journal PLoS Computational Biology.

The clustering is worrisome because of the way diseases spread. If unvaccinated people are far apart, diseases are less likely to move between them, since infected individuals are surrounded by those who are immune. But when the unvaccinated are concentrated in the same geographic region, a disease can become an outbreak.
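
A toy model shows why the arrangement matters even when the number of unvaccinated people is the same. The sketch below assumes a single line of people, a perfectly effective vaccine, and a disease that passes only between neighbors who are both unvaccinated. None of these simplifications come from the study, and the names are hypothetical.

```python
def largest_outbreak(population):
    """Longest run of adjacent unvaccinated people ('U') in a line of contacts.

    In this toy model, one infection can reach everyone in the same
    unbroken run of 'U's, because vaccinated people ('V') block the chain.
    """
    longest = 0
    current = 0
    for person in population:
        current = current + 1 if person == "U" else 0
        longest = max(longest, current)
    return longest

# Both populations have 4 unvaccinated people out of 16.
dispersed = list("UVVVUVVVUVVVUVVV")
clustered = list("UUUUVVVVVVVVVVVV")

dispersed_size = largest_outbreak(dispersed)
clustered_size = largest_outbreak(clustered)
```

In the dispersed line an infection stops at the first person, while in the clustered line it can reach all four unvaccinated people.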

In the two years since the study began, he said, Twitter has itself spread like a virus, and so it would be easier and faster to mine huge numbers of public health-related tweets today. And Twitter gives information not just on the users and where they live but on how they are connected to one another.

Other health researchers are looking into social media. A group at the University of Pennsylvania led by Raina Merchant recently announced it is beginning to use Twitter to gather information about heart disease and cardiac arrest.

Salathe said he found it ironic that people use the term "viral" to describe the way very catchy ideas spread, since most viruses tend to peter out. When a real virus goes viral, that is an epidemic or a pandemic, a rare event, he said. If anti-vaccine rumors can be dispelled, it is more likely to stay that way.

Contact staff writer Faye Flam

at 215-854-4977 or