Analyzing the vaccination debate in social media data Pre- and Post-COVID-19 pandemic

The COVID-19 virus has caused and continues to cause unprecedented impacts on the life trajectories of millions of people globally. Recently, to combat the transmission of the virus, vaccination campaigns around the world have become prevalent. However, while many see such campaigns as positive (e.g., protecting lives), others see them as negative (e.g., the side effects that are not fully understood scientifically), resulting in diverse sentiments towards vaccination campaigns. In addition, the diverse sentiments have seldom been systematically quantified let alone their dynamic changes over space and time. To shed light on this issue, we propose an approach to analyze vaccine sentiments in space and time by using supervised machine learning combined with word embedding techniques. Taking the United States as a test case, we utilize a Twitter dataset (approximately 11.7 million tweets) from January 2015 to July 2021 and measure and map vaccine sentiments (Pro-vaccine, Anti-vaccine, and Neutral) across the nation. In doing so, we can capture the heterogeneous public opinions within social media discussions regarding vaccination among states. Results show how positive sentiment in social media has a strong correlation with the actual vaccinated population. Furthermore, we introduce a simple ratio between Anti and Pro-vaccine as a proxy to quantify vaccine hesitancy and show how our results align with other traditional survey approaches. The proposed approach illustrates the potential to monitor the dynamics of vaccine opinion distribution online, which we hope, can be helpful to explain vaccination rates for the ongoing COVID-19 pandemic.

Qingqing Chen
Qingqing Chen
PhD Candidate | Data Scientist

Ph.D. candidate focuses on critically understanding urban space by leveraging big data, combined with data science and machine learning techniques.