Natural Language Processing to Categorize Misinformation

A tool to help researchers track mis/disinformation and respond to it rapidly

Project Information

In recent years, social media sites like Twitter have carried an abundance of mis/disinformation about US elections. Research collectives such as the Election Integrity Partnership investigated this content in real time, analyzing it and sharing key findings with election stakeholders. However, not all researchers have the quantitative skills needed to work with this data at scale. Our project supports qualitative researchers with a Python Jupyter Notebook that lets them draw insights from datasets related to the US election.

Sponsors

This project is sponsored by the University of Washington’s Center for an Informed Public (UW CIP). The main point of contact is Mike Caulfield, a Research Scientist at the UW CIP who works with the Election Integrity Partnership (EIP). He is joined by Emily Porter, a PhD student specializing in Information Science who works with the UW CIP.

Our Solution

We developed a comprehensive Python Jupyter Notebook with features that help qualitative researchers in their research process: sentiment analysis, search, timezone standardization, popular tweets, and filters (retweets, date, and time).

Sentiment Analysis

Find positive, negative, and neutral tweets related to a keyword, phrase, or hashtag
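The page does not say which sentiment model the notebook uses, so the sketch below swaps in a toy lexicon-based scorer purely to show the shape of the feature: label each tweet positive, negative, or neutral, then keep the tweets that mention the keyword and carry the requested label. The word lists and the names `sentiment` and `tweets_by_sentiment` are hypothetical; a real analysis would more likely rely on an established tool such as NLTK's VADER analyzer.

```python
import re

# Hypothetical toy lexicon, for illustration only; not the notebook's model.
POSITIVE = {"great", "good", "win", "safe", "secure", "thank"}
NEGATIVE = {"fraud", "rigged", "bad", "stolen", "steal", "corrupt"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Score = (# positive words) - (# negative words), mapped to a label."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tweets_by_sentiment(tweets, keyword, label):
    """Keep tweets that contain `keyword` (case-insensitive) and have sentiment `label`."""
    kw = keyword.lower()
    return [t for t in tweets if kw in t.lower() and sentiment(t) == label]
```

Because the keyword test is a plain substring match, the same call works for a word, a phrase such as "President Trump", or a hashtag.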

Search

Find tweets related to a term and/or exclude tweets containing specific terms, using vectorization
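One common reading of "using vectorization" is TF-IDF: each tweet becomes a weighted bag-of-words vector, the query is vectorized the same way, and tweets are ranked by cosine similarity, after dropping any tweet that contains an excluded term. The sketch below uses only the standard library; the choice of TF-IDF, the smoothed IDF formula, and the `search` signature are assumptions, not the notebook's confirmed method (which might use scikit-learn's `TfidfVectorizer` or word embeddings instead).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into tokens, keeping '#' so hashtags stay intact."""
    return re.findall(r"[#\w']+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}
    vectors = [{tok: cnt * idf[tok] for tok, cnt in Counter(toks).items()}
               for toks in token_lists]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(tweets, query, exclude=(), top_k=5):
    """Rank tweets by similarity to `query`, skipping tweets with excluded terms."""
    vectors, idf = tfidf_vectors(tweets)
    # Query terms absent from the corpus get zero weight.
    q = {tok: cnt * idf.get(tok, 0.0) for tok, cnt in Counter(tokenize(query)).items()}
    excluded = {e.lower() for e in exclude}
    scored = []
    for tweet, vec in zip(tweets, vectors):
        if excluded & set(tokenize(tweet)):
            continue
        score = cosine(q, vec)
        if score > 0:
            scored.append((score, tweet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in scored[:top_k]]
```

TF-IDF only rewards shared tokens, so finding tweets that are merely similar in meaning to "fraud" (without containing the word) would need word embeddings or a comparable technique.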

Standardize Timezones

Convert timestamps in the dataset from one timezone to another (e.g., GMT to PST)
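A minimal sketch of the conversion, assuming timestamps are stored as ISO 8601 strings and that strings without an offset are in GMT/UTC. PST is a fixed UTC-8 offset, so `datetime.timezone` is enough here; to follow daylight saving time automatically, `zoneinfo.ZoneInfo("America/Los_Angeles")` would be the better target. The function name `convert_timestamps` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones. PST is always UTC-8 and EST is always UTC-5;
# use zoneinfo.ZoneInfo(...) instead if daylight saving time matters.
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")


def convert_timestamps(timestamps, target_tz):
    """Parse ISO 8601 strings and convert them to `target_tz`."""
    converted = []
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            # Assumption: timestamps without an offset are GMT/UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        converted.append(dt.astimezone(target_tz))
    return converted
```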

Filter (Date & Time)

Filter to find tweets by date and time
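Assuming each tweet is a dict whose `created_at` value is a timezone-aware `datetime`, the date and time filter reduces to a half-open range comparison. In this sketch, bounds given without a timezone are read in a caller-supplied zone, which is what makes a query such as "January 6th, 2021, in Georgia's timezone" straightforward: Georgia observes Eastern Time, which is UTC-5 in January. The field name `created_at` and the function name `filter_by_datetime` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")  # Eastern Standard Time (UTC-5)


def filter_by_datetime(tweets, start, end, tz=timezone.utc):
    """Keep tweets with start <= created_at < end.

    Naive `start`/`end` values are interpreted in `tz`; aware values are used as-is.
    Each tweet's `created_at` must be timezone-aware so comparisons are exact.
    """
    if start.tzinfo is None:
        start = start.replace(tzinfo=tz)
    if end.tzinfo is None:
        end = end.replace(tzinfo=tz)
    return [t for t in tweets if start <= t["created_at"] < end]
```

For example, `filter_by_datetime(tweets, datetime(2021, 1, 6, 13), datetime(2021, 1, 6, 18), tz=EST)` keeps tweets posted between 1 p.m. and 6 p.m. Eastern on that day, whatever zone their timestamps were stored in.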

Filter (Retweets)

Filter to find if a tweet was retweeted or not
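One plausible reading of this filter, assumed here, is separating retweets from original tweets in a collected dataset. The sketch assumes Twitter API v1.1-style records, where a retweet embeds the original tweet under `retweeted_status` and its text conventionally begins with "RT @"; the function names `is_retweet` and `filter_retweets` are hypothetical.

```python
def is_retweet(tweet):
    """True if the record looks like a retweet (v1.1-style fields assumed)."""
    return "retweeted_status" in tweet or tweet.get("text", "").startswith("RT @")


def filter_retweets(tweets, retweets=True):
    """Return only retweets (retweets=True) or only original tweets (retweets=False)."""
    return [t for t in tweets if is_retweet(t) == retweets]
```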

Popular Tweets

Count the number of times a tweet was retweeted
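If the dataset stores retweets as separate records, popularity can be measured by counting how many of them point at each original tweet. This sketch assumes Twitter API v1.1-style records, where a retweet embeds the original tweet (with its `id` and `text`) under `retweeted_status`; the function name `most_retweeted` is hypothetical. Counting within the dataset reflects only the retweets that were collected, which can differ from the total retweet count Twitter reports for the original tweet.

```python
from collections import Counter


def most_retweeted(tweets, top_n=10):
    """Return (original_id, original_text, count) tuples, most-retweeted first."""
    counts = Counter()
    texts = {}
    for t in tweets:
        original = t.get("retweeted_status")
        if original is None:
            continue  # not a retweet, so it adds to no count
        counts[original["id"]] += 1
        texts[original["id"]] = original.get("text", "")
    return [(oid, texts[oid], n) for oid, n in counts.most_common(top_n)]
```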

With these tools, you can perform queries like:

  • Check tweets posted on January 6th, 2021, during the Capitol insurrection, with times shown in Georgia’s timezone
  • Look up which tweets similar to “fraud” have been retweeted the most
  • Look at tweets with positive sentiment related to “President Trump”
  • Filter for all tweets mentioning “#stopthesteal” from election week
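To illustrate how the features combine, the last query can be sketched as an exact hashtag match plus a date window. The window below (seven days starting at midnight Eastern on Election Day, November 3rd, 2020, when Eastern Time was UTC-5) is an assumed definition of "election week", and the record layout (`text` plus a timezone-aware `created_at`) and the name `hashtag_in_window` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), "EST")  # Eastern Time in early November 2020

# Assumed "election week": 7 days starting at midnight Eastern on Election Day.
ELECTION_WEEK_START = datetime(2020, 11, 3, tzinfo=EST)
ELECTION_WEEK_END = ELECTION_WEEK_START + timedelta(days=7)


def hashtag_in_window(tweets, hashtag, start, end):
    """Keep tweets that use `hashtag` exactly (case-insensitive) within [start, end)."""
    tag = hashtag.lower()
    return [
        t for t in tweets
        if start <= t["created_at"] < end
        and tag in re.findall(r"#\w+", t["text"].lower())
    ]
```

Matching whole hashtags rather than substrings keeps variants such as "#StopTheSteal2020" out of the results; a substring test would be the simpler choice if those variants should count.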

The Team

Zhi Ye

Product Manager

Angelina Poltavskiy

UX/Front-End Dev.

Siya Sharma

Data Scientist

Jade D'Souza

Data Scientist/Back-End Dev.

Project Status

As of May 21st, 2022, the project has been transferred to the UW CIP for further development. For any questions, please contact Mike Caulfield at mica42@uw.edu.