

Title: CMU-MisCov19: A Novel Twitter Dataset for Characterizing COVID-19 Misinformation

Type: Dataset

Memon, Shahan Ali; Carley, Kathleen M. (2020): CMU-MisCov19: A Novel Twitter Dataset for Characterizing COVID-19 Misinformation. Zenodo. Dataset. https://zenodo.org/record/4024154

Authors: Memon, Shahan Ali (Carnegie Mellon University); Carley, Kathleen M. (Carnegie Mellon University)


Summary

From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hotbed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. Detecting and characterizing misinformation requires annotated datasets. Most published COVID-19 Twitter datasets are generic, lack annotations or labels, rely on automated annotation through transfer learning or semi-supervised methods, or are not specifically designed for misinformation. Existing annotated datasets either focus only on "fake news", are small, or cover only a few classes.

Here, we present a novel Twitter misinformation dataset called "CMU-MisCov19" with 4,573 annotated tweets spanning 17 themes around the COVID-19 discourse. We also present our annotation codebook for the different COVID-19 themes on Twitter, along with their descriptions and examples, so that the community can use it to collect further annotations. Further details about the dataset, and our analysis based on it, can be found at https://arxiv.org/abs/2008.00791. In adherence to Twitter's terms and conditions, we do not provide the full tweet JSONs; instead, we provide a ".csv" file with the tweet IDs so that the tweets can be rehydrated. We also provide the annotations and the creation date of each tweet so that the results of our analyses can be reproduced.
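The loading and rehydration preparation described above can be sketched in Python. This is a minimal, hypothetical example, not the authors' tooling: the column names `status_id`, `annotation` and `created_at` are assumptions (check the header of the released .csv file), and the batch size of 100 assumes the per-request ID limit of Twitter's tweet lookup endpoints. The sketch keeps tweet IDs as strings and only groups them into batches; it does not call any API.

```python
import csv
import io
from collections import Counter
from typing import Iterator

# Hypothetical column names; verify them against the actual CSV header.
ID_COL, LABEL_COL, DATE_COL = "status_id", "annotation", "created_at"
BATCH_SIZE = 100  # assumed per-request ID limit of the tweet lookup endpoint


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text, keeping every field (including tweet IDs) as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def id_batches(rows: list[dict[str, str]], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield tweet IDs in groups of at most `size` for rehydration requests."""
    ids = [row[ID_COL] for row in rows]
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    # Made-up sample rows for illustration only; not taken from the dataset.
    sample = (
        f"{ID_COL},{LABEL_COL},{DATE_COL}\n"
        "1250000000000000001,theme_a,2020-04-01\n"
        "1250000000000000002,theme_b,2020-04-02\n"
        "1250000000000000003,theme_a,2020-04-03\n"
    )
    rows = load_rows(sample)
    print(Counter(row[LABEL_COL] for row in rows))  # tweets per (placeholder) theme
    print([len(batch) for batch in id_batches(rows, size=2)])
```

Keeping IDs as strings is the main design choice: tweet IDs are about 19 digits long, so tools that parse them as floating-point numbers (or spreadsheets limited to 15 significant digits) can silently round them, and the rounded IDs will then fail to rehydrate.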

Note: If, for any reason, you are unable to rehydrate all the tweets, contact Shahan Ali Memon at shahan@nyu.edu.
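Before getting in touch, it may help to list exactly which tweets could not be rehydrated. The following minimal sketch assumes you already have the full list of released IDs and the set of IDs your rehydration tool returned; the function name `missing_ids` is hypothetical and not part of the dataset release.

```python
def missing_ids(released: list[str], rehydrated: set[str]) -> list[str]:
    """Return released tweet IDs that rehydration did not return, in original order."""
    return [tweet_id for tweet_id in released if tweet_id not in rehydrated]


if __name__ == "__main__":
    # Illustrative IDs only; not taken from the dataset.
    released = ["101", "102", "103", "104"]
    rehydrated = {"101", "103"}
    print(missing_ids(released, rehydrated))  # ['102', '104']
```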

If you use this data, please cite our paper as follows: 

"Shahan Ali Memon and Kathleen M. Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset, In Proceedings of The 5th International Workshop on Mining Actionable Insights from Social Networks (MAISoN 2020), co-located with CIKM, virtual event due to COVID-19, 2020."

More information

  • DOI: 10.5281/zenodo.4024154

Subjects

  • covid, coronavirus, misinformation, twitter, covid-19, network analysis, sociolinguistics, dataset

Dates

  • Publication date: 2020
  • Issued: September 19, 2020

Notes

Other: If you use this dataset, please cite our paper "Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset", accepted at the MAISoN Workshop at CIKM 2020, using the citation given in the Summary. The preprint version of the paper can be found at https://arxiv.org/abs/2008.00791.


Format

electronic resource

Related items

  • IsSupplementTo: arXiv:2008.00791
  • IsVersionOf: https://doi.org/10.5281/zenodo.4024153
  • IsPartOf: https://zenodo.org/communities/covid-19
  • IsPartOf: https://zenodo.org/communities/linguistics
  • IsPartOf: https://zenodo.org/communities/natural-language-processing
  • IsPartOf: https://zenodo.org/communities/twitter-datasets
  • IsPartOf: https://zenodo.org/communities/zenodo