Title: CMU-MisCov19: A Novel Twitter Dataset for Characterizing COVID-19 Misinformation

Type: Dataset

Memon, Shahan Ali, Carley, Kathleen M. (2020): CMU-MisCov19: A Novel Twitter Dataset for Characterizing COVID-19 Misinformation. Zenodo. Dataset. https://zenodo.org/record/4024154

Authors: Memon, Shahan Ali (Carnegie Mellon University); Carley, Kathleen M. (Carnegie Mellon University)

Summary

From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hotbed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. Detecting and characterizing misinformation requires annotated datasets. Most of the published COVID-19 Twitter datasets are generic, lack annotations or labels, rely on automated annotation using transfer learning or semi-supervised methods, or are not specifically designed for misinformation. The annotated datasets that do exist focus only on "fake news", are small, or cover few classes.

Here, we present a novel Twitter misinformation dataset called "CMU-MisCov19" with 4573 annotated tweets over 17 themes around the COVID-19 discourse. We also present our annotation codebook for the different COVID-19 themes on Twitter, along with their descriptions and examples, for the community to use for collecting further annotations. Further details about the dataset, and our analysis based on it, can be found at https://arxiv.org/abs/2008.00791. In adherence to Twitter's terms and conditions, we do not provide the full tweet JSONs but provide a ".csv" file with the tweet IDs so that the tweets can be rehydrated. We also provide the annotations and the creation date of each tweet so that the results of our analyses can be reproduced.

Note: If, for any reason, you are not able to rehydrate all the tweets, reach out to Shahan Ali Memon at (shahan@nyu.edu).
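
The rehydration step described above starts from the tweet IDs in the ".csv" file. The following is a minimal sketch of that first step only, under stated assumptions: the column name `tweet_id` and the sample rows are hypothetical (the actual file's header is not given in this record), and the batch size of 100 is an assumption chosen to match the per-request limit commonly applied by tweet lookup endpoints. The lookup request itself is deliberately left out, since it depends on the API access and client library available to you.

```python
import csv
import io
from typing import Iterator, List, TextIO


def load_tweet_ids(handle: TextIO, id_column: str = "tweet_id") -> List[str]:
    """Read tweet IDs from an open CSV file, keeping them as strings.

    `id_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(handle)
    # IDs stay as strings so large 64-bit values are never rounded or reformatted.
    return [row[id_column].strip() for row in reader if row.get(id_column)]


def batched(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Made-up rows standing in for the real file; replace with open("<path>.csv").
sample = io.StringIO("tweet_id,annotation\n111,theme_a\n222,theme_b\n333,theme_a\n")
ids = load_tweet_ids(sample)
for batch in batched(ids, size=2):
    # Each comma-joined group is what a lookup request would take as its ID list.
    print(",".join(batch))
```

Tweets that have since been deleted or made private will not come back from a lookup, so the number of rehydrated tweets may be smaller than the number of IDs in the file.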

If you use this data, please cite our paper as follows:

"Shahan Ali Memon and Kathleen M. Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset, In Proceedings of The 5th International Workshop on Mining Actionable Insights from Social Networks (MAISoN 2020), co-located with CIKM, virtual event due to COVID-19, 2020."

More information

DOI: 10.5281/zenodo.4024154

Subjects

covid, coronavirus, misinformation, twitter, covid-19, network analysis, sociolinguistics, dataset

Dates

Publication date: 2020
Issued: September 19, 2020

Notes

Other: If you use this dataset, please cite our recently accepted paper "Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset" at the MAISoN workshop at CIKM 2020 as follows: "Shahan Ali Memon and Kathleen M. Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset, In Proceedings of The 5th International Workshop on Mining Actionable Insights from Social Networks (MAISoN 2020), co-located with CIKM, virtual event due to COVID-19, 2020." The preprint version of the paper can be found at https://arxiv.org/abs/2008.00791.

Rights

https://creativecommons.org/licenses/by/4.0/legalcode Creative Commons Attribution 4.0 International
info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

Description	Item type	Relationship	Uri
		IsSupplementTo	arXiv:2008.00791
		IsVersionOf	https://doi.org/10.5281/zenodo.4024153
		IsPartOf	https://zenodo.org/communities/covid-19
		IsPartOf	https://zenodo.org/communities/linguistics
		IsPartOf	https://zenodo.org/communities/natural-language-processing
		IsPartOf	https://zenodo.org/communities/twitter-datasets
		IsPartOf	https://zenodo.org/communities/zenodo

