Title: Reverse geo-tagging included; duplicates removed

Type: Software

George Fisher (2014): Reverse geo-tagging included; duplicates removed. Zenodo. Software. https://zenodo.org/record/11661

Author: George Fisher (George Fisher Advisors LLC)

Summary

All of the tweets for this project have been processed and consolidated into a single file that can be downloaded with this link:

https://s3-us-west-2.amazonaws.com/healthcare-twitter-analysis/HTA_noduplicates.gz (1.85 GB zipped / 15.80 GB unzipped)

Each of the 4 million rows in this file is a tweet in JSON format containing the following information:

  • All the Twitter data in exactly the JSON format of the original
  • Unix time stamp
  • All the Topsy data
  • originating file name
  • score
  • author screen name
  • URLs

60% of the records have geographic information ...

  • Latitude & Longitude
  • Country name & ISO2 country code
  • City
  • For country code "US":
      • Zipcode
      • Telephone area code
      • Square miles inside the zipcode
      • 2010 Census population of the zipcode
      • County & FIPS code
      • State name & USPS abbreviation
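A figure such as the 60% share of geo-tagged records can be checked by counting the rows that carry a given field. The sketch below is only an illustration: the record does not list the actual key names, so `key` is a placeholder to be replaced after inspecting a row, and `share_with_key` is a hypothetical helper, not part of the project.

```python
import json


def share_with_key(rows, key):
    """Return the fraction of JSON rows whose top-level object has a non-empty `key`.

    `key` is a placeholder: the real field names must be read off an actual row.
    """
    total = 0
    hits = 0
    for row in rows:
        if not row.strip():
            continue  # ignore blank lines
        total += 1
        value = json.loads(row).get(key)
        # treat missing, null, and empty values as "no information"
        if value not in (None, "", [], {}):
            hits += 1
    return hits / total if total else 0.0
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so the full file never has to sit in memory.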

The basic technique for using this file in Python is the following:

    import json

    with open("HTA_noduplicates.json", "r") as f:
        # convert each row in turn into json format and process
        for row in f:
            tweet = json.loads(row)
            text = tweet["text"]  # text of original tweet
            ...
            # etc.
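The loop above reads the unzipped file. If disk space is tight, the rows can probably be streamed straight from the compressed download instead. The sketch below assumes the `.gz` file is a plain gzip stream holding one JSON tweet per line, which the record does not confirm; `iter_tweets` is a hypothetical helper name.

```python
import gzip
import json


def iter_tweets(path):
    """Yield one tweet dict per non-empty line of a gzipped JSON-lines file.

    Assumes `path` is a plain gzip stream with one JSON object per line.
    """
    # "rt" opens the gzip stream in text mode, so rows are decompressed on the fly
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for row in f:
            row = row.strip()
            if not row:
                continue  # skip blank lines
            yield json.loads(row)
```

Under those assumptions, `for tweet in iter_tweets("HTA_noduplicates.gz"):` would replace the `open` call and the `json.loads` line in the basic technique, trading some CPU time for roughly 14 GB of disk space.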

Python provides very powerful analytical and plotting features, but R is also very handy. R does not work well with datasets this large, but Python can be used to create a targeted subset file that R (or Excel, or anything else for that matter) can read.
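One way to produce such a subset is to filter the rows in Python and write the matches to a CSV file. This is a minimal sketch, not the author's program: `write_subset` is a hypothetical helper, and the default column names `id_str` and `created_at` are assumed from the standard Twitter JSON layout (only `text` appears in the example above).

```python
import csv
import json


def write_subset(src_path, dst_path, keyword, fields=("id_str", "created_at", "text")):
    """Copy tweets whose text contains `keyword` into a CSV file; return the count.

    The default `fields` are assumed Twitter keys; adjust them to the real data.
    """
    needle = keyword.lower()
    kept = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(fields)  # header row
        for row in src:
            if not row.strip():
                continue  # skip blank lines
            tweet = json.loads(row)
            # case-insensitive substring match on the tweet text
            if needle in (tweet.get("text") or "").lower():
                # missing keys become empty cells instead of raising KeyError
                writer.writerow([tweet.get(name, "") for name in fields])
                kept += 1
    return kept
```

The resulting file can then be loaded in R with `read.csv` or opened in Excel. A CSV is used here only because both tools read it without extra packages.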

For long-running jobs, I used Amazon Web Services' EC2 running Ubuntu 14.04, accessed via PuTTY and WebSCP; for local processing, I used a Windows 7 laptop with the data on a terabyte external hard drive.

The Status Report in the main repo contains:

  • a comprehensive explanation of the dataset
  • examples of analyses done with this dataset
  • a list of references to other healthcare-related Twitter analyses
  • instructions for using Amazon Web Services
  • sample programs using this file with Python, R and MongoDB

More information

  • DOI: 10.5281/zenodo.11661

Dates

  • Publication date: 2014
  • Issued: September 10, 2014

Rights

  • info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

Description | Item type | Relationship   | URI
            |           | IsSupplementTo | https://github.com/grfiv/healthcare_twitter_analysis/tree/V1.0
            |           | IsVersionOf    | https://doi.org/10.5281/zenodo.592043
            |           | IsPartOf       | https://zenodo.org/communities/zenodo