MITH Bags


01A4477C-5C6E-41CA-A287-24E5B719EA6D

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-03-24
External-Description
This bag contains Twitter data for 108,884 tweets that were sent to or from William Gibson (@greatdismal). However, it is believed that the dataset does not include retweets. At the time of data collection the greatdismal account had tweeted and retweeted 46,350 tweets, but this collection only includes 19,743 sent by greatdismal. The inference here is that the difference consists of retweets or missed tweets. The data was retrieved using a program that collected tweet identifiers from Twitter's search user interface for the query: from:greatdismal OR to:greatdismal A sorted version of this list of identifiers is available in the greatdismal-ids.txt file. The full tweet JSON data for the tweets was then retrieved from Twitter's API using twarc (http://twitter.com/edsu/twarc). This data can be found in the greatdismal.json file.
Size
275.2MB
License
UMD only

04C60F25-0683-4105-99F1-E432E4E1A1A8

Bagging-Date
2015-06-16
External-Description
This bag contains multiple artifacts for the Occupied Japan Gender, Class and Race project website created in 2003-2004 by Marlene Mayo, who was a Freeman Foundation Fellow. The website was found at http://mith.umd.edu/gcr and was password protected due to concerns about copyright. The bag was created on June 15, 2015 after an earlier migration from zelda.umd.edu at UMD to Amazon Web Services failed to migrate the website. At that point a snapshot of the website (PHP code, MySQL database and static assets) was created, which you can find as gcr.tar.gz. At the same time the website was crawled with wget (the command is in get.sh) to both mirror the contents of the website and also create a WARC and CDX web archive. The static mirror of the website can be found in the mith.umd.edu directory. The static site was then put back online at http://mith.umd.edu/gcr on a machine hosted on Amazon EC2.
Size
562.0MB

061F44EC-3CFD-4403-8199-23F3124B9FF9

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2020-03-11
External-Description
This bag contains tweets collected by Ed Summers in collaboration with Vernon Mitchell, Cassie Adcock, and AJ Robinson (WUSTL). They document the resistance to efforts by the Hindu Right in India to deprive Muslim Indians of citizenship. Data collection from the Twitter filter stream began on December 25, 2019 and ended on January 21, 2020 after collecting 14,144,417 tweets. In addition, on December 25, 2019 data was collected from the Twitter search API, yielding 5,345,453 tweets going back to December 20, 2019. In total 19,489,870 tweets were collected and their tweet ids are stored in sorted order in the ids.txt.gz file. In addition the twarc commands that were used to run the data collection can be found in the search.sh and stream.sh scripts in the data directory. Below is the email from Cassie sent on December 24, 2019 which outlined the rationale for what was collected: From: cadcock@wustl.edu Subject: Re: Archiving Twitter RE: India unrest Date: Tue, 24 Dec 2019 06:53:45 -0800 (PST) Thank you, Ed and Vernon. I'm also including AJ Robinson on this message, since she expressed interest in this archiving project. I'm primarily thinking of creating a dataset for future researchers. This is a major political event, in which the Hindu Right, which has been steadily rising over recent decades, finally confronts massive popular resistance in its effort to impose its vision of a Hindu Nation by depriving Muslim Indians of citizenship. The resistance is India-wide, and includes a wide base of social groups. This is also a major turning point, because we finally see the Hindu Nationalist government in power directing police forces to direct unwarranted brutality and violence on Muslim Indians all across the country. On both counts, we have seen this before, but localized in particular cities or states, not India-wide. I have zero tech skills of this kind so I would be grateful to see some preliminary archive underway. 
The core tags seem to be: #CAA #CAAProtests #CAB #CABProtests #NRC #JamiaMilia #JamiaProtest And variations – #NoToCAB #NoToNRC #CitizenshipAmendmentAct #CAA_NRC_Protests #IndiaAgainstCAA_NRC I'm less concerned to archive the propaganda-mongering of the Hindu Right, but those tags are: #isupportCAB #IsupportCAB2019 #ISupportCAA_NRC #ISupportCAA
Size
17.3GB

06F5C818-8CFD-477F-A0C0-7EA6EEE0BF76

Contact-Name
Damien Pfister
Contact-Email
dsp@umd.edu
Bagging-Date
2020-08-16
External-Description
This bag contains the Omeka server side code and database snapshot for the Internet Research Agency Ads website that was created by Damien Pfister in collaboration with Ed Summers and Purdom Lindblad of MITH. The bag also contains a wget crawl of https://mith.umd.edu/irads/ which is persisted as a static site and a WARC file. File listing: irads.sql.gz - Omeka MySQL database dump mith-irads.tar.gz - Omeka server side PHP code irads.warc.gz - WARC file generated by wget crawl mith.umd.edu.tar.gz - static site created with wget crawl The site description at the time of archiving was as follows: This site explores--and offers users the opportunity to explore--the rhetoric of computational propaganda that occurred on Facebook during the 2016 election. The project was developed by Dr. Damien Smith Pfister, Nora Murphy, Meridith Styer, and Misti Yang in collaboration with Purdom Lindblad and Ed Summers at the Maryland Institute for Technology in the Humanities. 160 students from the "Interpreting Strategic Discourse" classes offered by the University of Maryland's Department of Communication coded the dataset by hand. The IRAdS website contains over 3,000 Facebook advertisements that the Internet Research Agency, a Russia-linked “troll farm,” purchased in the run-up to the 2016 election campaign. This is one of the most sophisticated efforts at computational propaganda yet, but little systematic analysis has been done on this data corpus. Pfister, Murphy, and Yang developed the codebook based on concepts developed in the class (e.g. metaphors, myths, ideographs, semiotics; syllabus available here). Our hope is that this dataset will surface and organize different themes across these advertisements. In collaboration with MITH, these advertisements will be posted, with our analysis embedded as metadata, on a website that other publics can use to better understand Russia’s propaganda efforts.
Size
5.4GB

085323E0-E95A-4045-A1DF-27B5F65C1EE6

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-06-30
External-Description
On June 17, 2015 a mass shooting took place at Emanuel African Methodist Episcopal Church in Charleston, South Carolina. Soon afterward the #charleston and #charlestonshooting hashtags were used on Twitter to spread news of the event. On June 18th Bergis Jules of UC Riverside and Ed Summers of University of Maryland began collecting #charleston and #charlestonshooting tweets as both a historical search and a stream. Both files are included here, and comprise 3,099,173 tweets from June 10 to June 30. The first few hundred of the tweets included #charleston tweets prior to the event.
Size
2.1GB

097C9916-8BB3-43A9-BD9F-EF26AF5B150E

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-07-10
External-Description
This bag contains a wget capture of the https://archive.blackgothamarchive.org/ Omeka website on July 10, 2018 in order to decommission the Omeka website that was several versions behind, and hadn't been updated in 4 years but still contains a useful collection of materials. The static version of the website was mounted in place of the Omeka site, and the server side PHP and database were placed into this bag. The wget capture was executed with the bagweb utility https://github.com/edsu/bagweb/ The Omeka instance couldn't be upgraded to the latest version of Omeka because its theme was not compatible. This meant that PHP could not be updated past v5.6, and security patches couldn't be installed. If you want to bring the Omeka site back online you will need to use PHP v5.6. This apt-get install command should pull in the necessary modules: sudo apt-get install php5.6 php5.6-mbstring php5.6-xml php5.6-mysql \ php5.6-common php5.6-gd php5.6-json php5.6-cli php5.6-curl You will also need to enable mod_rewrite in Apache: sudo a2enmod rewrite
Size
925.5MB

0ED6D8FA-829A-43FE-A02F-C9394763641A

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-10-14
External-Description
463,956 tweets collected between 2016-01-29 and 2016-10-14 that used the hashtag #SayHerName. During this time #SayHerName was a social movement that raised awareness for black female victims of police brutality and anti-Black violence in the United States. The tweets were collected as part of a collaborative research project between MITH and the Sociology Department at the University of Maryland. The results of analyzing the data were published in this paper: Brown, M., Ray, R., Summers, E. & Fraistat, N. (2017) #SayHerName: a case study of intersectional social media activism. Ethnic and Racial Studies, 40(11), 1831-1846. http://dx.doi.org/10.1080/01419870.2017.1334934 The bag contains four files, one of which was collected from Twitter's Search API and the other three were collected from the Streaming API. The reason for the separate Streaming API files is that there were network connectivity problems that went unnoticed for some period of time and resulted in two gaps. The first gap starts on March 19, 2016 and extends to April 22, 2016. The second gap starts on June 26, 2016 and extends to July 6, 2016. The twarc.log file was created by the twarc utility as it was collecting data from the Twitter APIs. On January 2, 2017 the stream4.json.gz file was updated to remove a partial JSON object in its last line, which was caused by the forced termination of the stream. This description was also updated with information based on use of the data by Trevor Muñoz for a presentation at MLA 2018.
Size
274.9MB
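
The gaps described above can be located by scanning tweet timestamps for large jumps. The following is a minimal sketch, not the procedure that was actually used for this bag. It assumes each line of a stream file is a Twitter v1.1 tweet JSON object with a created_at field; the function names and the one-day threshold are hypothetical.

```python
import gzip
import json
from datetime import datetime, timedelta

# Timestamp layout of the "created_at" field in Twitter v1.1 tweet JSON,
# for example "Sat Mar 19 10:00:00 +0000 2016".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def created_at_values(path):
    """Yield the created_at string of every tweet in a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)["created_at"]


def find_gaps(values, threshold=timedelta(days=1)):
    """Return (before, after) datetime pairs separated by more than threshold."""
    times = sorted(datetime.strptime(v, CREATED_AT_FORMAT) for v in values)
    gaps = []
    for earlier, later in zip(times, times[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later))
    return gaps
```

For a file such as stream4.json.gz, find_gaps(created_at_values("stream4.json.gz")) would list the pairs of tweets on either side of each gap. The threshold is a judgment call: a low-volume hashtag needs a larger value to avoid reporting quiet periods as outages.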

1049CF0E-6B74-433B-A0F3-D074E960D9ED

Contact-Name
Ed Summers
Contact-Email
edsu@umd.edu
Bagging-Date
2018-10-30
External-Description
The African American History, Culture and Digital Humanities conference was held at the University of Maryland at College Park between October 18-20, 2018. Because of MITH's involvement in the project the tweets containing the hashtags #aadhum2018 and #aadhum18 were collected using the twarc utility on October 22. The search retrieved 7,226 tweets back to October 11. Since there were many tweets in response threads that did not have the hashtag, the twarc replies.py utility was used to collect conversation threads that hung off of the originally collected tags. This generated an additional 738 tweets, which are included along with all the originally collected tweets in search-replies.jsonl.gz.
Size
8.3MB
License
UMD Only

10DF7C8C-6A81-49D0-A9A2-3528E9B0D73C

Bagging-Date
2016-03-20
External-Description
wget capture of http://mith.umd.edu/offthetracks/ created by ed on 2016-03-20. The bag includes: - offthetracks.sql.gz: wordpress database dump - mith.umd.edu-offthetracks.tar.gz: wordpress directory snapshot - mith.umd.edu.tar.gz: static site capture from wget - offthetracks.warc.gz: warc file for static site crawl
Size
644.9MB

157CB91A-389D-47B1-81F3-07F524D86E09

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-12
External-Description
wget capture of http://mith.umd.edu/tile/ created by Ed Summers on 2017-06-12. The payload files include the Wordpress code and MySQL database dump as well as a mirror of the website as it existed with the search turned off, and a WARC file.
Size
169.2MB

16088C55-9565-4907-962D-6B2D7AEA02F7

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-05-20
External-Description
This is a snapshot of the DISC website code and database taken on May 20, 2015 by Porter Olsen. The DISC website was retired in 2006 because of concerns over security vulnerabilities in the code. The physical hardware that served the MITH website at that time (minerva.umd.edu) was preserved. Porter Olsen was able to locate the server code and database in 2015.
Size
2.3MB

16948DE6-1A6E-4115-9CC6-2B9859443FAE

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-08-19
External-Description
This is a snapshot of the /export/software directory on zelda.umd.edu on August 18, 2015 when it was finally decommissioned (turned off). Previously zelda was the host that made many MITH web properties available. The websites were largely moved over to Amazon Web Services in December of 2014. But zelda was left on for 8 months while we transitioned some last remaining DNS names that were pointed at zelda. /export/software contains all the database and website content. It is in a tarball, which has not preserved timestamps and usernames unfortunately. For that there is going to be a forensic snapshot of zelda which should be made available as a separate bag. This bag is meant to be a convenience to get data that is needed after zelda is taken offline, without requiring remounting disk images.
Size
201.2GB

185A16A4-82D9-4810-8568-B52D83BBAAD6

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-08-15
External-Description
This bag contains 1,304,702 tweets that contain the word "ferguson" between July 30, 2015 and August 11, 2015. This includes the lead up to the first anniversary of Michael Brown's killing on August 9, 2014.
Size
835.2MB
License
UMD Only

1A440285-EF8E-430D-81D0-B9B591A7BF90

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-12
External-Description
wget capture of http://www.blackgothamarchive.org created by Ed Summers on 2017-06-12. The payload files include the Wordpress code and MySQL database dump as well as a mirror of the website as it existed with the search turned off, and a WARC file.
Size
24.2MB

1BBB9316-15B9-402E-A518-DDE0A2C93B5D

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-08-15
External-Description
The shooting of Samuel DuBose occurred during a traffic stop for a missing front license plate on July 19, 2015, in Cincinnati, Ohio. Ray Tensing, a white University of Cincinnati police officer, fatally shot DuBose, a black man, when DuBose started his car and, according to Tensing, began to drive off. This bag contains 696,894 tweets that were sent between July 21 and August 8 with the hashtag #SamuelDubose. Data collection started on July 29 at 17:12:17 as a search and stream.
Size
467.2MB
License
UMD Only

1CA8ACBE-39A6-4CB2-8EFC-0C5014064DD9

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-10-09
External-Description
This bag contains two datasets of tweets that were collected for "hillary" and "trump" from October 9, 2016 9:00 EDT to October 9, 2016 10:30 EDT. This time period covers the duration of their second debate at Washington University in St Louis. The "trump" dataset includes 327,751 tweets and the "hillary" dataset includes 337,320 tweets. According to the logs at least 1,208,536 and 419,544 tweets for the trump and hillary datasets were not delivered due to throttling of the data by Twitter. The datasets were collected using two separate twarc processes that were running on the same m4.xlarge Amazon EC2 instance using different sets of Twitter developer keys. Included in the payload directory are the scripts that were used to start the data collection, their respective log files and the data files themselves.
Size
474.0MB
License
UMD Only
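
The delivered and undelivered counts above give a rough upper bound on how much of each keyword's traffic the dataset captured. The following is a small worked sketch of that arithmetic. It assumes the logged undelivered counts refer to tweets that matched the same keyword, and the function name is hypothetical. Because the logs give "at least" figures, the true share may be lower.

```python
def delivered_share(delivered, undelivered_at_least):
    """Upper bound on the fraction of matching tweets that were delivered."""
    return delivered / (delivered + undelivered_at_least)


# Counts taken from the description of this bag.
trump_share = delivered_share(327_751, 1_208_536)
hillary_share = delivered_share(337_320, 419_544)

print(f"trump: at most {trump_share:.1%} of matching tweets delivered")
print(f"hillary: at most {hillary_share:.1%} of matching tweets delivered")
```

Under these assumptions roughly a fifth of the "trump" tweets and under half of the "hillary" tweets were captured, which matters when comparing volumes between the two datasets.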

1d7d4ef3-78a9-4ca4-be4a-b67f24acb5f9

Bagging-Date
2015-02-22
External-Description
This is a collection of 1,556,702 tweets generated with twarc for the period 2015-02-12 23:31:56 to 2015-02-28 01:51:52 that mentioned the word "iran" or the hashtags #irantalks and #irandeal.
Size
765.2MB
License
UMD only

232580EC-3762-474E-A78A-0C44D616007A

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-08-08
External-Description
wget capture of http://mith.umd.edu/pda2013/ created by ed on 2016-08-08. The bag contains a snapshot of the Personal Digital Archiving 2013 Wordpress website. The bag contains the Wordpress source and database in the state it was found on August 8, 2016. The bag also includes the output of a wget mirror crawl of the website and also a resulting WARC file for the crawl. The static site was then used to replace the Wordpress site.
Size
20.0MB

2543A0EF-3164-48DE-B21C-FE7A5695F62B

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-11-13
External-Description
Created for Daniella Koonce, a Sociology student studying the 2016 Presidential election. The tweets were collected using the #electionresults hashtag.
Size
95.7MB
License
UMD Only

2654078E-ACC4-408E-93CC-B9320C4A3443

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2021-02-22
External-Description
This bag contains snapshots of the Lakeland Community Heritage Project's Airtable bases as of 2021-02-22. The two primary Airtable bases Lakeland Digital Archive (LDA) and Lakeland Digitization Tracking (LDT) were combined into a new Airtable base to be named Lakeland Digital Archive (LDA2). The snapshots also include LDA-testing, which was used as a scratch space for moving the LDA base forwards. Snapshots were created with https://github.com/simonw/airtable-export LDA is comprised mostly of photos and some oral history interviews that were collected from an Omeka instance running at lakeland.umd.edu, and from two project members' hard drive storage (Mary Sies and Maxine Gross). LDT is comprised of image scans of materials collected during the LCHP's community digitization events in the Fall of 2019. The data files that these Airtable bases described were located on Google Drive, in GitHub repositories, on the MITH Network Attached Storage, and in Amazon S3. Some were also embedded in websites like https://lakeland.umd.edu/asa/ and the Lakeland Omeka site. The LDA and LDT bases were combined into the LDA2 base using bespoke code located at https://github.com/umd-mith/lakeland-data-munging The digital files were also moved to a central location on the filesystem in preparation for moving them to cloud storage (Dropbox). For a discussion of this process see: https://app.asana.com/0/1191235198649641/1198495673312437/f The export of the YAML, JSON and SQLite versions of the three bases was created with the included export.sh. It depends on having an AIRTABLE_KEY set appropriately in the environment.
Size
24.8MB
License
Lakeland Community Heritage Project

28573AC6-11D6-4CF9-91B8-239A19034166

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-04-20
External-Description
This dataset contains tweets that were sent during the Ethics and Archiving the Web conference that was held at the New Museum in New York City, March 22 - 24, 2018. The eaw18.jsonl.gz file contains 3,155 tweets that were collected from Twitter's search API using the utility twarc on March 25th. That file contains tweets that use the #eaw18 hashtag, and includes tweets back to March 15th. On March 27th the twarc utility replies.py was run to collect the threaded conversations around the tweets which resulted in an additional 1,345 tweets. The combined original tweets and replies can be found in the eaw18-replies.jsonl.gz file. More about the conference itself can be found at https://eaw.rhizome.org
Size
3.7MB
License
UMD Only

286971F9-EED7-488D-A6CE-947189A05D36

Contact-Name
Ed Summers
Contact-Email
edsu@umd.edu
Bagging-Date
2018-05-14
External-Description
This is a snapshot of the Storify stories that were created for the African American History, Culture and Digital Humanities project (AADHum) between 2015-2017. They were originally found at https://storify.com/UMD_AADHum but were downloaded using the storified utility https://github.com/docnow/storified just before the service announced it would shut down on May 16.
Size
26.9MB

9B4439F2-9329-4E4D-9F5D-69C470E1C8B9

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-09-26
External-Description
This bag contains a wget capture of http://mith.umd.edu/sharedhorizons/ which includes a mirror copy and a warc file. The wordpress code and database are also saved within the bag.
Size
401.2MB

34322F08-93A8-4309-A3AA-89D20C42B06F

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-08-17
External-Description
This bag contains 8,444,354 tweets mentioning the word Iran in 13 different scripts, collected between July 6 and August 15, 2015. The full list of words was Iran,Иран,Իրան,ﺈﻳﺭﺎﻧ,איראן,İran,ईरान,ইরান,Эрон,อิหร่าน,इरान,이란,Іран They were collected by MITH for Matt Miller of the Roshan Institute at the University of Maryland. The interest in this particular time period is the signing of the Joint Comprehensive Plan of Action between Iran, China, France, Russia, United Kingdom and United States about Iran's nuclear program on July 14, 2015.
Size
4.6GB

36C6BFE6-66A4-4634-9AA9-0D702D9D3B2E

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-09-29
External-Description
These are tweets from @TriciaLockwood that were scraped from Twitter's search on September 29, 2015. The query @TriciaLockwood was used, which will include any tweet sent by or to her, as well as any tweets that mention her handle in the text of the tweet. It is important to remember that Twitter's search does not include retweets. So you will not find @TriciaLockwood's retweets in this dataset. There are 49,573 tweets in JSON format, and at the time she was listed as having 12,438 tweets. The scraping process was a PhantomJS script that exercised the infinite scroll in search results, and extracted the tweet ids. The tweet ids were then hydrated using the twarc tool.
Size
14.7MB
License
UMD Only

37E83ECD-5182-4EA1-8DDA-84D629DB2FBC

Contact-Name
Trevor Muñoz
Contact-Email
tmunoz@umd.edu
Bagging-Date
2017-10-02
External-Description
This bag contains data for the Godwin Diary website at Oxford University http://godwindiary.bodleian.ox.ac.uk It was made available by James Cummings who gave it to Trevor Muñoz at the DH2017 conference in Montreal. The email from Cummings asking Trevor and others at MITH if we wanted a backup copy of the data is included as email.txt in the data directory. From the email it appears that this content is all "openly licensed".
Size
88.7GB

3A292749-2C95-4E49-861A-CD0FFD22B14D

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-12
External-Description
wget capture of http://www.mith.org/camp/ created by Ed Summers on 2017-06-12. The payload files include the Wordpress code and MySQL database dump as well as a mirror of the website as it existed with the search turned off, and a WARC file.
Size
109.7MB

3DFEA6F3-D830-4C95-852D-9619958627D9

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2019-04-15
External-Description
On March 31, 2019, Hussle was fatally shot outside his store, Marathon Clothing, in South Los Angeles. Eric Holder, a 29-year-old man who had confronted Hussle earlier in the day, was arrested and charged with murder on April 2, 2019. Hussle’s memorial service was held on April 11 at the Staples Center in Los Angeles, with tickets given away free of charge. The 25.5-mile (41.0 km) funeral procession wound through the streets of South L.A. including Watts where he spent some of his formative years. On Wed Apr 03, 2019 tweets with the following hashtags were collected from the Twitter streaming and search APIs: NipseyHussle, Nipsey, Nipsey Hussle, RIPNipseyHussle, RIPNipsey. The collection includes 11,642,103 tweets from March 28 until April 15.
Size
8.7GB
License
UMD Only

3F933CBB-0DCE-425F-86E5-D282C1E09B53

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-08-02
External-Description
On July 5, 2016 Alton Sterling, a 37-year-old black man, was shot several times at close range while held down on the ground by two white Baton Rouge Police Department officers in Baton Rouge, Louisiana. The shooting was recorded by multiple bystanders, whose recordings were spread on social media. The shooting led to protests in Baton Rouge and a request for a civil rights investigation by the US Dept of Justice. This bag contains 5,960,419 tweets that used the #AltonSterling hashtag and were collected from July 6, 2016 until August 2, 2016. The total includes 1,028,065 tweets that were collected by searching for tweets that had already been sent. The first tweet this search found was sent July 5 at 2:05PM CST.
Size
5.0GB
License
UMD Only

404CC0BB-A5D0-4FBF-8920-F3F0F1BC2CEF

Contact-Name
Porter Olsen
Contact-Email
polsen@umd.edu
Bagging-Date
2015-05-19
External-Description
This bag contains a static website for the Hughes@100 event held at the University of Maryland on February 25, 2002. The original website at http://mith.umd.edu/hughes/ became unavailable, possibly as early as 2003. In May 2015 Porter Olsen recovered the website from an old webserver named Minerva. When the website was put back online the original QuickTime videos of the event were converted to mp4 for accessibility reasons. The original QuickTime files were left in place, along with what appeared to be editor backup files with a tilde extension.
Size
652.0MB

07D2FABF-80D8-45C9-A5E1-3932584CD52B

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-12-11
External-Description
The shooting of Korryn Gaines occurred on August 1, 2016, in Randallstown, Maryland, near Baltimore, resulting in the death of Gaines, a 23-year-old African-American woman, and the shooting of her son. According to the Baltimore County Police Department, officers sought to serve Gaines a warrant in relation to an earlier traffic violation. After officers entered her apartment, an hours-long standoff ensued, ending when Gaines threatened police officers with her shotgun. At least one of the officers shot Gaines, killing her and wounding Gaines' five-year-old son. Portions of the standoff were filmed by Gaines and posted to social-media networking sites; however, upon police request, Facebook deactivated Gaines' Facebook and Instagram accounts, leading to criticisms of the company's involvement in the incident. This bag contains 705,974 tweets from the streaming and search API that used the #KorrynGaines hashtag between August 1 and August 9.
Size
591.9MB
License
UMD Only

424E0234-3975-453E-9307-FE3EBC65560A

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2020-10-14
External-Description
This bag contains a wget mirror of the Digital Studies in the Arts and Humanities Wordpress website at https://dsah.umd.edu. It was crawled in October, 2020 when the Wordpress multisite install was turned off. The administration of the dsah.umd.edu domain was passed to Marisa Parham (AADHum Director). The server side Wordpress files and database are part of the Wordpress multisite set up that is saved in bag 8DBEB7E3-72E4-4F0A-80A3-3586D63EEA42. wget needed to be instructed to collect and rewrite resources at mith.umd.edu since the multisite setup used that host for images and css. The wget command looked like this: wget --directory-prefix dsah --output-file wget.log --warc-file dsah --mirror --page-requisites --span-hosts --html-extension --convert-links --execute robots=off --no-parent --exclude-directories example --level 3 --domains dsah.umd.edu,mith.umd.edu https://dsah.umd.edu
Size
183.5MB

4661A68-5404-443E-9571-A9E69F4DBDAE

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-04-21
External-Description
12,994,199 tweets collected for the period of 2015-03-05 to 2015-04-16 for the Roshan Institute at the University of Maryland. The tweets all contain the word Iran in a set of different scripts, including: Iran,Иран,Իրան,ﺈﻳﺭﺎﻧ,איראן,İran,ईरान,ইরান,Эрон,อิหร่าน,इरान,イ ,이란,Іран
Size
5.8GB
License
UMD only

46D94136-7227-4EAF-8C66-18F62E981AE5

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-10-20
External-Description
This bag contains two datasets of tweets that were collected for "hillary" and "trump" from October 20, 2016 9:00 EDT to October 20, 2016 10:30 EDT. This time period covers the duration of their third debate at the University of Nevada in Las Vegas. The "trump" dataset includes 331,598 tweets and the "hillary" dataset includes 323,816 tweets. According to the logs at least 855,022 and 358,288 tweets for the trump and hillary datasets were not delivered due to throttling of the data by Twitter. The datasets were collected using two separate twarc processes that were running on the same Amazon EC2 instance using different sets of Twitter developer keys. Included in the payload directory are the scripts that were used to start the data collection, their respective log files and the data files themselves.
Size
409.4MB
License
UMD Only

477403F8-C472-4BA9-B8E5-5EB737C23F0C

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-01-24
External-Description
These are tweets that were collected between August 27, 2015 and January 4, 2016 that mention the word "trump". They were collected from Twitter's streaming API. Due to network outages there are gaps between the files. There are 40,202,199 tweets in all.
Size
25.1GB
License
UMD Only

4CEB5DE0-DA52-4127-AE16-64DEBF34170D

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-09-04
External-Description
20,729 #MyAsianAmerican tweets collected between Aug 25 02:24:10 UTC and Sep 04 19:50:26 2015. The hashtag was first used by Jason Fong, a 15-year-old high school student at Redondo Beach High School who was responding to controversial statements made by Presidential candidate Jeb Bush. This story in the LA Times talked about how the tweet trended that night. http://www.latimes.com/local/lanow/la-high-school-student-myasianamericanstory-anchor-baby-narrative-20150825-htmlstory.html
Size
9.7MB
License
UMD Only

4D41FEA7-9E85-45B8-9499-362212278CAB

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-03-24
External-Description
Data collected from the Twitter filter stream for #blm,#blacklivesmatter between 2016-01-29 and 2017-03-18 using twarc. It includes 17,292,130 tweets. The files are broken into segments because of network connectivity problems, so there are varying time gaps present between the files. Also when the hashtags were trending globally rate limits may have prevented some tweets from being streamed over the API. Data collection stopped on 2017-03-18 because of an authentication error that was the result of the keys having changed. On October 6, 2017 during some data processing of the files it was discovered that due to gzip encoding errors in stream3.json.gz and stream5.json.gz the total count had been undercounted at 13,732,829. The encoding was fixed and the fixities in the manifests were updated.
Size
12.4GB
License
UMD Only
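
After a payload file is repaired, as with the gzip fix described above, the checksums recorded in the bag's manifest need to be recomputed and rechecked. The following is a minimal sketch of such a fixity check. It assumes a BagIt payload manifest in which each line holds a checksum followed by a path relative to the bag directory. The manifest-sha256.txt file name and the SHA-256 algorithm are assumptions, since the description does not say which algorithm these bags use, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a hex digest for a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(bag_dir, manifest_name="manifest-sha256.txt", algorithm="sha256"):
    """Return the relative paths whose current checksum differs from the manifest."""
    bag_dir = Path(bag_dir)
    mismatches = []
    manifest_text = (bag_dir / manifest_name).read_text(encoding="utf-8")
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <relative path>".
        expected, rel_path = line.split(maxsplit=1)
        if file_digest(bag_dir / rel_path, algorithm) != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

An empty list from verify_manifest means every listed payload file still matches its recorded checksum. Any path that is returned needs either the file restored or, after a deliberate repair, the manifest entry regenerated.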

5336FCBD-CDE9-4477-8805-D574D0D99CE5

Bagging-Date
2016-02-01
External-Description
wget capture of http://mith.umd.edu/apiworkshop/ created by Ed Summers on 2016-02-01. The payload includes the following files: - apiworkshop.warc.gz - WARC capture - mith.umd.edu.tar.gz - Static site version - wordpress.tar.gz - Wordpress installation archive - wp_apiworkshop.sql.gz - Wordpress database snapshot
Size
10.1MB

56B1171F-E859-47C0-B3BD-B4D5932B8D4C

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-09-27
External-Description
This bag contains two datasets of tweets that were collected for "hillary" and "trump" from September 26, 2016 22:00 GMT to September 27, 2016 08:00. This time period covers the duration of their first debate at Hofstra University, which started at 9pm EDT and lasted 95 minutes. The "trump" dataset includes 1,636,098 tweets and the "hillary" dataset includes 1,303,084 tweets. According to the logs at least 2,059,946 and 730,512 tweets for the trump and hillary datasets were not delivered due to throttling of the data by Twitter. The datasets were collected using two separate twarc processes that were running on the same m4.xlarge Amazon EC2 instance using different sets of Twitter developer keys. Included in the payload directory are the scripts that were used to start the data collection, their respective log files and the data files themselves.
Size
2.0GB
License
UMD Only

02685C1D-5D87-4F26-8834-02180386523C

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-02-14
External-Description
This is a snapshot of the MITH Forensics Workshop website generated on February 14, 2016 using wget. The included files are: forensics.sql.gz (a database dump of the WordPress site), mith.umd.edu-forensics.tar.gz (a snapshot of the WordPress installation), mith.umd.edu.tar.gz (a static site generated with wget) and forensics.warc.gz (a web archive file generated with wget). On testing the website it was discovered that the live site had a broken link for Stephen Eniss' audio presentation. The correct link was determined by looking on the filesystem and noticing that the link contained a typo. It was difficult to fix the link in the WordPress site since it wasn't completely operational, so the link was fixed in the static representation that was mounted on the Web.
Size
674.7MB

5B9BCD78-4DFD-4CFD-A0F1-4C53D0623549

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-04-13
External-Description
This bag contains data from the Early Americas Digital Archive at http://mith.umd.edu/eada/. The snapshot was created on 2016-04-13. The mirrored content can be found in mith.umd.edu.tar.gz and the corresponding WARC data is in eada.warc.gz. The existing server side data and code can be found in eada.tar.gz and the MySQL database export is in eada.sql.gz.
Size
1.3GB

5CDCC634-09A9-47BE-B8B5-7FB9CDF094BC

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-12-22
External-Description
This bag contains a wget capture of the website for the Society for Textual Scholarship conference that was held at the University of Maryland in May of 2017. The conference website lived at https://mith.umd.edu/sts2017. The website was archived because the conference has passed, and the contents are no longer going to change. The bag contains a WARC file that was generated during the crawl. The static site that was created needed to be modified slightly to get the page header to work correctly. Since the website was part of MITH's multisite installation of Wordpress there was no distinct database or files to archive in addition to the snapshot.
Size
11.7MB

6ABF6C54-A1D0-4AE4-B412-738E482C41A8

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-11-19
External-Description
This dataset of tweets was created in response to the Pittsburgh synagogue shooting that occurred at the Tree of Life Congregation in the Squirrel Hill neighborhood of Pittsburgh, Pennsylvania on October 27, 2018. 3,603,049 tweets were collected from the Twitter Search API for the time period of October 22 to October 30 for all tweets matching any of the following keywords: Pittsburgh, pittsburghsynagogueshooting, pittsburghstrong, treeoflifesynagogue, treeoflife, pittsburghsynagogue, strongerthanhate, antisemitism, pittsburghshooting, squirrellhill, treeoflifeshooting.
Size
2.7GB
License
UMD Only

6BB00C6E-B557-478E-947E-04D0E6FFDC8C

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-07-08
External-Description
This package contains 10,406,506 #lovewins tweets that were sent in the aftermath of the Supreme Court's decision in Obergefell v. Hodges that was announced on June 26th, 2015. They cover the period of June 22, 2015 at 03:13:55 to July 02 at 09:03:39. The tweets were collected starting on June 26, 2015 by doing a search of Twitter with twarc, and also setting up a stream capture at the same time. You can see the results of both operations in the two files in the payload.
Size
6.2GB
License
UMD Only

6DC120C7-8901-417B-B387-0C6AFECEE9E8

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-04-08
External-Description
This bag contains 2,033,898 tweets mentioning the word "ferguson" between 2015-02-25 03:34:08 and 2015-03-21 08:27:25. They were collected to help document the reaction to the Investigation of the Ferguson Police Department report that was released by the Department of Justice on 2015-03-04. Incidentally, this time period also included the reaction to two police officers being shot at in Ferguson on 2015-03-12. The data was collected using the twarc tool.
Size
1.1GB
License
UMD only

6E665C92-1FC5-4C45-81AD-26CB2AADB3E4

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-12
External-Description
wget capture of http://mith.umd.edu/engl668k created by Ed Summers on 2017-06-12. The payload files include the Wordpress code and MySQL database dump as well as a mirror of the website as it existed with the search turned off, and a WARC file.
Size
780.1MB

78B68395-B69C-4084-A66C-B497F36CCD82

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-11-20
External-Description
wget capture of http://mith.umd.edu/diaspora2008/ created by ed on 2016-11-20. The bag also includes server side PHP code and a database snapshot.
Size
18.3MB

7D840688-D48E-4220-9F95-CB9574B72FE0

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-08-30
External-Description
The 2017 solar eclipse occurred on August 21 and was total for Oregon, Idaho, Wyoming, Nebraska, Kansas, Missouri, Illinois, Kentucky, Tennessee, North Carolina, Georgia, and South Carolina. This bag includes 13,548,321 tweets that included any of the keywords solareclipse2017, solareclipse, eclipse2017, eclipseday or eclipse for the period August 17 to August 23, 2017. The hashtags were selected after watching Twitter's streaming API for the trending hashtag #solareclipse2017 and counting the most popular co-occurring hashtags. Since data collection via the search API was unlikely to finish within the 7 day window that search results are available, two separate searches were run with twarc starting on August 23. The first (search.jsonl.gz) includes tweets that were sent since tweet id 899673005755858944. The second (search-maxid.jsonl.gz) includes tweets that were sent prior to 899673005755858944. The search API was used instead of the streaming API because the streaming API was emitting notifications that many tweets were not delivered, because the volume was so high.
Size
9.5GB
License
UMD Only
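
The tweet id that divides the two search files also encodes a point in time. The sketch below is only an illustration and was not part of the original collection workflow. It assumes the id follows Twitter's snowflake layout, where the bits above the lowest 22 hold milliseconds since the snowflake epoch; the function name is hypothetical.

```python
from datetime import datetime, timezone

# Start of the snowflake epoch expressed in Unix milliseconds.
TWITTER_EPOCH_MS = 1288834974657


def snowflake_to_datetime(tweet_id):
    """Return the UTC creation time encoded in a snowflake tweet id."""
    # Drop the low 22 bits (worker and sequence numbers), keep the timestamp.
    milliseconds = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


if __name__ == "__main__":
    # The id used to divide search.jsonl.gz from search-maxid.jsonl.gz.
    print(snowflake_to_datetime(899673005755858944).isoformat())
```

Under that assumption the dividing id falls on August 21, 2017 (UTC), the day of the eclipse.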

7FC1F69F-13C8-4CA3-8919-7C66735DCEC4

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-05-27
External-Description
This is a collection of tweets using the #SayHerName hashtag. It includes 188,146 tweets for the period of November 6, 2010 to May 27, 2015 that were collected with the twarc tool. The dataset is made up of three different gzipped files of tweets: scraped.json.gz, search.json.gz and stream.json.gz. search.json.gz was collected from the search API starting on May 22, 2015. At the same time data collection from the streaming API was started and captured as stream.json.gz. The scraped.json.gz file contains tweets that were scraped from Twitter's search UI for the period prior to where search.json.gz left off. These tweet ids were then rehydrated using twarc. It is important to note that the scraped tweets did not seem to include retweets. While the JSON is similar in structure, the coverage is quite different from search.json.gz and stream.json.gz. The #SayHerName report was released on May 20, 2015 by the African American Policy Forum and the Center for Intersectionality and Social Policy Studies at Columbia University as well as Andrea Ritchie, Soros Justice Fellow. More about the use of this hashtag can be read at http://www.aapf.org/sayhernamereport/ The tweets were collected as part of a collaboration between MITH at the University of Maryland and the University Archives at the University of California at Riverside.
Size
124.8MB
License
UMD and UCR only

5D2787AA-30B3-4C52-8A57-F0D534CF3A6A

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-03-26
External-Description
4,328,507 Tweets mentioning the hashtag #tcot collected between 2016-12-01 and 2017-03-26. #tcot is a hashtag for Top Conservatives on Twitter which was found to be a hashtag that was used in right wing responses to the Ferguson uprising on Twitter.
Size
3.2GB
License
UMD Only

8016E9D2-A121-4862-8461-69D558AE035F

Contact-Name
Ed Summers
Contact-Email
edsu@umd.edu
Bagging-Date
2018-04-18
External-Description
This bag includes datasets that were created with the docnow prototype that ran during 2016-2017 at http://app.docnow.io. These datasets were created by Bergis Jules (co-pi) and shared via the DocNow Catalog application as tweet identifier datasets. The underlying JSON data was transferred from UMD's Amazon Web Services EC2 instance where the prototype application was running to Washington University in St Louis in April, 2018 after the conclusion of the initial phase of work on the Documenting the Now grant from the Mellon Foundation. Each dataset is listed below with the search query that was used, the path to the file, the time it was created, and the number of tweets. The data was collected from the Twitter Search API which provided access to the last 10 days of tweets from the time of the search. query: #blktwitterstorians file: data/201701050445-517854.json.gz created: 2017-01-05 04:45:36 tweets: 371; query: #BLMKidnapping file: data/201701050534-c5bf15.json.gz created: 2017-01-05 05:34:01 tweets: 136990; query: #SaveACA file: data/201701132241-97612e.json.gz created: 2017-01-13 22:41:36 tweets: 137012; query: #blackwomenatwork file: data/201703291343-c266c9.json.gz created: 2017-03-29 13:43:02 tweets: 140000; query: #charlottesville file: data/201708121111-124fac.json.gz created: 2017-08-12 11:11:51 tweets: 100000; query: #BlackTheory file: data/201709200025-e4ce49.json.gz created: 2017-09-20 00:25:03 tweets: 1430; query: #blackdigarchive file: data/201712130923-3c8c3b.json.gz created: 2017-12-13 09:23:06 tweets: 1775; query: #GifHistory file: data/201802120320-715464.json.gz created: 2018-02-12 03:20:30 tweets: 31929
Size
362.2MB

80B7E5EC-A53D-4F2D-8599-9038A84F61DA

Bagging-Date
2016-01-24
External-Description
Tweets mentioning the hashtag #BlackOnCampus collected between 2015-11-08 and 2015-11-25. Data collection started on 2015-11-12 when jobs to collect from the search and streaming APIs were started.
Size
107.0MB
License
UMD only

82DE438D-A361-4BBA-843F-0DB40EEFBB23

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-05-27
External-Description
This bag contains data from the Foreign Languages in America project that was collected by Peter Mallios of the University of Maryland and given to MITH in May of 2015. The original data consisted of two Box folders which contained TIFF, JPEG, DJVU, OCR and Excel files for image scans that were collected by the FLA team. This data was normalized into a single directory structure for use in a static Jekyll website. The software for doing the normalization is available at: https://github.com/umd-mith/fla-processing
Size
119.3GB

831F8CD-1F7B-42A5-BB10-904FAD15204A

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-04-13
External-Description
This bag contains 846,602 tweets mentioning the hashtag #walterscott from 2015-04-01 05:36:27 to 2015-04-13 17:47:09 (UTC). On April 7th, 2015 police officer Michael Slager was arrested and charged with the first degree murder of Walter Scott. A twarc process was then started to collect tweets using the hashtag, and another process was started to get as many existing #walterscott tweets as possible from the preceding week.
Size
549.0MB
License
UMD only

87674858-96AB-45F2-90CE-E712F443A658

Bagging-Date
2016-01-24
External-Description
Tweets mentioning #MizzouHungerStrike and #ConcernedStudent1950 between 2015-11-01 and 2015-11-24. These were two hashtags used during the 2015 University of Missouri protests related to race, workplace benefits and leadership that resulted in resignations of the president of the University and the chancellor of the Columbia campus.
Size
198.9MB
License
UMD only

8DBEB7E3-72E4-4F0A-80A3-3586D63EEA42

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2020-10-07
External-Description
This bag contains backups of resources related to the multisite Wordpress instance that ran at mith.umd.edu, and was decommissioned in 2020. Included in this bag are the Wordpress export (export.xml.gz), the Wordpress server side files (mith.umd.edu.tar.gz), a database snapshot (mithpressdelta.sql.gz) and the results of running a wget mirroring operation on the site with WARC generation while it was live (static.tar.gz and mith.umd.edu.warc.gz). The multisite Wordpress website was used to manage the mith.umd.edu website and also the aadhum.umd.edu, dsah.umd.edu and guide.dhcuration.org websites.
Size
4.0GB

9065fa97-8ac3-4b11-9703-7cff623c560a

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-02-04
External-Description
This bag contains 6,342,294 tweets collected between October 12 and November 30, 2014 related to Poroshenko and Putin. They were collected by Ed Summers and Tatyana Lockot for a series of articles being written for Global Voices. The tweets were collected from the Twitter filter stream API using twarc, which was configured to retrieve tweets with any of these keywords: Putin, Poroshenko, Путин, Порошенко and Путін.
Size
2.7GB
License
UMD only

94404079-594A-41EF-9A14-4266CC97FFC1

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-10-17
External-Description
This bag contains tweets that were collected between September 19, 2017 and October 5, 2017 that mentioned #CatalanReferendum, #CatalalonianReferendum, #Catalonia, #1oct, #1o or #votarem. These were hashtags used in the lead up to the Catalan Independence Referendum on October 1, 2017. The referendum was declared illegal under Spanish law, and the Spanish police were ordered to prevent it. The hashtags were selected after monitoring the #CatalanReferendum hashtag for several hours on September 28 to determine what the top hashtags were. The tweets themselves were collected from the Twitter Search API using twarc and its twarc-archive utility. twarc-archive was run every hour to collect the tweets that occurred since the last run. The data collection was a collaboration with Vicenç Ruiz Gómez and Aniol Maria of the Society of Catalan Archivists working in conjunction with Ed Summers of MITH.
Size
6.5GB
License
UMD Only

95DDB7B2-E88F-4BB3-AAC5-8CFAD1076AA5

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-06-03
External-Description
4,058,754 tweets collected from the streaming and search APIs using the keyword "gaza" covering the period from 2018-05-08 to 2018-05-19. The search and stream data collection was started on 2018-05-16. This time period included the opening of the US Embassy in Jerusalem on May 14th. On the same day Israeli forces killed over 60 Palestinians, and injured 2,700 who were part of a non-violent protest in the Gaza Strip. https://www.democracynow.org/2018/5/24/after_latest_gaza_slaughter_open_an
Size
3.4GB

9707DA4B-6EE8-4ECE-8B24-F8604E8C6A4F

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-11-15
External-Description
2.1 million tweets that used the #NoDAPL or #StandWithStandingRock hashtags over the period of Oct 18 - Nov 7.
Size
1.9GB
License
UMD Only

98EFC987-7C1A-4615-8CD6-836174C6DAF3

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2020-01-19
External-Description
The 1619 Project was developed by The New York Times Magazine in 2019 with the goal of re-examining the legacy of slavery in the United States and timed for the 400th anniversary of the arrival in America of the first enslaved people from West Africa. It is an interactive project by Nikole Hannah-Jones, a reporter for The New York Times, with contributions by the paper's writers, including essays, poems, short fiction, and a photo essay. Originally conceived of as a special issue for August 20, 2019, it was soon turned into a full-fledged project, including a special broadsheet section in the newspaper, live events, and a multi-episode podcast series. (from Wikipedia) This bag contains metadata for tweets related to the hashtag #1619project. They were collected on January 8, 2020 using twint and the keyword 1619project (twint -s '1619project' --csv --output twint.csv). twint scrapes Twitter's search results and writes the results as CSV. The tweet identifiers were extracted from this CSV and included as the ids.txt file. On January 19, 2020 the ids.txt was hydrated as tweets.jsonl using the twarc tool. Tweets that were deleted between January 8 and January 19 explain the discrepancy of 443 tweets between ids.txt and tweets.jsonl.
Size
248.7MB
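
The step of extracting tweet identifiers from the twint CSV into ids.txt could look roughly like the sketch below. It is an illustration rather than the script that was actually used: it assumes the CSV has a header row with an "id" column, and the function name extract_ids is hypothetical.

```python
import csv


def extract_ids(csv_path, ids_path, column="id"):
    """Write one tweet id per line from a CSV column, skipping duplicates.

    The column name is an assumption about the CSV header. Returns the
    number of unique ids that were written.
    """
    seen = set()
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            tweet_id = (row.get(column) or "").strip()
            if tweet_id and tweet_id not in seen:
                seen.add(tweet_id)
                dst.write(tweet_id + "\n")
    return len(seen)


if __name__ == "__main__":
    print(extract_ids("twint.csv", "ids.txt"))
```

The ids are kept as strings so that they are written exactly as they appear in the CSV.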

9BA6F68E-808E-4D64-A39F-558C4CD92072

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-12
External-Description
wget capture of http://mith.umd.edu/dccresearch/ created by Ed Summers on 2017-06-12. The payload files include the Wordpress code and MySQL database dump as well as a mirror of the website as it existed with the search turned off, and a WARC file.
Size
45.8MB

A0DADAB6-8C1C-4B6E-A19E-04B5DE839258

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2020-10-14
External-Description
wget capture of https://guide.dhcuration.org/ created by ed on 2020-10-14. This bag contains a wget mirror of the guide.dhcuration.org website that was created on October 14, 2020. The site is being moved to the archive.mith.umd.edu website as part of the dismantling of the multisite Wordpress server at mith.umd.edu. For access to the Wordpress server side files and database please see bag 8DBEB7E3-72E4-4F0A-80A3-3586D63EEA42. The static site was generated using the following wget command in order to capture and rewrite links to other domains: wget --output-file wget.log --warc-file dhdc --mirror --page-requisites --span-hosts --html-extension --convert-links --execute robots=off --no-parent --domains guide.dhcuration.org,mith.umd.edu,humanitiesdatacurationguide.wordpress.com https://guide.dhcuration.org/
Size
2.1MB

A325B271-260A-4E4C-A8A5-49A88F37BA42

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-12-02
External-Description
27,954,936 tweets collected from Twitter's streaming API with the following query: twarc filter 'female,rage,woman,anger,femalerage,angry,women,feminist'. The collection ran from November 17 to December 2, 2018. The data was collected for Brittany Starr who is a student in the UMD English Department.
Size
23.0GB
License
UMD Only

A36E23C8-45E3-4ECD-8D8E-610CEDF60441

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-08-14
External-Description
This bag contains the Omeka installation and database snapshot for the lakeland.umd.edu website that was obtained from the jarvis.umd.edu Glue host at UMD. The files were copied from /afs/glue.umd.edu/department/oit/aett/otal/server/omeka/lakeland using rsync. The resulting directory was then archived and compressed with tar as lakeland.tar.gz. The Omeka installation was configured to talk to a MySQL database named omkealakeland. This database was exported with mysqldump and compressed as omkealakeland.sql.gz. The files were obtained as part of a collaboration with Mary Sies of American Studies to put the Lakeland Digital Archive on a firmer footing. Due to staff turnover system level access to jarvis had been lost by Mary's team over the years. Thanks to input from former UMD employee Jill Reese and the help of UMD DIT I was able to get ssh access and locate the files and database. See this issue ticket for more context: https://umd.service-now.com/itsc?sys_id=7a03b7fe0f94070c7f232ca8b1050e3f&view=sp&id=ticket&table=incident
Size
4.9GB

A43EB791-1B69-413E-BD61-58F79BD9C4CE

Bagging-Date
2018-02-20
External-Description
This is a snapshot of MITH's Digital Dialogue Storify tweets that was collected on February 20, 2018 using the storified tool: https://github.com/docnow/storified Included in each story are the HTML, JSON, and HTML exports that were made available by Storify before they shuttered the service. The original index.html was renamed to index-original.html, and a new index.html was written with relative links to the images that were downloaded.
Size
63.6MB

A5752D17-3670-47BE-AE7B-08E0D3BE7A28

Bagging-Date
2016-01-24
External-Description
3,719,967 tweets mentioning "bowie" between 2016-01-11 and 2016-01-15. Bowie died on January 10, 2016. Data collection started on January 12, when collection from both the streaming API and the search API was started. The separate search files are the result of Internal Server errors from Twitter's search API, which required the search to be started again.
Size
2.2GB
License
UMD only

A9933BF8-CBB3-41EC-B2A8-C7ABDE481B6A

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-08-02
External-Description
This bag contains a hydrated version of the Twitter Event Datasets (2012-2016) created by Arkaitz Zubiaga. The original dataset contains 147,055,035 tweet ids from 30 different events that are split into separate files. More about the dataset can be read at https://doi.org/10.6084/m9.figshare.5100460.v2 The tweet ids were hydrated between June 15 and July 2, 2018 using the twarc utility. Only 86,062,113 tweets were hydrated, which is a 41.5% deletion rate. The payload directory includes a hydrate.sh script that was used to do the hydration, as well as the README that was distributed with the original dataset. Finally the wayback directory contains a script to examine links to web archives in the hydrated JSON data.
Size
59.5GB
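
The deletion rate follows directly from the two counts given in the description above. A small sketch of the arithmetic, where the function name hydration_stats is hypothetical:

```python
def hydration_stats(total_ids, hydrated):
    """Summarize how many tweet ids could still be hydrated.

    Percentages are relative to the number of ids in the original
    dataset and are rounded to one decimal place.
    """
    missing = total_ids - hydrated
    return {
        "missing": missing,
        "hydrated_pct": round(100 * hydrated / total_ids, 1),
        "missing_pct": round(100 * missing / total_ids, 1),
    }


if __name__ == "__main__":
    # Counts taken from the bag description.
    print(hydration_stats(total_ids=147_055_035, hydrated=86_062_113))
```

The same function can be applied to the other hydrated datasets in this listing, since each of them reports both the number of ids and the number of tweets retrieved.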

A9AE8E15-34AA-45AD-878A-50E5AB745F71

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-26
External-Description
This bag contains a backup of the Wordpress site and database for the Humanities Intensive Learning & Teaching website. The site moved from mith.umd.edu/training/ to www.dhtraining.org where it is actively maintained. This backup was created as part of a cleanup of MITH's main Wordpress host.
Size
10.3MB

AE0A86DE-E17D-438E-BCDF-AA1F04851CAF

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-02-04
External-Description
This bag contains tweets that were hydrated from the Beyond the Hashtags research study conducted by Deen Freelon, Charlton D. McIlwain, and Meredith D. Clark in February of 2016. Their report, which includes details about how this dataset was assembled, is included as a PDF, and more information about the dataset can also be found at http://cmsimpact.org/resource/beyond-hashtags-ferguson-blacklivesmatter-online-struggle-offline-justice/ In January 2017 Freelon released the 40,815,975 tweet ids for the dataset. http://dfreelon.org/2017/01/03/beyond-the-hashtags-twitter-data/ They were hydrated over the course of a few weeks afterwards using twarc. 34,264,560 (83%) of the original tweets were hydrated.
Size
22.7GB
License
UMD Only

AF330002-664C-4321-98D2-E753BE8DD025

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2014-12-10
External-Description
This Twitter data was collected as part of a partnership between CivicLab, Harvard University and MITH. It represents 15,080,078 tweets that mention "ferguson" for the period between Nov 11 - Dec 8. The twarc utility was used to collect the tweets from the Twitter stream API. Involved individuals included: Greg Coleman, Kim Lamke, Molly Lloyd and Benjamin Sugar.
Size
6.6GB
License
UMD only

B09C1434-FFF9-4C73-B6A3-72FF63036A69

Bagging-Date
2016-02-28
External-Description
wget capture of http://mith.umd.edu/topicmodeling/ created by ed on 2016-02-28. The payload includes: mith.umd.edu-topicmodeling.tar.gz (wordpress snapshot), topicmodeling.sql.gz (wordpress database), mith.umd.edu.tar.gz (wordpress wget mirror) and topicmodeling.warc.gz (warc file from the crawl).
Size
25.9MB

B436DDA9-FFC1-4BD3-B358-55D56EB9334B

Contact-Name
Stephanie Sapienza
Contact-Email
sapienza@umd.edu
Bagging-Date
2017-09-18
External-Description
This bag contains data dropped off by UMD faculty member Mary Sies for the Lakeland Community Heritage Project. On August 16, 2017 she dropped off a set of DVDs and a hard drive at MITH. Stephanie Sapienza spoke to Mary about what was contained on the DVDs and the hard drive, and then copied them to an external hard disk. The hard disk was then mirrored to Google Drive. In addition they were tarred up and gzipped by Ed Summers to preserve timestamps and compress the data. The goal was to allow the other copies to be manipulated while keeping a copy of what was delivered to MITH for preservation purposes. The two tarballs that are present in the data payload directory of the bag are the result of that archiving. From Stephanie's conversation with Mary it is believed that the media were used by students as they worked with content collected in the community to eventually upload it to the Lakeland Omeka site. A backup of that site is present in s3://mith-bags/A36E23C8-45E3-4ECD-8D8E-610CEDF60441
Size
120.3GB

BDD870A8-587B-419C-A8A7-44D8227ABA29

Bagging-Date
2018-10-07
External-Description
This bag contains the Wordpress server side code and database for the Vintage Computers Omeka site which lived at https://mith.umd.edu/vintage-computers. The data payload also includes the output of mirroring the site with wget, which generated a static website and WARC file. The static version of the website was mounted in place of the live site. The site was archived because it had a custom theme that required significant work to make it work with the latest version of Omeka 2. The site was also no longer being actively developed but still had value as an archive.
Size
462.0MB

B9525BE0-FD7B-41C9-B3B1-F189CB2AD642

Contact-Name
Ed Summers
Contact-Email
edsu@umd.edu
Bagging-Date
2018-07-13
External-Description
This bag contains a wget capture of https://www.digitalmishnah.org/ as well as backups of the server side Wordpress code and database. The Wordpress site was deemed no longer active and rather than folding it into the MITH multisite Wordpress we decided to snapshot it and put it in our static archive.
Size
14.3MB

B9C8B188-5026-4965-9384-605E02FA55E5

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-08-15
External-Description
Sandra Bland was an African-American woman who was found dead in a jail cell in Waller County, Texas, on July 13, 2015. This bag contains 3,805,452 tweets that were sent with the hashtag #SandraBland between July 15 and August 8th. Data collection started on July 17 at 20:13:24 when search and stream twarc jobs were started.
Size
2.6GB
License
UMD Only

BB31CFDF-C413-4BB1-B9EC-3A4A68D274FC

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-08-14
External-Description
The Unite the Right rally was a gathering of far-right white nationalist groups in Charlottesville, Virginia, United States, on August 11 and 12, 2017. Those assembled at the rally included members of white supremacist, white nationalist, alt-right, neo-Confederate, neo-Nazi, and militia movements. The participants were protesting against the removal of Confederate monuments and memorials from public spaces, specifically the Robert Edward Lee Sculpture in Emancipation Park. Hundreds of protesters and counterprotesters were in attendance. There were several violent clashes between protesters and counterprotesters. One protester plowed a car into a crowd of counterprotesters, killing a woman and injuring 19 other people, including five critically. At least 19 people were injured in street brawls and other violence at the rally. This bag contains 6,040,247 tweets mentioning 'charlottesville' collected with the twarc utility. 5,382,975 were collected from Twitter's streaming API, and 657,272 from the search API. The collected tweets range in time from 2017-08-03 17:16:17 to 2017-08-13 23:24:26 GMT. Data collection began 2017-08-12 14:17:06 GMT. The log files for both processes are also included. Since the keyword 'charlottesville' was trending for several days the stream.log file contains information about how many tweets were undelivered.
Size
5.4GB
License
UMD Only

BDD870A8-587B-419C-A8A7-44D8227ABA29

Bagging-Date
2018-10-07
External-Description
This bag contains the Wordpress server side code and database for the Vintage Computers Omeka site which lived at https://mith.umd.edu/vintage-computers. The data payload also includes the output of mirroring the site with wget, which generated a static website and WARC file. The static version of the website was mounted in place of the live site. The site was archived because it had a custom theme that required significant work to make it work with the latest version of Omeka 2. The site was also no longer being actively developed but still had value as an archive. After deployment a problem was discovered in the JavaScript lightroom library, which was not able to load the builder and effects library via the link to scriptaculous.js. In addition the high resolution images in the archive directory that are loaded by lightroom were missing. This problem was fixed by rewriting the scriptaculous links in the items html, and the archive files were copied from the backed up files into the static assets. After this bit of surgery the zooming images worked as they did originally. This bag was then updated with the latest content for the static site.
Size
451.4MB

C2B9AC64-79E7-4EFD-8142-64CB0407E51E

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-06-12
External-Description
This bag contains assets related to the www.soweto76archive.org website that was created by Angel Nieves and Gregory Lord while they were working with MITH at the University of Maryland. In June 2018 the site was archived because it was running a 9 year old version of Wordpress that was not compatible with PHP7 which the rest of MITH was upgrading to. The server side assets (PHP, media files) and MySQL database were archived as wordpress.tar.gz and soweto.sql.gz. The site was then crawled with wget to create a static site as well as a WARC file. The specific wget command can be found in https://github.com/edsu/bagweb The resulting static site was put in place of the Wordpress site, which was decommissioned and sent to Nieves and Lord.
Size
790.9MB

C328DC72-013E-4BB6-AE12-54F075739627

Contact-Name
Ed Summers
Contact-Email
edsu@umd.edu
Bagging-Date
2018-09-05
External-Description
On August 16, 2018 Aretha Franklin died in Detroit, Michigan at the age of 76. Franklin, also known as the Queen of Soul, had an award winning career as a singer, songwriter, actress and pianist while also being described as the voice of the civil rights movement. This bag contains two tweet datasets. The first was collected from the search API during the response to the announcement of her death, which includes tweets from August 8 - August 19 using the query '"Aretha Franklin" OR "Queen of Soul"'. The second dataset was collected over August 24 to September 3, which includes the date of her funeral on August 31. This second dataset was collected using the query '"Aretha Franklin" OR "Queen of Soul" OR ArethaHomegoing OR ArethaFranklinFuneral OR ArethaFranklin' which includes hashtags that were trending at the time. The datasets contain 2,832,128 and 1,332,442 tweets respectively.
Size
3.1GB
License
UMD Only

CA53EE17-91AD-41AC-936D-14C00AFE4EA9

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-12
External-Description
wget capture of http://mith.umd.edu/digitalstorytelling created by Ed Summers on 2017-06-12. The payload files include the Wordpress code and MySQL database dump as well as a mirror of the website as it existed with the search turned off, and a WARC file.
Size
483.4MB

CC08A512-2A39-4248-B9AD-B07557618837

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-11-21
External-Description
987,938 tweets mentioning #PuertoRico retrieved over the period of October 4 to November 7, 2017. This was a period where there was increased concern being expressed in social media about the response to the humanitarian crisis caused by Hurricane Maria, which made landfall on September 20. Tweets were collected from the streaming API and the search API. In both cases tweets using #PuertoRico were collected.
Size
740.4MB
License
UMD Only

CD36C7E9-2451-4EB8-BA81-46DC278DC66F

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-05-20
External-Description
This bag contains 2,195,394 tweets that mention #BaltimoreUprising or #BaltimoreRiots between April 29 and May 14, 2015. They were collected from both the Twitter search and streaming APIs. This time period saw demonstrations and protests in Baltimore using these two hashtags following the death of Freddie Gray on April 19th, 2015 after his arrest on April 12th.
Size
1.4GB
License
UMD Only

D30ABFEC-C35D-4849-8EA3-94ECE584E552

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-02-02
External-Description
These are files collected from the NEH Funded Projects database for the Office of Digital Humanities https://securegrants.neh.gov/publicquery/Faq.aspx They were collected when it was announced that the Trump administration wanted to defund the NEH http://thehill.com/policy/finance/314991-trump-team-prepares-dramatic-cuts MITH was able to obtain a spreadsheet (Muffin Files.xlsx) which contains the identifiers for each funded project. A program (download.py) was written that downloads the PDF and Excel file for each project and puts it in the white-papers directory. In addition a file data.csv is included which is the combination of all the Excel files as a CSV.
Size
995.8MB

D547F886-D302-4747-B08A-188A645CBFEA

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-01-25
External-Description
This is a collection of tweets related to the 2016 US presidential election, collected over the period of July 13, 2016 to November 10, 2016 by George Washington University. GWU made the collection available as a tweet identifier dataset, which was then hydrated at the University of Maryland between December 1, 2016 and January 2, 2017 using the twarc utility. The original dataset contained 270,189,978 unique Twitter identifiers of which 237,651,319 were hydrated (88%). This bag contains the ids that were used for hydration in the ids directory, and the hydrated tweets in the tweets directory. Each id file has a corresponding README that explains how the dataset was created, including what keywords were used to create it. More can be learned about the original dataset at: http://hdl.handle.net/10.7910/DVN/PDI7IN
Size
167.9GB
License
UMD Only

D5707486-0875-4DDF-B7B3-65D20CD4250C

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2019-02-08
External-Description
This dataset was created on February 7, 2019 to document the reaction to Stacey Abrams's response to the Presidential State of the Union address delivered on February 5, 2019 from 21:00 to 21:22 EST. There are 1,001,590 tweets from January 28, 2019 to February 7, 2019 which were collected from Twitter's search API using twarc and the query: '"Stacey Abrams" OR abramsaddress OR staceyabrams OR soturesponse'
Size
682.0MB
License
UMD Only

D651C3F6-5619-4A42-A8BC-7C22B7A9A44A

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-05-20
External-Description
This bag contains 32,056 tweets that mention "ferguson" between August 8 and August 10, 2014. They were collected on May 7th, 2015 using a script that collected Twitter identifiers from the search form on Twitter's website: https://github.com/edsu/twarc/blob/master/utils/discover_ids.py The identifiers were then rehydrated using twarc's --hydrate option. Two important limitations to be aware of are that the dataset does not include tweets that were deleted before May 7th, 2015, and that retweets are not included. This dataset augments another tweet collection (mith-bag fe28a093-d3f4-42d7-83ba-f5ba1b1cc765) which has a more complete snapshot but is missing tweets just after the killing of Michael Brown on August 9th.
Size
12.2MB
License
UMD Only

D6C8ED6A-13A0-483E-950A-EE6089DFE463

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2020-10-14
External-Description
wget capture of https://aadhum.umd.edu/ created by ed on 2020-10-14. This bag contains a wget mirror of the African American History, Culture and Digital Humanities (AADHum) Wordpress website at https://aadhum.umd.edu. It was crawled in October, 2020 when the Wordpress website was transferred away from MITH's AWS infrastructure and to the AADHum project themselves. The administration of the aadhum.umd.edu domain was passed to Marisa Parham (AADHum Director). The server side Wordpress files and database are part of the Wordpress multisite set up that is saved in bag 8DBEB7E3-72E4-4F0A-80A3-3586D63EEA42. The mirror copy was created with wget but the /events/ path needed to be excluded since it contained a calendar that became a crawler trap. In addition wget needed to be instructed to collect and rewrite resources at mith.umd.edu since the multisite setup used that host for images and css. The wget command looked like this: wget --directory-prefix aadhum --output-file wget.log --warc-file aadhum --mirror --page-requisites --span-hosts --html-extension --convert-links --execute robots=off --no-parent --exclude-directories example --level 3 --domains aadhum.umd.edu,mith.umd.edu https://aadhum.umd.edu
Size
347.1MB

D7ACD500-FCAF-454E-8B3C-031CB4012145

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-05-14
External-Description
This bag contains tweets mentioning #CharlieHebdo, #JeSuisCharlie, #JeSuisAhmed and #JeSuisJuif for the period of January 7th to 28th, 2015. They were initially collected by Nick Ruest at York University, who made the tweet ID datasets available at http://hdl.handle.net/10864/10830 The data was rehydrated using twarc over the period of February 20th to 24th, 2015. Significant portions of the original data were deleted in the time between when they were tweeted and when they were rehydrated. The original id lists are included along with the hydrated data, which amount to 13,968,293 unique ids.
Size
9.1GB

D8D28FB4-87DC-4DD9-AC82-6260B54AE684

Bagging-Date
2017-04-22
External-Description
This bag contains 10,159,892 tweets and retweets sent by or to jk_rowling between 2015-07-08 and 2017-03-18. The tweets were collected with Social Feed Manager (m5_003). The directory in which SFM stored Twitter data was archived and compressed as data/tweets.tar.gz. You will notice on unarchiving that the archive includes many individual tweet files organized into a directory tree by year, month, day, hour. Each file usually contains 15 minutes of tweets. These files are usually gzip compressed but you will notice that there are a few that are not. You will also notice that a small number of files were not closed correctly, so you get an error like "unexpected end of file" on reading the end of the file.
Size
4.9GB
License
UMD Only

DAD66855-2344-4261-8688-EADEB3A5EC25

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-12
External-Description
wget capture of http://mith.umd.edu/eng738T created by Ed Summers on 2017-06-12. The payload files include the Wordpress code and MySQL database dump as well as a mirror of the website as it existed with the search turned off, and a WARC file.
Size
135.3MB

DB7C5ADA-D8AD-4D1B-A389-960EE5A11ADC

Contact-Name
Trevor Muñoz
Contact-Email
tmunoz@umd.edu
Bagging-Date
2019-06-21
External-Description
This bag contains disk images from retired MITH server "zelda." Images were created using the BitCurator software.
Size
205.1GB
License
UMD only

DEF510C5-A888-48D6-BAB2-D1A0040008C4

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-12-11
External-Description
On July 6, 2016, Philando Castile was fatally shot by Jeronimo Yanez, a St. Anthony, Minnesota police officer, after being pulled over in Falcon Heights, a suburb of St. Paul. Castile was driving a car with his girlfriend, Diamond Reynolds, and her four-year-old daughter as passengers when he was pulled over by Yanez and another officer. This bag contains 2,950,803 tweets collected from the search and streaming API for the hashtags #FalconHeightsShooting, #PhilandoCastile and #DiamondReynolds between July 7 and September 9, 2016.
Size
2.1GB
License
UMD Only

E0F66049-106D-472A-8B00-969E1C834993

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-12
External-Description
wget capture of http://mith.umd.edu/musical-theatre/ created by Ed Summers on 2017-06-12. The payload files include the Wordpress code and MySQL database dump as well as a mirror of the website as it existed with the search turned off, and a WARC file.
Size
4.4MB

E4D77D13-3B44-4203-A795-3F950E45F40F

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2016-08-28
External-Description
These are tweets collected during the Documenting the Now meeting held in St Louis on August 22-23. They all use the #docnowcommunity hashtag.
Size
12.6MB
License
UMD Only

E630AB56-C721-4CC5-8663-E049854B7687

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-08-18
External-Description
The Unite the Right rally (also known as the Charlottesville rally) was a protest in Charlottesville, Virginia, United States from August 11–12, 2017, to oppose the removal of a statue of Robert E. Lee in Emancipation Park, which itself was renamed from Lee Park two months earlier. Protesters included white supremacists, white nationalists, neo-Confederates, neo-Nazis, and militias. This bag contains 200,113 tweets collected with the #unitetheright hashtag. Data collection was performed twice from the search API using twarc: once at 2017-08-13 11:46:05 GMT and the other at 2017-08-15 12:03:48 GMT. The second search was run to collect only up to where the first search left off. The time ranges for the tweets are from 2017-08-04 11:44:12 to 2017-08-15 16:03:30 GMT.
Size
155.3MB
License
UMD Only

E6E7A45A-EE6A-4575-ACAE-BD322AE84F87

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-26
External-Description
This bag contains a backup of a Wordpress site and database for the Project Bamboo website that ran at www.projectbamboo.org. It was no longer active at the time of archiving since the DNS record had since been pointed at a Google site. It was archived and removed from MITH's running Wordpress site as part of a clean up project.
Size
35.2MB

E7C141E1-6F3B-48EA-AEE7-57F8BFB06CC8

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-06-26
External-Description
This bag contains a Wordpress site backup and its respective MySQL database for the www.bitcurator.org domain. The domain was no longer registered at the time of the backup but a Wordpress instance was active at www.bitcurator.net, which is a domain managed by the University of North Carolina. It was surmised that the site moved from .org to .net when the project moved to UNC. The Wordpress site was archived as part of a clean up of the main MITH Wordpress host.
Size
221.3MB

E967BD40-FC32-477F-9E5E-92C61B22807A

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2015-05-20
External-Description
This is a dataset of tweets collected between April 15, 2015 and May 13, 2015 that mention the hashtag #FreddieGray. One portion of the tweets was collected from Twitter's search API and the other from the streaming API. Both sets were collected using the twarc tool. The total dataset includes 2,983,934 tweets. Freddie Gray was an African-American man who was arrested by the Baltimore Police Department on April 12, 2015, and died on April 19, 2015 due to an injury to his spinal cord that was believed to be the result of his treatment by the police.
Size
1.8GB
License
UMD Only

F0A98C71-8BF6-42FC-B476-015E21A84CAD

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-05-17
External-Description
782,509 tweets including the hashtag #macronleaks or #macrongate that were collected between 2017-05-02 07:02:05 and 2017-05-10 16:14:51 UTC. The tweets were collected from the Twitter Search API using twarc. The data does not include the first use of the #macrongate hashtag, but it does include the first use of the #macronleaks hashtag which went viral after Wikileaks published it. More about the story of the #macronleaks hashtag can be found at: http://www.newyorker.com/news/news-desk/the-far-right-american-nationalist-who-tweeted-macronleaks
Size
580.7MB
License
UMD Only

F1EBD541-82F7-4DC9-A7CC-60C9DE94E8F2

Contact-Name
Trevor Muñoz
Contact-Email
tmunoz@umd.edu
Bagging-Date
2019-06-12
External-Description
These were interviews conducted by MITH and collaborators with members of the Lakeland community. The activities were conducted with support from a Community Partnership Grant from the American Studies Association. https://www.theasa.net/awards/grants/community-partnership-grants The interviews are MP3 and SRT transcript files that were created by uploading original recordings to the otter.ai service.
Size
109.4MB

F4C3C4C6-EFE0-4712-BA93-B2948D3D66E3

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2018-02-27
External-Description
On February 27, 2018 the National Museum of African American History and Culture hosted a Twitter chat with the Documenting the Now project. Bergis Jules from the Documenting the Now team coordinated the project responses and delivered them as the @documentnow user on Twitter. Other people from the project and elsewhere responded. The event started at 9:30 AM EST and finished at 10:30 AM EST. Tweets with the designated hashtag #ArchivesBlackHistory were collected using twarc at 11:30 AM on February 27, 2018. The collection contains 1402 tweets, some of which were created prior to the Twitter chat, since the hashtag had been used in other promotional outreach by the NMAAHC.
Size
9.8MB

FE0207E7-E21E-41F8-8A05-1F11BC68CFF8

Bagging-Date
2015-10-19
External-Description
On Friday, June 5, 2015, at a pool party in McKinney, Texas, a police officer was video-recorded restraining an unarmed African-American fifteen-year-old girl on the ground. He later drew his handgun during the same incident. This bag contains 180,000 tweets containing the hashtag #McKinney that were collected between 20:15:53 and 23:46:26 on June 7, 2015. They were collected by Bergis Jules at the University of California at Riverside in collaboration with MITH.
Size
124.1MB
License
UMD Only

FE3814C1-54A3-46BB-8093-3A90D81AF928

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2017-03-26
External-Description
This bag contains 2,711,011 tweets collected from the Twitter filter stream between 2017-02-09 and 2017-03-18 that used any of the following hashtags: alternativefacts, fakenews, truthiness, postfact, posttruth, factcheck. They were collected as a research experiment for Damien Smith Pfister in the Department of Communication.
Size
2.1GB
License
UMD Only

fe28a093-d3f4-42d7-83ba-f5ba1b1cc765

Contact-Name
Ed Summers
Contact-Email
ehs@pobox.com
Bagging-Date
2014-08-30
External-Description
A collection of 13,238,863 tweets mentioning 'ferguson' from 2014-08-10 22:44:43 to 2014-08-27 15:15:50. The tweets were collected from the Twitter Search API using the twarc utility. They were subsequently run through a deduplication process and a URL unshortening process that added the unshortened_url key to url entities in the original JSON data.
Size
8.4GB
License
UMD only