Description
Replication data for “The Hidden Costs of Requiring Accounts: Quasi-Experimental Evidence from Peer Production” by Benjamin Mako Hill and Aaron Shaw, to be published in Communication Research.

Replicating the analysis presented in the paper

Replicating this analysis involves a number of steps. We have attempted to include the most “raw” versions of the data to allow replication of our full data pipeline. This includes three sources of data:

MediaWiki XML dump files: We have included full XML history data for all wikis we have access to (including those excluded from our final analysis) in XZ compressed MediaWiki dump XML format. The dumps amount to more than 330GB uncompressed. All of these files are made available in the following GNU Tar archive: hidden_costs-wiki_xml_dumps.tar

Deployment dates: A file with a list of the dates on which the Wikia wikis included in our analysis transitioned to requiring accounts from would-be editors. We received this list from Wikia staff/administrators, and it is provided in the following file: hidden_costs-login_only_wikis.csv

Data on administrators: Because data on user rights (which user accounts have administrative rights for a given wiki) are not included in the XML dumps, we had to collect these data after the fact via the Wikia API. We have included the user rights data we collected, as well as the code used to collect it, in the following file: hidden_costs-admin_list.tar.xz

In addition to the code included in this archive, you will need access to wikiq, a Python program created by the Community Data Science Collective to parse the MediaWiki XML dump files included in this dataset. The output of wikiq is a set of TSV data with revision metadata, which is used by the rest of this analysis. The wikiq code is available here: https://code.communitydata.science/mediawiki_dump_tools.git

License

The documentation provided for this project is released under a Creative Commons Attribution Share-Alike 4.0 license (CC-BY-SA 4.0). Details of the license are available at: http://creativecommons.org/licenses/by-sa/4.0/. All code provided for this project is released under the GNU GPLv3 (available in plain text here: https://www.gnu.org/licenses/gpl-3.0.txt). The data were collected from Wikia.com (now largely rebranded Fandom.com). Most or all of these data have been published as free cultural works by Wikia/Fandom under the CC-BY-SA 3.0 (unported) license. Details on Wikia/Fandom licensing are available on this page: https://www.fandom.com/licensing

Contact

Please be in touch with the authors with any questions.
Benjamin Mako Hill
Aaron Shaw
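Because the XML dumps amount to more than 330GB uncompressed, it may be convenient to read them directly from the tar archive rather than extracting everything first. The following is a minimal sketch of one way to do that in Python and is not part of the authors' pipeline. It assumes that the compressed dumps inside the archive have names ending in `.xz`; the function names `iter_xz_members` and `count_lines_per_dump` are hypothetical helpers introduced only for illustration.

```python
import lzma
import tarfile


def iter_xz_members(tar_path, suffix=".xz"):
    """Yield (member_name, text_stream) for each XZ-compressed file in a tar archive.

    Assumption: compressed dumps are regular files whose names end in `suffix`.
    Each stream is only valid until the next item is requested.
    """
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            raw = archive.extractfile(member)
            # lzma.open accepts a file object and decompresses lazily,
            # so the full dump is never held in memory at once.
            with lzma.open(raw, mode="rt", encoding="utf-8") as stream:
                yield member.name, stream


def count_lines_per_dump(tar_path):
    """Return a dict mapping each compressed member name to its line count."""
    counts = {}
    for name, stream in iter_xz_members(tar_path):
        counts[name] = sum(1 for _ in stream)
    return counts
```

Calling `count_lines_per_dump("hidden_costs-wiki_xml_dumps.tar")` would walk every compressed dump once, which is a quick way to check that the download is intact before running wikiq on the extracted files.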
Date made available | 2020 |
---|---|
Publisher | Harvard Dataverse |