Datasets: Temporal Evolution of Templates on Wikipedia
This work was the B.Sc. thesis of Mattia Lago, supervised by Prof. Alberto Montresor. We analyzed the temporal evolution of templates in the Italian and English Wikipedias, counting how the number of occurrences of each template changed over time.
Code
The code is available under the MIT license on GitHub.
Italian Wikipedia (itwiki)
These datasets were produced by analyzing the Italian Wikipedia dump of 2015-10-20 with complete page edit history, in .bz2 format.
- template_count_it.tar.7z (544MB compressed, 9.0GB uncompressed, md5sum: 57ff71be1e81ce069bf6407596ff23e7). This dataset contains the number of occurrences of each template in each revision in the Italian Wikipedia. The archive contains a CSV file with the following fields:
  - page_id: (numerical) identifier of the page
  - page_tile: page title
  - rev_id: (numerical) identifier of the article revision
  - timestamp: revision timestamp
  - dictionary: a (Python) dictionary containing the count of the templates appearing in that revision. Keys are the template names, values are the counts.
- redirects_it.tar.7z (74KB compressed, 257KB uncompressed, md5sum: 4ccaca5cc86657f3a36cb6f974d13a61). This dataset is a list of redirects for each template in the Italian Wikipedia. The archive contains a CSV file with the following fields:
  - template: template name
  - redirect: destination of the redirect
  - rev_id: (numerical) identifier of the page revision
  - timestamp: revision timestamp
- Download the md5sum file.
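As a minimal sketch of how the downloaded files might be checked and read in Python: the first function computes the MD5 digest of an archive so it can be compared with the checksums listed above, and the second parses rows of the template count CSV. Several details here are assumptions, not documented facts about the dataset: that the CSV is comma-separated with standard quoting, that it has no header row (skip the first line if it does), that the five columns appear in the order listed above, and that the dictionary field is a Python literal that ast.literal_eval can parse. The function names and the sample row (including its timestamp format) are hypothetical.

```python
import ast
import csv
import hashlib
import io


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_template_counts(lines):
    """Yield (page_id, title, rev_id, timestamp, counts) for each CSV row.

    Assumption: five comma-separated columns in the documented order,
    the last one being a Python dict literal (template name -> count).
    """
    for row in csv.reader(lines):
        page_id, title, rev_id, timestamp, raw_counts = row
        # literal_eval only evaluates literals, so it is safer than eval().
        counts = ast.literal_eval(raw_counts)
        yield int(page_id), title, int(rev_id), timestamp, counts


# Hypothetical sample row, made up to illustrate the assumed layout.
sample = io.StringIO(
    "12,Roma,345,2015-01-02T03:04:05Z,\"{'Cita web': 3, 'Bio': 1}\"\n"
)
rows = list(iter_template_counts(sample))
```

With the real data, the same generator could be fed an open file object for the extracted CSV, and the result of md5_of_file on the downloaded archive compared against the md5sum given for it. Because the uncompressed file is about 9.0GB, iterating row by row rather than loading everything into memory is likely the safer choice.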
License
The code is released under the MIT license and is available on GitHub. The datasets have been extracted from Wikipedia dumps and have the same license (CC-BY-SA 2.5).
How to cite
If you reuse this dataset, please cite it as:
Mattia Lago, Cristian Consonni, Alberto Montresor. Temporal evolution of templates on Wikipedia. (Cite using WebCite®, cite using perma.cc/VR45-24JP)
Questions?
For further info send me an e-mail.