Datasets: Temporal Evolution of Templates on Wikipedia
This work constituted the B.Sc. thesis of Mattia Lago and was supervised by Prof. Alberto Montresor. We analyzed the temporal evolution of templates in the Italian and English language Wikipedia by counting how the number of occurrences of each template changed over time.
The code is available under the MIT license on GitHub.
Italian Wikipedia (itwiki)
These datasets were produced by analyzing the Italian Wikipedia dump of 2015-10-20, which contains the complete page edit history in .bz2 format.
- template_count_it.tar.7z (544MB compressed, 9.0GB uncompressed, md5sum: 57ff71be1e81ce069bf6407596ff23e7). This dataset contains the number of occurrences of each template in each revision of Italian Wikipedia. The archive contains a CSV file with the following fields:
page_id: (numerical) identifier of the page
page_title: page title
rev_id: (numerical) identifier of the article revision
timestamp: revision timestamp
dictionary: a (Python) dictionary containing the count of each template appearing in that revision. Keys are template names, values are the counts.
- redirects_it.tar.7z (74KB compressed, 257KB uncompressed, md5sum: 4ccaca5cc86657f3a36cb6f974d13a61). This dataset consists of a list of redirects for each template in Italian Wikipedia. The archive contains a CSV file with the following fields:
template: template name
redirect: destination of the redirect
rev_id: (numerical) identifier of the page revision
timestamp: revision timestamp
- Download the md5sum file.
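As a rough illustration of how the files above might be checked and read, the following Python sketch computes an MD5 digest and parses rows of the template count CSV. Several details are assumptions not stated on this page: that the CSV has no header row, that its five columns appear in the documented order, and that the dictionary column is quoted by the CSV writer. The function names and the sample row are hypothetical and only serve the example.

```python
import ast
import csv
import hashlib
import io

# MD5 digest listed above for template_count_it.tar.7z.
EXPECTED_MD5 = "57ff71be1e81ce069bf6407596ff23e7"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def parse_template_counts(lines):
    """Yield (page_id, title, rev_id, timestamp, counts) tuples.

    Assumption: five comma-separated columns in the documented order,
    no header row, with the dictionary column quoted by the CSV writer.
    """
    for row in csv.reader(lines):
        page_id, title, rev_id, timestamp, raw_counts = row
        # literal_eval parses the Python dict literal without executing code.
        counts = ast.literal_eval(raw_counts)
        yield int(page_id), title, int(rev_id), timestamp, counts


# Hypothetical sample row, invented for illustration only.
SAMPLE = io.StringIO(
    '12,Roma,345,2015-01-01T00:00:00Z,"{\'Bio\': 1, \'Cita web\': 3}"\n'
)
rows = list(parse_template_counts(SAMPLE))
```

To check a downloaded archive, open it in binary mode and compare `md5_of_stream(f)` with `EXPECTED_MD5`. Because the uncompressed CSV is about 9.0GB, the parser is written as a generator, so rows can be processed one at a time instead of being loaded into memory at once. If the actual file uses a different delimiter or includes a header row, the `csv.reader` call would need to be adjusted accordingly.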
The code is released under the MIT license and is available on GitHub. The datasets were extracted from Wikipedia dumps and carry the same license (CC-BY-SA 2.5).
How to cite
If you reuse this dataset, please cite it as:
For further info send me an e-mail.