Cristian Consonni

Ph.D. in Computer Science, free software activist, physicist and storyteller

Datasets

Here you can find some datasets you can reuse for your research.

WikiLinkGraphs

This dataset contains wikilinks, i.e. links between Wikipedia articles, extracted by processing each revision of each Wikipedia article (namespace 0) from Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. Read more and download the data…

WikiLinkGraphs’ RevisionList

This dataset contains lists of all revisions for each Wikipedia article (namespace 0) from Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. Read more and download the data…

WikiLinkGraphs’ Snapshots

This dataset contains wikilink snapshots, i.e. links between Wikipedia articles, extracted by processing each revision of each Wikipedia article (namespace 0) from Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. The snapshots were taken on March 1st of each year from 2001 to 2018 (inclusive). Read more and download the data…
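As an illustration of what a yearly snapshot means, the sketch below picks, for one article, the links of the latest revision made at or before March 1st of a given year. It is not the code used to build the dataset: the function name `snapshot_links`, the `(timestamp, links)` input layout, and the choice of midnight UTC as the cutoff are all assumptions made for this example.

```python
from datetime import datetime, timezone


def snapshot_links(revisions, year):
    """Return the link set of the latest revision at or before March 1st of `year`.

    `revisions` is an iterable of (timestamp, links) pairs for a single article
    (a hypothetical layout, not the dataset's file format). Returns None if the
    article had no revision yet at the cutoff.
    """
    # Assumed cutoff: midnight UTC at the start of March 1st.
    cutoff = datetime(year, 3, 1, tzinfo=timezone.utc)
    latest = None
    for timestamp, links in revisions:
        if timestamp <= cutoff and (latest is None or timestamp > latest[0]):
            latest = (timestamp, links)
    return None if latest is None else latest[1]


# Made-up revision history of one article.
revisions = [
    (datetime(2003, 5, 2, tzinfo=timezone.utc), {"Physics"}),
    (datetime(2004, 1, 15, tzinfo=timezone.utc), {"Physics", "Mathematics"}),
    (datetime(2004, 6, 30, tzinfo=timezone.utc), {"Mathematics"}),
]

links_2004 = snapshot_links(revisions, 2004)
```

Here `links_2004` holds the links of the January 2004 revision, since the June 2004 revision falls after the cutoff.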

WikiLinkGraphs’ Redirects

This dataset contains redirects in Wikipedia, i.e. alias names for Wikipedia articles, extracted by processing Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. Read more and download the data…

WikiLinkGraphs’ ResolvedRedirects

This dataset contains Wikipedia snapshots with resolved redirects, i.e. the list of pages (each with a specific revision) of Wikipedia on March 1st of each year from 2001 to 2018 (inclusive), with each redirect resolved to the page it pointed to at that moment. It has been produced by processing Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. Read more and download the data…

WikiLinkGraphs

WikiLinkGraphs is a dataset of the network of internal Wikipedia links for 9 language editions: de, en, es, fr, it, nl, pl, ru, sv. This dataset spans over 17 years, from the creation of Wikipedia in 2001 to March 2018. The dataset has been produced by processing Wikimedia’s history dumps. Read more and download the data…

Wikipedia pagecounts

Wikipedia pagecounts sorted by page (year 2014)

This dataset is superseded by Wikipedia pagecounts-raw sorted by page (years 2007 – 2016). It contains the page view statistics for all the Wikimedia projects in the year 2014, ordered by (project, page, timestamp). It has been generated starting from Wikimedia’s pagecounts-raw dataset. Read more and download the data…

Wikipedia pagecounts-raw sorted by page (years 2007 – 2016)

This dataset consists of hourly pagecounts for Wikipedia pages sorted by article, ordered by (project, page, timestamp). It has been created by processing Wikimedia’s pagecounts-raw dataset. Read more and download the data…
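The ordering by (project, page, timestamp) can be pictured with a small sketch: records are sorted on that tuple, so that all the hourly counts of a page end up next to each other. The `PageCount` type, its field names, and the timestamp string format are hypothetical and chosen only for this example; they do not describe the dataset's actual file layout.

```python
from typing import NamedTuple


class PageCount(NamedTuple):
    """One hourly page count record (hypothetical layout for illustration)."""
    project: str
    page: str
    timestamp: str  # assumed sortable string such as "20140101-000000"
    count: int


def sort_by_page(records):
    """Sort records by (project, page, timestamp)."""
    return sorted(records, key=lambda r: (r.project, r.page, r.timestamp))


# Made-up records, grouped by hour as they might arrive from hourly files.
records = [
    PageCount("en", "Physics", "20140101-010000", 12),
    PageCount("it", "Fisica", "20140101-000000", 3),
    PageCount("en", "Physics", "20140101-000000", 10),
    PageCount("en", "Mathematics", "20140101-000000", 7),
]

ordered = sort_by_page(records)
```

After sorting, the two `en`/`Physics` records are adjacent and in chronological order, which is what makes per-page time series easy to read sequentially. For data that does not fit in memory, an external sort would be needed instead of `sorted`.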

Wikipedia pagecounts-ez (2007-12-09 – 2011-11-15)

This dataset contains the pageview data of Wikimedia projects in a compressed format. It has been created by processing Wikimedia’s pagecounts-raw dataset. Read more and download the data…

Wikipedia pagecounts-all-sites sorted by page (years 2014 – 2016)

This dataset consists of hourly pagecounts for Wikipedia pages sorted by article, ordered by (project, page, timestamp). It has been created by processing Wikimedia’s pagecounts-all-sites dataset. Read more and download the data…

Wikipedia templates

Temporal evolution of templates on Wikipedia

This work was the B.Sc. thesis of Mattia Lago, supervised by Prof. Alberto Montresor. We analyzed the temporal evolution of templates in the Italian and English Wikipedias by counting how the number of occurrences of each template changed over time. Read more and download the data…


Questions?

For further info send me an e-mail.