Datasets
Here you can find some datasets you can reuse for your research.
WikiLinkGraphs
WikiLinkGraphs’ RawWikilinks
This dataset contains wikilinks, i.e. links between Wikipedia articles, extracted by processing each revision of each Wikipedia article (namespace 0) from Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. Read more and download the data…
WikiLinkGraphs’ RawWikilinks Snapshots
This dataset contains wikilink snapshots, i.e. links between Wikipedia articles, extracted by processing each revision of each Wikipedia article from Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. The snapshots were taken on March 1st of each year from 2001 to 2018 (inclusive). Read more and download the data…
WikiLinkGraphs’ RevisionList
This dataset contains lists of all revisions for each Wikipedia article (namespace 0) from Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. Read more and download the data…
WikiLinkGraphs’ Snapshots
This dataset contains wikilink snapshots, i.e. links between Wikipedia articles, extracted by processing each revision of each Wikipedia article (namespace 0) from Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. The snapshots were taken on March 1st of each year from 2001 to 2018 (inclusive). Read more and download the data…
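As a small illustration of the snapshot schedule described above, the following Python sketch enumerates the snapshot dates. The variable name is hypothetical and does not reflect how the dataset files are named or organized.

```python
from datetime import date

# March 1st of each year from 2001 to 2018, both ends included.
# range() excludes its upper bound, hence 2019.
snapshot_dates = [date(year, 3, 1) for year in range(2001, 2019)]

for snapshot in snapshot_dates:
    print(snapshot.isoformat())
```

Because both 2001 and 2018 are included, the list contains 18 dates per language.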
WikiLinkGraphs’ Redirects
This dataset contains redirects in Wikipedia, i.e. alias names for Wikipedia articles, extracted by processing Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. Read more and download the data…
WikiLinkGraphs’ ResolvedRedirects
This dataset contains Wikipedia snapshots with resolved redirects, i.e. the list of pages (each with a particular revision) of Wikipedia on March 1st of each year from 2001 to 2018 (inclusive), with each redirect resolved to the page it pointed to at that moment. It has been produced by processing Wikimedia’s history dumps for the languages de, en, es, fr, it, nl, pl, ru, sv. Read more and download the data…
WikiLinkGraphs
WikiLinkGraphs is a dataset of the network of internal Wikipedia links for 9 language editions: de, en, es, fr, it, nl, pl, ru, sv. The dataset spans more than 17 years, from the creation of Wikipedia in 2001 to March 2018, and has been produced by processing Wikimedia’s history dumps. Read more and download the data…
Wikipedia pagecounts
Wikipedia pagecounts sorted by page (year 2014)
This dataset is superseded by Wikipedia pagecounts-raw sorted by page (years 2007-2016).
This dataset contains the page view statistics for all the Wikimedia projects in the year 2014, ordered by (project, page, timestamp). It has been generated starting from Wikimedia’s pagecounts-raw dataset.
Read more and download the data…
Wikipedia pagecounts-raw sorted by page (years 2007 – 2016)
This dataset consists of hourly pagecounts for Wikipedia pages, ordered by (project, page, timestamp). It has been created by processing Wikimedia’s pagecounts-raw dataset. Read more and download the data…
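To illustrate why the (project, page, timestamp) ordering is convenient, here is a minimal Python sketch that totals the views of each page in a single streaming pass. The record layout, the timestamp strings, and the function name are assumptions made for the example; the actual file format is described on the dataset page.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (project, page, timestamp, count),
# already sorted by (project, page, timestamp).
records = [
    ("en", "Ada_Lovelace", "20140101-000000", 5),
    ("en", "Alan_Turing", "20140101-000000", 12),
    ("en", "Alan_Turing", "20140101-010000", 7),
    ("it", "Roma", "20140101-000000", 3),
]


def total_views_per_page(sorted_records):
    """Yield (project, page, total) for input sorted by (project, page)."""
    # groupby only merges adjacent rows, so the sort order is required.
    for (project, page), rows in groupby(sorted_records, key=itemgetter(0, 1)):
        yield project, page, sum(row[3] for row in rows)


totals = list(total_views_per_page(records))
for project, page, total in totals:
    print(project, page, total)
```

Since all rows of a page are adjacent, only one page's running total has to be kept in memory at a time, regardless of the size of the input.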
Wikipedia pagecounts-ez (2007-12-09 – 2011-11-15)
This dataset contains the page view data of Wikimedia projects in a compressed format. It has been created by processing Wikimedia’s pagecounts-raw dataset. Read more and download the data…
Wikipedia pagecounts-all-sites sorted by page (years 2014 – 2016)
This dataset consists of hourly pagecounts for Wikipedia pages, ordered by (project, page, timestamp). It has been created by processing Wikimedia’s pagecounts-all-sites dataset. Read more and download the data…
Wikipedia templates
Temporal evolution of templates on Wikipedia
This work was the B.Sc. thesis of Mattia Lago, supervised by Prof. Alberto Montresor. We analyzed the temporal evolution of templates in the Italian- and English-language Wikipedias by counting how the number of occurrences of each template changed over time. Read more and download the data…
Questions?
For further information, send me an e-mail.