Corpora from the Web

COW – Corpora from the Web

Project Description

COW is an initiative by Felix Bildhauer and Roland Schäfer of the German Grammar Group at Freie Universität Berlin. Since 2011, we have been building linguistically annotated corpora containing tens of billions of tokens in Dutch, English, French, German, Spanish, and Swedish. From 2015 to 2017, Roland Schäfer's work is funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) through grant SCHA1916/1-1 "Linguistische Webcharakterisierung und Webkorpuserstellung" ("Linguistic web characterization and web corpus construction").

Up-to-date information can be found on our WordPress site at

Corpus access and Creative Commons data

Corpus access is available to people working in academia. However, it is highly restricted for legal reasons, and we provide access to the corpora only through a custom interface (registration required) at

Several derived data sets, however, can be downloaded and used under a permissive CC-BY license:


Corpora from the Web
Arbeitsgruppe Deutsche Grammatik (German Grammar Group)
Deutsche und niederländische Philologie (German and Dutch Philology)
Freie Universität Berlin
Habelschwerdter Allee 45
14195 Berlin
Email: Felix Bildhauer, Roland Schäfer
Telephone +49-30-838 51374

Last modified: May 29, 2015