The Corpus of Contemporary American English (COCA) contains about 1 billion words in nearly 500,000 texts from 1990 to 2019. The texts are nearly evenly divided among eight genres: spoken, fiction, magazines, newspapers, academic journals, blogs, other web pages, and TV/movie subtitles (120-130 million words in each genre). In addition, there are 20 million words each year from 1990-2019 (with the same genre balance each year).
From the COCA website: "The Corpus of Contemporary American English (COCA) is the only large and 'representative' corpus of American English. COCA is probably the most widely-used corpus of English, and it is related to many other corpora of English that we have created. These corpora were formerly known as the 'BYU Corpora', and they offer unparalleled insight into variation in English." (https://www.english-corpora.org/coca/)
Access to English Corpora data in Abacus is restricted to researchers currently affiliated with UBC. Authorized users may use the data solely for non-profit academic research purposes. To request access, please use the "Contact Owner" button and include your university affiliation.
Metadata from corpora web site: https://www.english-corpora.org/coca/. [2 Sept 2022]
Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation above, generated by the Dataverse.
No waiver has been selected for this dataset.
You must agree to these restrictions in order to obtain the data:
In no case can substantial amounts of the full-text data (typically, a total of 50,000 words or more) be distributed outside the organization. For example, you cannot create a large word list or set of n-grams and then distribute this to others, and you cannot copy 70,000 words from different texts and then place them on a website where users from outside your organization would have access to the data.
If portions of the derived data are made available to others, they cannot include substantial portions of the raw frequency of words (e.g. the word occurs 3,403 times in the corpus) or the rank order (e.g. it is the 304th most common word). (Note: it is acceptable to use the frequency data to place words and phrases in “frequency bands”, e.g. words 1-1000, 1001-3000, 3001-10,000, etc. However, there should not be more than about 20 frequency bands in your application.)
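As an illustration of the frequency-band allowance described above, the following is a minimal sketch of how exact ranks might be replaced with coarse band labels before derived data is shared. The first three band limits are taken from the example in the terms; the open-ended final band, the function names `band_label` and `to_bands`, and the sample words and ranks are assumptions made up for this sketch, not part of the corpus or its documentation.

```python
from bisect import bisect_left

# Upper rank limit of each band. The first three limits follow the example
# in the terms (1-1000, 1001-3000, 3001-10,000); the open-ended final band
# is an assumption added for illustration.
BAND_UPPER_LIMITS = [1000, 3000, 10000]
MAX_BANDS = 20  # the terms ask for no more than about 20 bands


def band_label(rank: int) -> str:
    """Return the frequency band label for a 1-based rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    index = bisect_left(BAND_UPPER_LIMITS, rank)
    if index == len(BAND_UPPER_LIMITS):
        # Rank falls beyond the last explicit limit: open-ended band.
        return f"{BAND_UPPER_LIMITS[-1] + 1}+"
    lower = 1 if index == 0 else BAND_UPPER_LIMITS[index - 1] + 1
    return f"{lower}-{BAND_UPPER_LIMITS[index]}"


def to_bands(ranked_words):
    """Map (word, rank) pairs to (word, band) pairs, dropping the exact rank."""
    if len(BAND_UPPER_LIMITS) + 1 > MAX_BANDS:
        raise ValueError("too many frequency bands")
    return [(word, band_label(rank)) for word, rank in ranked_words]


if __name__ == "__main__":
    # Hypothetical words and ranks, not real corpus values.
    sample = [("word_a", 304), ("word_b", 2500), ("word_c", 12000)]
    for word, band in to_bands(sample):
        print(word, band)
```

Only the band label leaves the function, so the exact rank and raw frequency stay out of anything that is distributed. Whether a particular banding scheme satisfies the terms is a question for the data owner, not something this sketch can establish.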
You cannot use the data to create software or products that will be sold to others.
Students in undergraduate classes cannot have access to substantial portions of the data (e.g. 50,000 words or more). Graduate students can have access to the data for work on theses and dissertations. The data is primarily intended for use in research, not teaching. If you need corpus data for undergraduate classes, please use the standard web interface for the corpora at https://www.english-corpora.org/
Any publications or products that are based on the data should contain a reference to the source of the data: https://www.corpusdata.org.
The following guestbook will prompt a user to provide additional information when downloading a file.
Restricted dataset