Restricted data (SFU, UBC, UNBC, UVic)
Corpus of Contemporary American English (COCA)
Version 1.0
Davies, Mark, 2022, "Corpus of Contemporary American English (COCA)", https://hdl.handle.net/11272.1/AB2/3AKAN0, Abacus Data Network, V1
Dataset Metrics
68 Downloads

Restrictions on use of the English Corpora materials

You must agree to these restrictions in order to obtain the data

  1. In no case can substantial amounts of the full-text data (typically, a total of 50,000 words or more) be distributed outside the organization. For example, you cannot create a large word list or set of n-grams and then distribute it to others, and you cannot copy 70,000 words from different texts and then place them on a website where users from outside your organization would have access to the data.

  2. If portions of the derived data are made available to others, they cannot include substantial portions of the raw frequency of words (e.g. the word occurs 3,403 times in the corpus) or the rank order (e.g. it is the 304th most common word). (Note: it is acceptable to use the frequency data to place words and phrases in “frequency bands”, e.g. words 1-1000, 1001-3000, 3001-10,000, etc. However, there should not be more than about 20 frequency bands in your application.)

  3. You cannot use the data to create software or products that will be sold to others.

  4. Students in undergraduate classes cannot have access to substantial portions of the data (e.g. 50,000 words or more). Graduate students can have access to the data for work on theses and dissertations. The data is primarily intended for use in research, not teaching. If you need corpus data for undergraduate classes, please use the standard web interface for the corpora at https://www.english-corpora.org/

  5. Any publications or products that are based on the data should contain a reference to the source of the data: https://www.corpusdata.org.
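
As an illustration of the frequency-band note in restriction 2, the following is a minimal sketch of how a word's rank might be replaced by a band label before derived data is shared. The band boundaries come from the example in the note; the names band_label, BAND_UPPER_BOUNDS, and MAX_BANDS are hypothetical, and treating "about 20" as a hard cap is an assumption, not part of the terms.

```python
from bisect import bisect_left

# Upper bounds of each band, taken from the example in restriction 2
# (words 1-1000, 1001-3000, 3001-10,000). Ranks above the last bound
# fall into one final open-ended band.
BAND_UPPER_BOUNDS = [1000, 3000, 10000]

# The terms say "not more than about 20" bands; using 20 as a strict
# limit is an assumption made for this sketch.
MAX_BANDS = 20


def band_label(rank, bounds=BAND_UPPER_BOUNDS):
    """Return a coarse band label for a 1-based frequency rank."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    # len(bounds) closed bands plus one open-ended band
    if len(bounds) + 1 > MAX_BANDS:
        raise ValueError("too many frequency bands")
    # Index of the first upper bound that is >= rank
    i = bisect_left(bounds, rank)
    lower = 1 if i == 0 else bounds[i - 1] + 1
    if i == len(bounds):
        return f"{lower}+"
    return f"{lower}-{bounds[i]}"


if __name__ == "__main__":
    for r in (304, 1001, 10000, 25000):
        print(r, band_label(r))
```

A sketch like this only shows the banding step; whether a particular derived data set complies with the restrictions is a matter for the terms themselves.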
