Open datasets

A collection of open datasets from international public sources

9 results

England and Wales Baby Names
Cost: Free
A CSV package converted from the 2024 ONS boys and girls baby-name workbooks, preserving every worksheet.
Historical U.S. Census Names and Surnames
Cost: Free
A normalized CSV package for historical Census surname data plus 1990 first-name and last-name frequency files.
2020 Census First Names
Cost: Free
A CSV package converted from the Census 2020 first-name workbooks, preserving each source worksheet as a CSV table.
wordfreq
Cost: Free
Word frequency data package for text processing and linguistic analysis, useful for building language models, text scoring, and related tasks. The package contains 207 files totaling 60,828,912 bytes in formats including gz, py, md, txt, ini, and toml, along with metadata, sample files, and attribution notices. Source: rspeer/
CMUdict
Cost: Free
CMUdict is a US English pronouncing dictionary. It is useful for linguistic processing tasks such as phonetic transcription and pronunciation modeling. Details: 16 files totaling 3,664,918 bytes across multiple extensions, including .dict, .phones, .symbols, and .py scripts. Sample files are provided to illustrate structure and usage. Source: cmudict repository (cmusphinx/cmudict).
stopwords-json
Cost: Free
stopwords-json provides stop word lists for 50 languages in JSON and TXT formats, sourced from the stopwords-json repository. It is useful for text processing tasks such as tokenization, filtering, and language-aware preprocessing.
Wordnik Wordlist
Cost: Free
Wordnik Wordlist is an open-source English wordlist provided by Wordnik. It is useful for game development and other word-based projects.
Alir3z4 stop-words
Cost: Free
A collection of common stop words for multiple languages, useful for filtering out high-frequency words during text analysis.
2020 Census Last Names
Cost: Free
A CSV package converted from the Census 2020 last-name workbooks, preserving each source worksheet as a CSV table.
