subs2vec Project Model and Frequency Data Downloads

Dataset to use the import_subs function to import subtitle fastText model outputs and frequency counts. Includes information about matching `udpipe` models for tagging.

Usage

data(subsData)

Format

A data frame of links and information about the subs2vec project.

language_code: the two letter language code of the model
subs_vec: a link to download the subtitle only fastText model
subs_count: a link to download the frequencies for the tokens in the subtitle data
wiki_vec: a link to download the wikipedia only fastText model
wiki_count: a link to download the frequencies for the tokens in the wikipedia data
files: the number of files in the OpenSubtitles data
tokens: the number of tokens in the OpenSubtitles data
sentences: the number of sentences in the OpenSubtitles data
language: the full name of the language for reference
udpipe_model: the matching `udpipe` model for download to parse tokens