
Dataset for use with the import_subs function, which imports subtitle fastText model outputs and frequency counts. It also lists the matching `udpipe` model for tagging each language.

Usage

data(subsData)

Format

A data frame of links and information about the subs2vec project.

language_code

the two-letter language code of the model

subs_vec

a link to download the subtitle-only fastText model

subs_count

a link to download the frequencies for the tokens in the subtitle data

wiki_vec

a link to download the Wikipedia-only fastText model

wiki_count

a link to download the frequencies for the tokens in the Wikipedia data

files

the number of files in the OpenSubtitles data

tokens

the number of tokens in the OpenSubtitles data

sentences

the number of sentences in the OpenSubtitles data

language

the full name of the language for reference

udpipe_model

the matching `udpipe` model for download to parse tokens
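
Examples

The columns above can be used to look up the download links and the `udpipe` model for one language. The following is a minimal sketch, not part of the package: `lookup_language` is a hypothetical helper, and `toy` is a stand-in data frame with placeholder values that reuses only some of the documented column names so the example runs without the package installed.

```r
# Hypothetical helper (not provided by the package): return the row of a
# subsData-style data frame that matches a two-letter language code.
lookup_language <- function(subs, code) {
  stopifnot(is.data.frame(subs), "language_code" %in% names(subs))
  # which() drops NA comparisons so missing codes never match
  hit <- subs[which(subs$language_code == code), , drop = FALSE]
  if (nrow(hit) == 0) {
    stop("No entry for language code: ", code)
  }
  hit
}

# Toy stand-in with a few of the documented columns.
# All values are placeholders, not the real contents of subsData.
toy <- data.frame(
  language_code = c("aa", "bb"),
  language      = c("Language A", "Language B"),
  subs_vec      = c("https://example.org/aa.vec", "https://example.org/bb.vec"),
  udpipe_model  = c("model-a", "model-b"),
  stringsAsFactors = FALSE
)

row <- lookup_language(toy, "bb")
row$subs_vec      # link to the subtitle-only model for this language
row$udpipe_model  # name of the matching udpipe model
```

With the package loaded, the same helper should work on the real data after `data(subsData)`, for example `lookup_language(subsData, code)` with a code that appears in `language_code`.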