Data for Wordpiece-Style Tokenization [R package wordpiece.data version 2.0.0]

wordpiece.data: Data for Wordpiece-Style Tokenization

Provides the vocabulary data that the wordpiece algorithm uses to split text into subword tokens. The included vocabularies were retrieved from <https://huggingface.co/bert-base-cased/resolve/main/vocab.txt> and <https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt> and parsed into an R-friendly format.
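
To illustrate how a vocabulary of this kind is typically consumed, here is a minimal base-R sketch of greedy longest-match-first wordpiece tokenization for a single word. It does not use this package's API: the `vocab` vector is a tiny hypothetical stand-in for the real BERT vocabularies, `tokenize_word()` is an illustrative helper, and the `##` continuation prefix and `[UNK]` token follow the BERT convention.

```r
# Toy vocabulary (hypothetical); the real BERT vocabularies are far larger.
vocab <- c("[UNK]", "token", "##ize", "##s", "un", "##believ", "##able")

# Greedy longest-match-first tokenization of one word.
# Pieces after the first carry the "##" continuation prefix.
tokenize_word <- function(word, vocab, unk = "[UNK]") {
  pieces <- character(0)
  start <- 1L
  n <- nchar(word)
  while (start <= n) {
    end <- n
    match <- NULL
    # Try the longest remaining substring first, then shrink from the right.
    while (end >= start) {
      candidate <- substr(word, start, end)
      if (start > 1L) candidate <- paste0("##", candidate)
      if (candidate %in% vocab) {
        match <- candidate
        break
      }
      end <- end - 1L
    }
    # If no piece matches at this position, the whole word is unknown.
    if (is.null(match)) return(unk)
    pieces <- c(pieces, match)
    start <- end + 1L
  }
  pieces
}

tokenize_word("tokenizes", vocab)     # "token" "##ize" "##s"
tokenize_word("unbelievable", vocab)  # "un" "##believ" "##able"
tokenize_word("xyz", vocab)           # "[UNK]"
```

A full tokenizer would also normalize case and split on whitespace and punctuation before this step; those parts are omitted here.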

Version:	2.0.0
Depends:	R (≥ 3.5.0)
Suggests:	testthat (≥ 3.0.0)
Published:	2022-03-03
DOI:	10.32614/CRAN.package.wordpiece.data
Author:	Jonathan Bratt [aut], Jon Harmon [aut, cre], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph], Google, Inc [cph] (original BERT vocabularies)
Maintainer:	Jon Harmon <jonthegeek at gmail.com>
BugReports:	https://github.com/macmillancontentscience/wordpiece.data/issues
License:	Apache License (≥ 2)
URL:	https://github.com/macmillancontentscience/wordpiece.data
NeedsCompilation:	no
Materials:	README, NEWS
CRAN checks:	wordpiece.data results

Reference manual:

Package source:	wordpiece.data_2.0.0.tar.gz
Windows binaries:	r-devel: wordpiece.data_2.0.0.zip, r-release: wordpiece.data_2.0.0.zip, r-oldrel: wordpiece.data_2.0.0.zip
macOS binaries:	r-release (arm64): wordpiece.data_2.0.0.tgz, r-oldrel (arm64): wordpiece.data_2.0.0.tgz, r-release (x86_64): wordpiece.data_2.0.0.tgz, r-oldrel (x86_64): wordpiece.data_2.0.0.tgz
Old sources:	wordpiece.data archive

Reverse imports:
