
Title: huggingface/transformers: CTRL, DistilGPT-2, Pytorch TPU, tokenizer enhancements, guideline requirements

Type: Software

Citation: Thomas Wolf, Lysandre Debut, Victor SANH, Denis, erenup, Julien Chaumond, Matt, Grégory Châtel, Tim Rault, Catalin Voss, Fei Wang, Malte Pietsch, Davide Fiocco, Stefan Schweter, dhanajitb, Jinoo, Ananya Harsh Jha, yzy5630, Yongbo Wang, Shijie Wu, Guillem García Subies, Weixin Wang, Zeyao Du, Chi-Liang Liu, Simon Layton, Nikolay Korolev, Joel Grus, Jade Abbott (2019): huggingface/transformers: CTRL, DistilGPT-2, Pytorch TPU, tokenizer enhancements, guideline requirements. Zenodo. Software. https://zenodo.org/record/3482923

Authors: Thomas Wolf (@huggingface) ; Lysandre Debut (Hugging Face) ; Victor SANH (@huggingface) ; Denis ; erenup ; Julien Chaumond (Hugging Face) ; Matt ; Grégory Châtel (DisAItek & Intel AI Innovators) ; Tim Rault (@huggingface) ; Catalin Voss (Stanford University) ; Fei Wang (@ShannonAI) ; Malte Pietsch (deepset) ; Davide Fiocco ; Stefan Schweter ; dhanajitb ; Jinoo ; Ananya Harsh Jha ; yzy5630 ; Yongbo Wang (Red Hat) ; Shijie Wu ; Guillem García Subies ; Weixin Wang ; Zeyao Du ; Chi-Liang, Liu (@ntu-spml-lab @Yoctol) ; Simon Layton (@NVIDIA) ; Nikolay Korolev (@JetBrains) ; Joel Grus (@allenai) ; Jade Abbott (@RetroRabbit) ;


Summary

New model architectures: CTRL, DistilGPT-2

Two new models have been added since release 2.0.

  • CTRL (from Salesforce), released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong and Richard Socher. This model has been added to the library by @keskarnitish with the help of @thomwolf.
  • DistilGPT-2 (from HuggingFace), the second distilled model after DistilBERT in version 1.2.0, released alongside the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.
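As a quick illustration, here is a minimal sketch of loading both new checkpoints through the usual from_pretrained interface; it assumes a transformers install from roughly this release line, and the 'ctrl' and 'distilgpt2' checkpoint names published on the model hub:

```python
# Minimal sketch: loading the two new models (transformers ~v2.1.x assumed).
import torch
from transformers import CTRLLMHeadModel, CTRLTokenizer, GPT2LMHeadModel, GPT2Tokenizer

# CTRL (from Salesforce)
ctrl_tokenizer = CTRLTokenizer.from_pretrained("ctrl")
ctrl_model = CTRLLMHeadModel.from_pretrained("ctrl")

# DistilGPT-2 reuses the GPT-2 architecture classes with a smaller checkpoint
distil_tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
distil_model = GPT2LMHeadModel.from_pretrained("distilgpt2")

input_ids = torch.tensor([distil_tokenizer.encode("Hello, my name is")])
with torch.no_grad():
    logits = distil_model(input_ids)[0]  # v2.x models return a tuple; logits come first
print(logits.shape)  # (batch_size, sequence_length, vocab_size)
```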

Distillation

Several updates have been made to the distillation script, including the possibility to distill GPT-2 and to distill on the SQuAD task. By @VictorSanh.

Pytorch TPU support

The run_glue.py example script can now run on a Pytorch TPU.

Updates to example scripts

Several example scripts have been improved and refactored to use the full potential of the new tokenizer functions:

  • run_multiple_choice.py has been refactored to include encode_plus, by @julien-c and @erenup
  • run_lm_finetuning.py has been improved with the help of @dennymarcels, @jinoobaek-qz and @LysandreJik
  • run_glue.py has been improved with the help of @brian41005

QOL enhancements on the tokenizer

Enhancements have been made to the tokenizers. Two new methods have been added: get_special_tokens_mask and truncate_sequences.

The former returns a mask indicating which tokens in a token list are special tokens and which come from the initial sequences. The latter truncates sequences according to a given strategy.

Both of these methods are called by the encode_plus method, which is itself called by the encode method. encode_plus now returns a larger dictionary that holds information about the special tokens, as well as the overflowing tokens.
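A minimal sketch of how these methods fit together, assuming a BERT tokenizer and the API of this release line (exact returned keys and argument names may differ in later versions):

```python
# Minimal sketch of the new tokenizer methods (transformers ~v2.1.x API assumed).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# encode_plus builds the full model input for a sequence pair, adding special
# tokens and truncating according to a named strategy.
encoded = tokenizer.encode_plus(
    "The first sequence.",
    "A second, much longer sequence that may get truncated.",
    add_special_tokens=True,
    max_length=16,
    truncation_strategy="longest_first",
)
print(encoded.keys())  # e.g. input_ids, token_type_ids, plus overflow info when truncation occurs

# get_special_tokens_mask marks which positions in a prepared input are special tokens.
ids = tokenizer.encode("The first sequence.", add_special_tokens=True)
mask = tokenizer.get_special_tokens_mask(ids, already_has_special_tokens=True)
print(mask)  # 1 for special tokens such as [CLS]/[SEP], 0 for tokens from the original text

# truncate_sequences is the lower-level helper that removes tokens according to a strategy.
short_ids, _, overflowing = tokenizer.truncate_sequences(ids, num_tokens_to_remove=2)
print(len(short_ids), overflowing)
```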

Thanks to @julien-c, @thomwolf, and @LysandreJik for these additions.

Breaking changes

The two methods add_special_tokens_single_sequence and add_special_tokens_sequence_pair have been removed. They have been replaced by the single method build_inputs_with_special_tokens, which has a more comprehensible name and handles both single sequences and sequence pairs.
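A minimal sketch of the replacement method, assuming a BERT tokenizer (the special-token layout in the comments is BERT's):

```python
# Minimal sketch: one method now covers both the single-sequence and the pair case.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

ids_a = tokenizer.encode("First sequence.", add_special_tokens=False)
ids_b = tokenizer.encode("Second sequence.", add_special_tokens=False)

single = tokenizer.build_inputs_with_special_tokens(ids_a)        # [CLS] A [SEP]
pair = tokenizer.build_inputs_with_special_tokens(ids_a, ids_b)   # [CLS] A [SEP] B [SEP]
print(tokenizer.decode(single))
print(tokenizer.decode(pair))
```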

The boolean parameter truncate_first_sequence has been removed from the tokenizers' encode and encode_plus methods. It has been replaced by a strategy passed as a string: 'longest_first', 'only_second', 'only_first' or 'do_not_truncate' are the accepted strategies.

When the encode or encode_plus methods are called with a specified max_length, the sequences will now always be truncated, or an error will be raised if they overflow.
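A minimal sketch of the string strategies that replace the removed boolean (parameter names follow this release line; later versions renamed them):

```python
# Minimal sketch: the truncation strategy is now a string rather than truncate_first_sequence=True/False.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

ids = tokenizer.encode(
    "A fairly long first sequence that will not fit in the budget.",
    "A short second sequence.",
    add_special_tokens=True,
    max_length=12,
    truncation_strategy="only_first",  # or 'longest_first', 'only_second', 'do_not_truncate'
)
print(len(ids))  # never more than max_length
```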

Guidelines and requirements

New contributing guidelines have been added, alongside library development requirements by @rlouf, the newest member of the HuggingFace team.

Community additions/bug-fixes/improvements

  • GLUE processors have been refactored to handle inputs for all tasks coming from tensorflow_datasets. This work has been done by @agrinh and @philipp-eisen.
  • The padding_idx is now correctly initialized to 1 in randomly initialized RoBERTa models. @ikuyamada
  • The documentation CSS has been adapted to work on older browsers. @TimYagan
  • An addition concerning the management of hidden states has been added to the README by @BramVanroy.
  • Integration of TF 2.0 models with other Keras modules. @thomwolf
  • Past values can be opted out. @thomwolf

More information

  • DOI: 10.5281/zenodo.3482923

Dates

  • Publication date: 2019
  • Issued: October 11, 2019

Rights

  • info:eu-repo/semantics/openAccess Open Access


Format

electronic resource

Related items

  • IsSupplementTo: https://github.com/huggingface/transformers/tree/v2.1.1
  • IsVersionOf: https://doi.org/10.5281/zenodo.3385997
  • IsPartOf: https://zenodo.org/communities/zenodo