
Title: huggingface/transformers: v2.0.0 - TF 2.0/PyTorch interoperability, improved tokenizers, improved torchscript support

Type: Software

Thomas Wolf, Lysandre Debut, Victor SANH, Denis, erenup, Matt, Grégory Châtel, Julien Chaumond, Tim Rault, Catalin Voss, Fei Wang, Malte Pietsch, Davide Fiocco, dhanajitb, Stefan Schweter, Ananya Harsh Jha, yzy5630, Yongbo Wang, Shijie Wu, Guillem García Subies, Weixin Wang, Zeyao Du, Chi-Liang, Liu, Nikolay Korolev, Joel Grus, Jade Abbott, David Pollack, Clement, Ailing, Abhishek Rao (2019): huggingface/transformers: v2.0.0 - TF 2.0/PyTorch interoperability, improved tokenizers, improved torchscript support. Zenodo. Software. https://zenodo.org/record/3462038

Authors: Thomas Wolf (@huggingface) ; Lysandre Debut (Hugging Face) ; Victor SANH (@huggingface) ; Denis ; erenup ; Matt ; Grégory Châtel (DisAItek & Intel AI Innovators) ; Julien Chaumond (Hugging Face) ; Tim Rault (@huggingface) ; Catalin Voss (Stanford University) ; Fei Wang (@ShannonAI) ; Malte Pietsch (deepset) ; Davide Fiocco ; dhanajitb ; Stefan Schweter ; Ananya Harsh Jha ; yzy5630 ; Yongbo Wang (Red Hat) ; Shijie Wu ; Guillem García Subies ; Weixin Wang ; Zeyao Du ; Chi-Liang, Liu (@ntu-spml-lab @Yoctol) ; Nikolay Korolev (@JetBrains) ; Joel Grus (@allenai) ; Jade Abbott (@RetroRabbit) ; David Pollack (i2x) ; Clement (@huggingface) ; Ailing ; Abhishek Rao (@microsoft) ;

Summary

Name change: welcome 🤗 Transformers

Following the extension to TensorFlow 2.0, pytorch-transformers => transformers

Install with pip install transformers

TensorFlow 2.0 - PyTorch

All the PyTorch nn.Module classes now have their counterpart in TensorFlow 2.0 as tf.keras.Model classes. TensorFlow 2.0 classes have the same name as their PyTorch counterparts prefixed with TF.

The interoperability between TensorFlow and PyTorch is actually a lot deeper than what is usually meant when talking about libraries with multiple backends:

- Each model (not just the static computation graph) can be seamlessly moved from one framework to the other during the lifetime of the model for training, evaluation, or usage (from_pretrained can load weights from models saved in either framework).
- An example is given in the quick tour on TF 2.0 and PyTorch in the README, in which a model is trained using keras.fit before being opened in PyTorch for quick debugging/inspection.

Remaining unsupported operations in TF 2.0 (to be added later):

- resizing input embeddings to add new tokens
- pruning model heads

TPU support

Training on the free TPUs provided by the TensorFlow Research Cloud (TFRC) program is possible but requires implementing a custom training loop (it is not possible with keras.fit at the moment). We will add an example of such a custom training loop soon.
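The cross-framework round trip described in this section can be sketched roughly as follows. This is a minimal sketch, not the project's reference example: it assumes both TensorFlow 2.0 and PyTorch are installed, that the `bert-base-cased` weights can be downloaded, and that the `from_tf=True` argument behaves as in the 2.0 README quick tour. The function name, the default directory, and the default checkpoint are hypothetical choices made for illustration.

```python
import os


def tf_to_pytorch_round_trip(save_dir="./save/", checkpoint="bert-base-cased"):
    """Load a TF 2.0 model, save it, and reopen the same weights in PyTorch.

    Hypothetical helper for illustration; the function name and the default
    arguments are assumptions, not part of the library.
    """
    # Imported lazily so that defining this function needs neither framework.
    from transformers import (
        BertForSequenceClassification,
        TFBertForSequenceClassification,
    )

    # The TF 2.0 class has the same name as the PyTorch class, prefixed with "TF".
    tf_model = TFBertForSequenceClassification.from_pretrained(checkpoint)

    # ... training would happen here, e.g. tf_model.compile(...) then tf_model.fit(...) ...

    # Create the target directory first, then write the TF weights and config.
    os.makedirs(save_dir, exist_ok=True)
    tf_model.save_pretrained(save_dir)

    # from_tf=True signals that the saved weights come from TensorFlow.
    pt_model = BertForSequenceClassification.from_pretrained(save_dir, from_tf=True)
    return tf_model, pt_model
```

Calling `tf_to_pytorch_round_trip()` would return both model objects, so the PyTorch copy can be inspected or debugged while the TensorFlow model stays available. Later releases may change or remove these arguments, so check the documentation for the installed version.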

Improved tokenizers

Tokenizers have been improved to provide an extended encoding method, encode_plus, and additional arguments to encode. Please refer to the documentation for detailed usage of the new options.
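As a rough illustration of the extended encoding method, the sketch below encodes a sentence pair in one call. It assumes the method is exposed as `encode_plus` on the tokenizer, that it accepts an optional second text and `add_special_tokens`, and that the returned dictionary contains an `input_ids` entry; the helper name and the default checkpoint are hypothetical, and the exact set of returned keys depends on the installed version.

```python
def encode_sentence_pair(first, second, checkpoint="bert-base-cased"):
    """Encode two sentences together and return the tokenizer's output dict.

    Hypothetical helper for illustration; requires the transformers package
    and access to the checkpoint's vocabulary file.
    """
    # Imported lazily so that defining this function does not need the library.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(checkpoint)

    # add_special_tokens=True asks the tokenizer to insert the model's own
    # special tokens (for BERT, [CLS] and [SEP]) around and between the texts.
    encoded = tokenizer.encode_plus(first, second, add_special_tokens=True)
    return encoded
```

A call such as `encode_sentence_pair("Hello there.", "How are you?")["input_ids"]` would then give the token ids for the combined pair.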

Potential breaking change: positional order of some model keyword inputs changed

To be able to use TorchScript (see #1010, #1204 and #1195), the order of some model keyword inputs (attention_mask, token_type_ids...) has been changed.

If you called the models with keyword names for keyword arguments, e.g. model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids), this should not cause any breaking change.

If you called the models with positional inputs for keyword arguments, e.g. model(input_ids, attention_mask, token_type_ids), you should double-check the exact order of input arguments.
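The difference between the two calling styles can be shown without the library at all. The two functions below are hypothetical stand-ins for a model's forward signature before and after a reordering; their names and the specific argument orders are assumptions chosen for illustration, not the library's actual signatures.

```python
def forward_old(input_ids, token_type_ids=None, attention_mask=None):
    """Stand-in for a signature before the reordering (assumed order)."""
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }


def forward_new(input_ids, attention_mask=None, token_type_ids=None):
    """Stand-in for a signature after the reordering (assumed order)."""
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }


def positional_call_is_safe(ids, mask, types):
    """Return True if a positional call written for the old order still
    binds the same values under the new order."""
    # Positional call written against the old order: ids, types, mask.
    return forward_old(ids, types, mask) == forward_new(ids, types, mask)


def keyword_call_is_safe(ids, mask, types):
    """Return True if a keyword call binds the same values under both orders."""
    old = forward_old(ids, attention_mask=mask, token_type_ids=types)
    new = forward_new(ids, attention_mask=mask, token_type_ids=types)
    return old == new
```

With distinct mask and segment values, the keyword call binds identically under both signatures, while the positional call silently swaps the two inputs. No exception is raised in the positional case, which is why such calls are worth auditing by hand.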

Community additions/bug-fixes/improvements

- new German model (@Timoeller)
- new script for MultipleChoice training (SWAG, RocStories...) (@erenup)
- better fp16 support (@ziliwang and @bryant1410)
- fix evaluation in run_lm_finetuning (@SKRohit)
- fix LM finetuning to prevent crashing on assert len(tokens_b)>=1 (@searchivarius)
- various doc and docstring fixes (@sshleifer, @Maxpa1n, @mattolson93, @t080)

More information

DOI: 10.5281/zenodo.3462038

Dates

Publication date: 2019
Issued: September 26, 2019

Rights

info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

Description	Item type	Relationship	Uri
		IsSupplementTo	https://github.com/huggingface/transformers/tree/v2.0.0
		IsVersionOf	https://doi.org/10.5281/zenodo.3385997
		IsPartOf	https://zenodo.org/communities/zenodo
