Title: Transformers: State-of-the-Art Natural Language Processing

Type: Software

Wolf, Thomas, Debut, Lysandre, Sanh, Victor, Chaumond, Julien, Delangue, Clement, Moi, Anthony, Cistac, Perric, Ma, Clara, Jernite, Yacine, Plu, Julien, Xu, Canwen, Le Scao, Teven, Gugger, Sylvain, Drame, Mariama, Lhoest, Quentin, Rush, Alexander M. (2020): Transformers: State-of-the-Art Natural Language Processing. Zenodo. Software. https://zenodo.org/record/6913778

Authors: Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M.

Summary

TensorFlow XLA Text Generation

The TensorFlow text generation method can now be wrapped with tf.function and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and our benchmarks. You can also see XLA generation in action in our example notebooks, particularly for summarization and translation.
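The example that follows pads its inputs with `pad_to_multiple_of`. A likely reason, offered here as background rather than taken from the release notes: a compiled `tf.function` is specialized to its input shapes, so each new sequence length can trigger another slow compilation. Rounding lengths up to a fixed multiple keeps the number of distinct shapes small. A minimal, library-free sketch of that arithmetic follows; `padded_length` and `distinct_shapes` are hypothetical helper names, not part of Transformers.

```python
from typing import Iterable, Set


def padded_length(length: int, multiple: int) -> int:
    """Round `length` up to the nearest multiple of `multiple`."""
    if length <= 0 or multiple <= 0:
        raise ValueError("length and multiple must be positive")
    return ((length + multiple - 1) // multiple) * multiple


def distinct_shapes(lengths: Iterable[int], multiple: int) -> Set[int]:
    """Collect the distinct padded lengths a compiled function would see."""
    return {padded_length(n, multiple) for n in lengths}


# Six different prompt lengths (illustrative values only).
raw_lengths = [7, 12, 19, 31, 33, 40]

# Without padding there are six distinct shapes; with a multiple of 32, two.
print(len(set(raw_lengths)))
print(sorted(distinct_shapes(raw_lengths, 32)))
```

The trade-off is some wasted computation on padding tokens in exchange for fewer compilations.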

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
    xla_generate = tf.function(model.generate, jit_compile=True)
    tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

    # The first prompt will be slow (compiling), the others will be very fast!
    input_prompts = [
        f"translate English to {language}: I have four cats and three dogs."
        for language in ["German", "French", "Romanian"]
    ]
    for input_prompt in input_prompts:
        tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
        generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
        print(tokenizer.decode(generated_text[0], skip_special_tokens=True))

Generate: deprecate default max_length by @gante in #18018
TF: GPT-J compatible with XLA generation by @gante in #17986
TF: T5 can now handle a padded past (i.e. XLA generation) by @gante in #17969
TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible by @gante in #17857
TF: generate without tf.TensorArray by @gante in #17801
TF: BART compatible with XLA generation by @gante in #17479

New model additions

OwlViT

The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.
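To make the idea of querying an image with text concrete, here is a toy sketch of the matching step only. It assumes region embeddings and text-query embeddings already exist in a shared space and compares them by cosine similarity. It is not OWL-ViT's implementation: the vectors, the `match_regions` helper, and the 0.5 threshold are all invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_regions(
    region_embeddings: List[Sequence[float]],
    query_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[int, str, float]]:
    """Return (region index, best query, score) for regions that clear the threshold."""
    matches = []
    for index, region in enumerate(region_embeddings):
        label, score = max(
            ((text, cosine_similarity(region, emb)) for text, emb in query_embeddings.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            matches.append((index, label, score))
    return matches


# Made-up 2-D embeddings: two text queries and three candidate regions.
queries = {"a cat": [1.0, 0.0], "a dog": [0.0, 1.0]}
regions = [[0.9, 0.1], [0.1, 0.8], [-1.0, -1.0]]
for index, label, score in match_regions(regions, queries):
    print(index, label, round(score, 3))
```

The third region resembles neither query, so it is dropped. Because the query set is passed in at call time, new categories can be searched for without changing the function, which is the open-vocabulary aspect this sketch tries to convey.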

Add OWL-ViT model for zero-shot object detection by @alaradirik in #17938
Fix OwlViT tests by @sgugger in #18253

NLLB

The NLLB model was presented in No Language Left Behind: Scaling Human-Centered Machine Translation by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.

[M2M100] update conversion script by @patil-suraj in #17916
NLLB tokenizer by @LysandreJik in #18126

MobileViT

The MobileViT model was proposed in MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.

add MobileViT model by @hollance in #17354

Nezha

The Nezha model was proposed in NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei et al. NEZHA is a language model based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking strategy, Mixed Precision Training and the LAMB Optimizer in training the models.
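A functional relative positional encoding is computed from a fixed formula instead of being learned. The sketch below assumes the sinusoidal form described in the NEZHA paper, where even dimensions use a sine and odd dimensions a cosine of the offset j - i, scaled by powers of 10000. The function name is hypothetical, and any clipping of large offsets that a real implementation may apply is omitted.

```python
import math
from typing import List


def relative_position_encoding(i: int, j: int, depth: int) -> List[float]:
    """Sinusoidal encoding of the relative offset j - i; no learned parameters."""
    if depth <= 0 or depth % 2:
        raise ValueError("depth must be a positive even number")
    offset = j - i
    encoding = []
    for k in range(depth // 2):
        angle = offset / (10000 ** (2 * k / depth))
        encoding.append(math.sin(angle))  # dimension 2k
        encoding.append(math.cos(angle))  # dimension 2k + 1
    return encoding


# The encoding depends only on the offset, not on absolute positions.
print(relative_position_encoding(2, 5, 8) == relative_position_encoding(0, 3, 8))
```

Because the values depend only on the offset, the same small table can be reused for every pair of positions in a sequence.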

Nezha Pytorch implementation by @sijunhe in #17776

GroupViT

The GroupViT model was proposed in GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. Inspired by CLIP, GroupViT is a vision-language model that can perform zero-shot semantic segmentation on any given vocabulary categories.

Adding GroupViT Models by @xvjiarui in #17313

MVP

The MVP model was proposed in MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. MVP is a generative language model, pre-trained on a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, the model is further pre-trained using specific soft prompts to stimulate the model capacity in performing a specific task.

Add MVP model by @StevenTang1998 in #17787

CodeGen

The CodeGen model was proposed in A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen is an autoregressive language model for program synthesis trained sequentially on The Pile, BigQuery, and BigPython.

Add CodeGen model by @rooa in #17443
[CodeGen] support device_map="auto" for sharded checkpoints by @patil-suraj in #17871

UL2

The UL2 model was presented in Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms together. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.

Add UL2 (just docs) by @patrickvonplaten in #17740

Custom pipelines

This adds the ability to host custom pipelines on the Hub and share them with everyone else. As with the code-on-the-Hub feature for models, tokenizers, etc., users have to pass trust_remote_code=True when they want to use one. Apart from this, the best way to get familiar with the feature is to look at the added documentation.

Custom pipeline by @sgugger in #18079

PyTorch to TensorFlow CLI utility

This adds a CLI to convert PT weights into TF weights, validate them, and (optionally) open a PR.
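The validation step can be pictured as running both sets of weights on the same input and checking that the outputs agree within a tolerance. The sketch below shows only that comparison on plain lists of floats. The helper names and the `5e-5` tolerance are assumptions for illustration and are not taken from the CLI itself.

```python
from typing import Sequence


def max_abs_difference(reference: Sequence[float], converted: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two output vectors."""
    if len(reference) != len(converted):
        raise ValueError("outputs must have the same length")
    return max((abs(a - b) for a, b in zip(reference, converted)), default=0.0)


def outputs_match(
    reference: Sequence[float],
    converted: Sequence[float],
    tolerance: float = 5e-5,  # assumed threshold, not the tool's documented value
) -> bool:
    """True when the converted outputs stay within `tolerance` of the reference."""
    return max_abs_difference(reference, converted) <= tolerance


# Made-up outputs standing in for PyTorch (reference) and TensorFlow (converted) logits.
pt_logits = [0.1234, -1.5, 2.25]
tf_logits = [0.1234, -1.5, 2.25001]
print(outputs_match(pt_logits, tf_logits))
```

Small differences are expected because the two frameworks do not perform floating-point operations in exactly the same order, which is why a tolerance is used instead of strict equality.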

CLI: tool to convert PT into TF weights and open hub PR by @gante in https://github.com/huggingface/transformers/pull/17497

TensorFlow-specific improvements

The following models have been ported to be used in TensorFlow: SegFormer, DeiT, ResNet and RegNet.

[SegFormer] TensorFlow port by @sayakpaul in #17910
Add TF DeiT implementation by @amyeroberts in #17806
Add TF ResNet model by @amyeroberts in #17427
TF implementation of RegNets by @ariG23498 in #17554

Additionally, our TF models now support loading sharded checkpoints:

TF Sharded by @ArthurZucker in #17713

Flax-specific improvements

The following models have been ported to be used in JAX:

Flax t5 Encoder by @crystina-z in #17784

Additionally, our JAX models now support loading sharded checkpoints:

Flax sharded by @ArthurZucker in #17760

Additional model heads

The following models now have a brand new head for new tasks:

Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER) by @gilad19 in #17924
Adding OPTForSeqClassification class by @oneraghavan in #18123

ONNX support

A continued community effort provides ONNX converters for an increasing number of models.

add ONNX support for LeVit by @gcheron in #18154
add ONNX support for BLOOM by @NouamaneTazi in #17961
Add ONNX support for LayoutLMv3 by @regisss in #17953
Mrbean/codegen onnx by @sam-h-bean in #17903
Add ONNX support for DETR by @regisss in #17904
add onnx support for deberta and debertav2 by @sam-h-bean in #17617

Documentation translation

A community effort aiming to translate the documentation in several languages has been continued.

Portuguese

Added translation of index.mdx to Portuguese Issue #16824 by @rzimmerdev in #17565

Spanish

Add Spanish translation of custom_models.mdx by @donelianc in #17807

Italian

Add Italian translation of sharing_custom_models.mdx by @Xpiri in #17631
Add Italian translation of converting_tensorflow_models.mdx by @Xpiri in #18283
Add Italian translation of create_model.mdx and serialization.mdx by @F02934 in #17640
Italian/accelerate by @mfumanelli in #17698
Italian/model sharing by @mfumanelli in #17828
Italian translation of run_scripts.mdx gh-17459 by @lorenzobalzani in #17642
Translation/debugging by @nickprock in #18230
Translation/training: italian translation training.mdx by @nickprock in #17662
Translation italian: multilingual.mdx by @nickprock in #17768
Added preprocessing.mdx italian translation by @nickprock in #17600

Improvements and bugfixes

[EncoderDecoder] Improve docs by @NielsRogge in #18271
[DETR] Improve code examples by @NielsRogge in #18262
patch for smddp import by @carolynwang in #18244
Fix Sylvain's nits on the original KerasMetricCallback PR by @Rocketknight1 in #18300
Add PYTEST_TIMEOUT for CircleCI test jobs by @ydshieh in #18251
Add PyTorch 1.11 to past CI by @ydshieh in #18302
Raise a TF-specific error when importing Torch classes by @Rocketknight1 in #18280
[ create_a_model.mdx ] translate to pt by @Fellip15 in #18098
Update translation.mdx by @gorkemozkaya in #18169
Add TFAutoModelForImageClassification to pipelines.py by @ydshieh in #18292
Adding type hints of TF:OpenAIGPT by @Mathews-Tom in #18263
Adding type hints of TF:CTRL by @Mathews-Tom in #18264
Replace false parameter by a buffer by @sgugger in #18259
Fix ORTTrainer failure on gpt2 fp16 training by @JingyaHuang in #18017
Owlvit docs test by @alaradirik in #18257
Good difficult issue override for the stalebot by @LysandreJik in #18094
Fix dtype of input_features in docstring by @ydshieh in #18258
Fix command of doc tests for local testing by @oneraghavan in #18236
Fix TF bad words filter with XLA by @Rocketknight1 in #18286
Allows KerasMetricCallback to use XLA generation by @Rocketknight1 in #18265
Skip passes report for --make-reports by @ydshieh in #18250
Update serving code to enable saved_model=True by @amyeroberts in #18153
Change how take_along_axis is computed in DeBERTa to stop confusing XLA by @Rocketknight1 in #18256
Fix torch version check in Vilt by @ydshieh in #18260
change bloom parameters to 176B by @muhammad-ahmed-ghani in #18235
TF: use the correct config with (...)EncoderDecoder models by @gante in #18097
Fix no_trainer CI by @muellerzr in #18242
Update notification service by @ydshieh in #17921
Make errors for loss-less models more user-friendly by @sgugger in #18233
Fix TrainingArguments help section by @sgugger in #18232
Better messaging and fix for incorrect shape when collating data. by @CakeCrusher in #18119
Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API by @viclzhu in #18221
Update add_new_pipeline.mdx by @zh-zheng in #18224
Add custom config to quicktour by @stevhliu in #18115
skip some test_multi_gpu_data_parallel_forward by @ydshieh in #18188
Change to FlavaProcessor in PROCESSOR_MAPPING_NAMES by @ydshieh in #18213
Fix LayoutXLM docstrings by @qqaatw in #17038
update cache to v0.5 by @ydshieh in #18203
Reduce console spam when using the KerasMetricCallback by @Rocketknight1 in #18202
TF: Add missing cast to GPT-J by @gante in #18201
Use next-gen CircleCI convenience images by @ydshieh in #18197
Typo in readme by @flozi00 in #18195
[From pretrained] Allow download from subfolder inside model repo by @patrickvonplaten in #18184
Update docs README with instructions on locally previewing docs by @snehankekre in #18196
bugfix: div-->dim by @orgoro in #18135
Add vision example to README by @sgugger in #18194
Remove use_auth_token from the from_config method by @duongna21 in #18192
FSDP integration enhancements and fixes by @pacman100 in #18134
BLOOM minor fixes small test by @younesbelkada in #18175
fix typo inside bloom documentation by @SaulLu in #18187
Better default for offload_state_dict in from_pretrained by @sgugger in #18183
Fix template for new models in README by @sgugger in #18182
FIX: Typo by @ayansengupta17 in #18156
Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests by @ydshieh in #18073
Fix expected loss values in some (m)T5 tests by @ydshieh in #18177
[HPO] update to sigopt new experiment api by @sywangyi in #18147
Fix incorrect type hint for lang by @JohnGiorgi in #18161
Fix check for falsey inputs in run_summarization by @JohnGiorgi in #18155
Adding support for device_map directly in pipeline(..) function. by @Narsil in #17902
Fixing a hard to trigger bug for text-generation pipeline. by @Narsil in #18131
Enable torchdynamo with torch_tensorrt(fx path) by @frank-wei in #17765
Make sharded checkpoints work in offline mode by @sgugger in #18125
add dataset split and config to model-index in TrainingSummary.from_trainer by @loicmagne in #18064
Add summarization name mapping for MultiNews by @JohnGiorgi in #18117
supported python versions reference by @CakeCrusher in #18116
TF: unpack_inputs decorator independent from main_input_name by @gante in #18110
TF: remove graph mode distinction when processing boolean options by @gante in #18102
Fix BLOOM dtype by @Muennighoff in #17995
CLI: reenable pt_to_tf test by @gante in #18108
Report value for a step instead of epoch. by @zhawe01 in #18095
speed up test by @sijunhe in #18106
Enhance IPEX integration in Trainer by @jianan-gu in #18072
Bloom Optimize operations by @younesbelkada in #17866
Add filename to info diaplyed when downloading things in from_pretrained by @sgugger in #18099
Fix image segmentation and object detection pipeline tests by @sgugger in #18100
Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts by @duongna21 in #18069
Fix torchscript tests for GPT-NeoX by @ydshieh in #18012
Fix some typos. by @Yulv-git in #17560
[bloom] fix alibi device placement by @stas00 in #18087
Make predict() close progress bars after finishing by @neverix in #17952)
Update localized READMES when template is filled. by @sgugger in #18062
Fix type issue in using bucketing with Trainer by @seopbo in #18051
Fix slow CI by pinning resampy by @sgugger in #18077
Drop columns after loading samples in prepare_tf_dataset by @Rocketknight1 in #17967
[Generate Tests] Make sure no tokens are force-generated by @patrickvonplaten in #18053
Added Command for windows VENV activation in installation docs by @darthvader2 in #18008
Sort doc toc by @sgugger in #18034
Place inputs on device when include_inputs_for_metrics is True by @sgugger in #18046
Doc to dataset by @sgugger in #18037
Protect TFGenerationMixin.seed_generator so it's not created at import by @Rocketknight1 in #18044
Fix T5 incorrect weight decay in Trainer and official summarization example by @ADAning in #18002
Squash commits by @NielsRogge in #17981
Enable Past CI by @ydshieh in #17919
Fix T5/mT5 tests by @Rocketknight1 in #18029
[Flax] Bump to v0.4.1 by @sanchit-gandhi in #17966
Update expected values in DecisionTransformerModelIntegrationTest by @ydshieh in #18016
fixed calculation of ctc loss in TFWav2Vec2ForCTC by @Sreyan88 in #18014
Return scalar losses instead of per-sample means by @Rocketknight1 in #18013
sort list of models by @hollance in #18011
Replace BloomTokenizer by BloomTokenizerFast in doc by @regisss in #18005
Fix typo in error message in generation_utils by @regisss in #18000
Refactor to inherit from nn.Module instead of nn.ModuleList by @amyeroberts in #17501
Add link to existing documentation by @LysandreJik in #17931
only a stupid typo, but it can lead to confusion by @Dobatymo in #17930
Exclude Databricks from notebook env only if the runtime is below 11.0 by @davidheryanto in #17988
Shifting labels for causal LM when using label smoother by @seungeunrho in #17987
Restore original task in test_warning_logs by @ydshieh in #17985
Ensure PT model is in evaluation mode and lightweight forward pass done by @amyeroberts in #17970
XLA train step fixes by @Rocketknight1 in #17973
[Flax] Add remat (gradient checkpointing) by @sanchit-gandhi in #17843
higher atol to avoid flaky trainer test failure by @ydshieh in #17979
Fix FlaxBigBirdEmbeddings by @ydshieh in #17842
fixin

More information

DOI: 10.5281/zenodo.6913778

Dates

Publication date: 2020
Issued: October 01, 2020

Notes

Other: If you use this software, please cite it using these metadata.

Rights

info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

Description	Item type	Relationship	Uri
		IsSupplementTo	https://github.com/huggingface/transformers/tree/v4.21.0
		IsVersionOf	https://doi.org/10.5281/zenodo.3385997
		IsPartOf	https://zenodo.org/communities/zenodo
