Title: Transformers: State-of-the-Art Natural Language Processing

Type: Software

Wolf, Thomas, Debut, Lysandre, Sanh, Victor, Chaumond, Julien, Delangue, Clement, Moi, Anthony, Cistac, Perric, Ma, Clara, Jernite, Yacine, Plu, Julien, Xu, Canwen, Le Scao, Teven, Gugger, Sylvain, Drame, Mariama, Lhoest, Quentin, Rush, Alexander M. (2020): Transformers: State-of-the-Art Natural Language Processing. Zenodo. Software. https://zenodo.org/record/6543388

Authors: Wolf, Thomas ; Debut, Lysandre ; Sanh, Victor ; Chaumond, Julien ; Delangue, Clement ; Moi, Anthony ; Cistac, Perric ; Ma, Clara ; Jernite, Yacine ; Plu, Julien ; Xu, Canwen ; Le Scao, Teven ; Gugger, Sylvain ; Drame, Mariama ; Lhoest, Quentin ; Rush, Alexander M.

Summary

Disclaimer: this release is the first release with no Python 3.6 support.

OPT

The OPT model was proposed in Open Pre-trained Transformer Language Models by Meta AI. OPT is a series of open-source large causal language models whose performance is similar to that of GPT-3.

Add OPT by @younesbelkada in #17088

FLAVA

The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela, and was accepted at CVPR 2022.

The paper aims at creating a single unified foundation model which can work across vision, language as well as vision-and-language multimodal tasks.

[feat] Add FLAVA model by @apsdehal in #16654

YOLOS

The YOLOS model was proposed in You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. YOLOS proposes to just leverage the plain Vision Transformer (ViT) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can also achieve 42 AP on COCO, similar to DETR and much more complex frameworks such as Faster R-CNN.

Add YOLOS by @NielsRogge in #16848

RegNet

The RegNet model was proposed in Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

The authors design search spaces to perform Neural Architecture Search (NAS). They start from a high-dimensional search space and iteratively reduce it by empirically applying constraints based on the best-performing models sampled from the current search space.
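The sample-then-constrain loop described above can be pictured with a toy sketch. This is only an illustration under loud assumptions: `toy_error` is a made-up stand-in for training and evaluating a model, the `width`/`depth` ranges are hypothetical, and tightening each range to the span of the best samples is a simplification, not the procedure used in the paper.

```python
import random


def toy_error(width, depth):
    """Hypothetical stand-in for training a model and measuring its error."""
    return (width - 64) ** 2 / 1000 + (depth - 12) ** 2 / 10


def shrink_search_space(space, n_samples=200, keep_frac=0.25, rounds=3, seed=0):
    """Sample configs, keep the best fraction, and tighten each range around them."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # Draw random configurations from the current search space.
        samples = [
            {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}
            for _ in range(n_samples)
        ]
        # Rank by (toy) error and keep the best-performing fraction.
        samples.sort(key=lambda cfg: toy_error(**cfg))
        best = samples[: max(1, int(n_samples * keep_frac))]
        # New bounds: the min/max of each parameter among the best samples.
        space = {
            name: (min(cfg[name] for cfg in best), max(cfg[name] for cfg in best))
            for name in space
        }
    return space


initial = {"width": (8, 512), "depth": (2, 48)}  # assumed toy ranges
reduced = shrink_search_space(initial)
print(reduced)
```

Each round can only keep or narrow the ranges, since the new bounds are taken from samples drawn inside the old ones.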

RegNet by @FrancescoSaverioZuppichini in #16188

TAPEX

The TAPEX model was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions about tabular data and to perform table fact checking.
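To make the idea of learning to "execute" SQL concrete, the sketch below builds one synthetic training pair with Python's built-in `sqlite3`: a real SQL engine computes the answer, and the model would be trained to reproduce it from the query and the table. The table contents, the column names, and the way the table is flattened into a string are all assumptions made for illustration, not the format used by TAPEX.

```python
import sqlite3

# Hypothetical toy table; the real pre-training corpus is not reproduced here.
rows = [("Paris", "France", 2.1), ("Lyon", "France", 0.5), ("Berlin", "Germany", 3.6)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population_m REAL)")
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", rows)

query = "SELECT name FROM cities WHERE country = 'France' ORDER BY population_m DESC"

# The SQL engine produces the target the model is trained to output.
target = ", ".join(r[0] for r in conn.execute(query))

# Illustrative flattening of query + table into one source string (format is an assumption).
source = query + " | " + " ; ".join(" , ".join(str(v) for v in row) for row in rows)

example = {"source": source, "target": target}
conn.close()
print(example)
```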

Add TAPEX by @NielsRogge in #16473

Data2Vec: vision

The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The vision model is added in v4.19.0.

[Data2Vec] Add data2vec vision by @patrickvonplaten in #16760
Add Data2Vec for Vision in TF by @sayakpaul in #17008

FSDP integration in Trainer

PyTorch recently upstreamed the Fairscale FSDP into PyTorch Distributed with additional optimizations. This PR is aimed at integrating it into the Trainer API.

FSDP enables distributed training at scale: it is a wrapper that shards module parameters across data-parallel workers. The design is inspired by Xu et al. and by ZeRO Stage 3 from DeepSpeed. PyTorch FSDP will focus more on production readiness and long-term support. This includes better integration with ecosystems and improvements in performance, usability, reliability, debuggability and composability.
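As a rough picture of what sharding parameters across data-parallel workers means, the pure-Python sketch below splits a flat parameter list into equal, zero-padded shards (one per worker) and then rebuilds the full list, mimicking an all-gather. It is a conceptual toy only: the function names are hypothetical, and it does not use or reproduce the PyTorch FSDP API.

```python
def shard_parameters(flat_params, world_size):
    """Split a flat parameter list into equal shards, one per worker (zero-padded)."""
    shard_size = -(-len(flat_params) // world_size)  # ceiling division
    padded = flat_params + [0.0] * (shard_size * world_size - len(flat_params))
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


def all_gather(shards, numel):
    """Rebuild the full parameter list from every worker's shard, dropping padding."""
    full = [value for shard in shards for value in shard]
    return full[:numel]


params = [0.1 * i for i in range(10)]  # stand-in for a module's flattened weights
shards = shard_parameters(params, world_size=4)
restored = all_gather(shards, numel=len(params))

print([len(s) for s in shards])
print(restored == params)
```

Each worker stores only its own shard between steps, which is where the memory saving comes from; the full parameters are gathered only when a layer needs them.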

PyTorch FSDP integration in Trainer by @pacman100 in #17136

Training scripts

New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.

Add image classification script, no trainer by @NielsRogge in #16727
Add semantic script no trainer, v2 by @NielsRogge in #16788
Add semantic script, trainer by @NielsRogge in #16834

Documentation in Spanish

To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers, starting with Spanish (572M speakers worldwide).

Added es version of language_modeling.mdx doc by @jQuinRivero in #17021
Spanish translation of the file philosophy.mdx by @jkmg in #16922
Documentation: Spanish translation of fast_tokenizers.mdx by @jloayza10 in #16882
Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples by @omarespejel in #16685
Spanish translation of the file multilingual.mdx by @SimplyJuanjo in #16329

Improvements and bugfixes

[modeling_utils] rearrange text by @stas00 in #16632
Added Annotations for PyTorch models by @anmolsjoshi in #16619
Allow the same config in the auto mapping by @sgugger in #16631
Update no_trainer scripts with new Accelerate functionalities by @muellerzr in #16617
Fix doc example by @NielsRogge in #16448
Add inputs vector to calculate metric method by @lmvasque in #16461
[megatron-bert-uncased-345m] fix conversion by @stas00 in #16639
Remove parent/child tests in auto model tests by @sgugger in #16653
Updated _load_pretrained_model_low_mem to check if keys are in the state_dict by @FrancescoSaverioZuppichini in #16643
Update Support image on README.md by @BritneyMuller in #16615
bert: properly mention deprecation of TF2 conversion script by @stefan-it in #16171
add vit tf doctest with @add_code_sample_docstrings by @johko in #16636
Fix error in doc of DataCollatorWithPadding by @secsilm in #16662
Fix QA sample by @ydshieh in #16648
TF generate refactor - Beam Search by @gante in #16374
Add tests for no_trainer and fix existing examples by @muellerzr in #16656
only load state dict when the checkpoint is not None by @laurahanu in #16673
[Trainer] tf32 arg doc by @stas00 in #16674
Update audio examples with MInDS-14 by @stevhliu in #16633
add a warning in SpmConverter for sentencepiece's model using the byte fallback feature by @SaulLu in #16629
Fix some doc examples in task summary by @ydshieh in #16666
Jia multi gpu eval by @liyongsea in #16428
Generate: min length can't be larger than max length by @gante in #16668
fixed crash when deleting older checkpoint and a file f"{checkpoint_prefix}-*" exist by @sadransh in #16686
[Doctests] Correct task summary by @patrickvonplaten in #16644
Add Doc Test for BERT by @vumichien in #16523
Fix t5 shard on TPU Pods by @agemagician in #16527
update decoder_vocab_size when resizing embeds by @patil-suraj in #16700
Fix TF_MASKED_LM_SAMPLE by @ydshieh in #16698
Rename the method test_torchscript by @ydshieh in #16693
Reduce memory leak in _create_and_check_torchscript by @ydshieh in #16691
Enable more test_torchscript by @ydshieh in #16679
Don't push checkpoints to hub in no_trainer scripts by @muellerzr in #16703
Private repo TrainingArgument by @nbroad1881 in #16707
Handle image_embeds in ViltModel by @ydshieh in #16696
Improve PT/TF equivalence test by @ydshieh in #16557
Fix example logs repeating themselves by @muellerzr in #16669
[Bart] correct doc test by @patrickvonplaten in #16722
Add Doc Test GPT-2 by @ArEnSc in #16439
Only call get_output_embeddings when tie_word_embeddings is set by @smelm in #16667
Update run_translation_no_trainer.py by @raki-1203 in #16652
Qdqbert example add benchmark script with ORT-TRT by @shangz-ai in #16592
Replace assertion with exception by @anmolsjoshi in #16720
Change the chunk_iter function to handle by @Narsil in #16730
Remove duplicate header by @sgugger in #16732
Moved functions to pytorch_utils.py by @anmolsjoshi in #16625
TF: remove set_tensor_by_indices_to_value by @gante in #16729
Add Doc Tests for Reformer PyTorch by @hiromu166 in #16565
[FlaxSpeechEncoderDecoder] Fix input shape bug in weights init by @sanchit-gandhi in #16728
[FlaxWav2Vec2Model] Fix bug in attention mask by @sanchit-gandhi in #16725
add Bigbird ONNX config by @vumichien in #16427
TF generate: handle case without cache in beam search by @gante in #16704
Fix decoding score comparison when using logits processors or warpers by @bryant1410 in #10638
[Doctests] Fix all T5 doc tests by @patrickvonplaten in #16646
Fix #16660 (tokenizers setters of ids of special tokens) by @davidleonfdez in #16661
[from_pretrained] refactor find_mismatched_keys by @stas00 in #16706
Add Doc Test for GPT-J by @ArEnSc in #16507
Fix and improve CTRL doctests by @jeremyadamsfisher in #16573
[modeling_utils] better explanation of ignore keys by @stas00 in #16741
CI: setup-dependent pip cache by @gante in #16751
Reduce Funnel PT/TF diff by @ydshieh in #16744
Add defensive check for config num_labels and id2label by @sgugger in #16709
Add self training code for text classification by @tuvuumass in #16738
[self-scheduled ci] explain where dependencies are by @stas00 in #16757
Fixup no_trainer examples scripts and add more tests by @muellerzr in #16765
[Doctest] added doctest changes for electra by @bhadreshpsavani in #16675
Enabling Tapex in table question answering pipeline. by @Narsil in #16663
[Flax .from_pretrained] Raise a warning if model weights are not in float32 by @sanchit-gandhi in #16762
Fix batch size in evaluation loop by @sgugger in #16763
Make nightly install dev accelerate by @muellerzr in #16783
[deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop by @stas00 in #16717
Kill async pushes when calling push_to_hub with blocking=True by @sgugger in #16755
Improve image classification example by @NielsRogge in #16585
[SpeechEncoderDecoderModel] Fix bug in reshaping labels by @sanchit-gandhi in #16748
Fix issue avoid-missing-comma found at https://codereview.doctor by @code-review-doctor in #16768
[trainer / deepspeed] fix hyperparameter_search by @stas00 in #16740
[modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests by @stas00 in #16657
Fix PT TF ViTMAE by @ydshieh in #16766
Update README.md by @NielsRogge in #16797
Pin Jax to last working release by @sgugger in #16808
CI: non-remote GH Actions now use a python venv by @gante in #16789
TF generate refactor - XLA sample by @gante in #16713
Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed by @allanj in #16786
Create empty venv on cache miss by @gante in #16816
[ViT, BEiT, DeiT, DPT] Improve code by @NielsRogge in #16799
[Quicktour Audio] Improve && remove ffmpeg dependency by @patrickvonplaten in #16723
fix megatron bert convert state dict naming by @Codle in #15820
use base_version to check torch version in torch_less_than_1_11 by @nbroad1881 in #16806
Allow passing encoder_ouputs as tuple to EncoderDecoder Models by @jsnfly in #16814
Refactor issues with yaml by @LysandreJik in #16772
fix _setup_devices in case where there is no torch.distributed package in build by @dlwh in #16821
Clean up semantic segmentation tests by @NielsRogge in #16801
Fix LayoutLMv2 tokenization docstrings by @qqaatw in #16187
Wav2 vec2 phoneme ctc tokenizer optimisation by @ArthurZucker in #16817
[Flax] improve large model init and loading by @patil-suraj in #16148
Some tests misusing assertTrue for comparisons fix by @code-review-doctor in #16771
Type hints added for TFMobileBert by @Dahlbomii in #16505
fix rum_clm.py seeking text column name twice by @dandelin in #16624
Add onnx export of models with a multiple choice classification head by @echarlaix in #16758
[ASR Pipeline] Correct init docs by @patrickvonplaten in #16833
Add doc about attention_mask on gpt2 by @wiio12 in #16829
TF: Add sigmoid activation function by @gante in #16819
Correct Logging of Eval metric to Tensorboard by @Jeevesh8 in #16825
replace Speech2TextTokenizer by Speech2TextFeatureExtractor in some docstrings by @SaulLu in #16835
Type hints added to Speech to Text by @Dahlbomii in #16506
Improve test_pt_tf_model_equivalence on PT side by @ydshieh in #16731
Add support for bitsandbytes by @manuelciosici in #15622
[Typo] Fix typo in modeling utils by @patrickvonplaten in #16840
add DebertaV2 fast tokenizer by @mingboiz in #15529
Fixing return type tensor with num_return_sequences>1. by @Narsil in #16828
[modeling_utils] use less cpu memory with sharded checkpoint loading by @stas00 in #16844
[docs] fix url by @stas00 in #16860
Fix custom init sorting script by @sgugger in #16864
Fix multiproc metrics in no_trainer examples by @muellerzr in #16865
Long QuestionAnsweringPipeline fix. by @Narsil in #16778
t5: add conversion script for T5X to FLAX by @stefan-it in #16853
tiny tweak to allow BatchEncoding.token_to_char when token doesn't correspond to chars by @ghlai9665 in #15901
Adding support for array key in raw dictionnaries in ASR pipeline. by @Narsil in #16827
Return input_ids in ImageGPT feature extractor by @sgugger in #16872
Use ACT2FN to fetch ReLU activation by @eldarkurtic in #16874
Fix GPT-J onnx conversion by @ChainYo in #16780
Fix doctest list by @ydshieh in #16878
New features for CodeParrot training script by @loubnabnl in #16851
Add missing entries in mappings by @ydshieh in #16857
TF: rework XLA generate tests by @gante in #16866
Minor fixes/improvements in convert_file_size_to_int by @mariosasko in #16891
Add doc tests for Albert and Bigbird by @vumichien in #16774
Add OnnxConfig for ConvBERT by @ChainYo in #16859
TF: XLA repetition penalty by @gante in #16879
Changes in create_optimizer to support tensor parallelism with SMP by @cavdard in #16880
[DocTests] Fix some doc tests by @patrickvonplaten in #16889
add bigbird typo fixes by @ChainYo in #16897
Fix doc test quicktour dataset by @patrickvonplaten in #16929
Add missing ckpt in config docs by @ydshieh in #16900
Fix PyTorch RAG tests GPU OOM by @ydshieh in #16881
Fix RemBertTokenizerFast by @ydshieh in #16933
TF: XLA logits processors - minimum length, forced eos, and forced bos by @gante in #16912
TF: XLA Logits Warpers by @gante in #16899
added deit onnx config by @rushic24 in #16887
TF: XLA stable softmax by @gante in #16892
Replace deprecated logger.warn with warning by @sanchit-gandhi in #16876
Fix issue probably-meant-fstring found at https://codereview.doctor by @code-review-doctor in #16913
Limit the use of PreTrainedModel.device by @sgugger in #16935
apply torch int div to layoutlmv2 by @ManuelFay in #15457
FIx Iterations for decoder by @agemagician in #16934
Add onnx config for RoFormer by @skrsna in #16861
documentation: some minor clean up by @mingboiz in #16850
Fix RuntimeError message format by @ftnext in #16906
use original loaded keys to find mismatched keys by @tricktreat in #16920
[Research] Speed up evaluation for XTREME-S by @anton-l in #16785
Fix HubertRobustTest PT/TF equivalence test on GPU by @ydshieh in #16943
Misc. fixes for Pytorch QA examples: by @searchivarius in #16958
[HF Argparser] Fix parsing of optional boolean arguments by @NielsRogge in #16946
Fix distributed_concat with scalar tensor by @Yard1 in #16963
Update custom_models.mdx by @mishig25 in #16964
Fix add-new-model-like when model doesn't support all frameworks by @sgugger in #16966
Fix multiple deletions of the same files in save_pretrained by @sgugger in #16947
Fixup no_trainer save logic by @muellerzr in #16968
Fix doc notebooks links by @sgugger in #16969
Fix check_all_models_are_tested by @ydshieh in #16970
Add -e flag to some GH workflow yml files by @ydshieh in #16959
Update tokenization_bertweet.py by @datquocnguyen in #16941
Update check_models_are_tested to deal with Windows path by @ydshieh in #16973
Add parameter --config_overrides for run_mlm_wwm.py by @conan1024hao in #16961
Rename a class to reflect framework pattern AutoModelXxx -> TFAutoModelXxx by @amyeroberts in #16993
set eos_token_id to None to generate until max length by @ydshieh in #16989
Fix savedir for by epoch by @muellerzr in #16996
Update README to latest release by @sgugger in #16997
use scale=1.0 in floats_tensor called in speech model testers by @ydshieh in #17007
Update all require decorators to use skipUnless when possible by @muellerzr in #16999
TF: XLA bad words logits processor and list of processors by @gante in #16974
Make create_extended_attention_mask_for_decoder static method by @pbelevich in #16893
Update README_zh-hans.md by @tarzanwill in #16977
Updating variable names. by @Narsil in #16445
Revert "Updating variable names. by @Narsil in #16445)"
Replace dict/BatchEncoding instance checks by Mapping by @sgugger in #17014
Result of new doc style with fixes by @sgugger in #17015
Add a check on config classes docstring checkpoints by @ydshieh in #17012
Add translating guide by @omarespejel in #17004
update docs of length_penalty by @manandey in #17022
[FlaxGenerate] Fix bug in decoder_start_token_id by @sanchit-gandhi in #17035
Fx with meta by @michaelbenayoun in #16836
[Flax(Speech)EncoderDecoder] Fix bug in decoder_module by @sanchit-gandhi in #17036
Fix typo in RetriBERT docstring by @mpoemsl in #17018
add torch.no_grad when in eval mode by @JunnYu in #17020
Disable Flax GPU tests on push by @sgugger in #17042
Clean up vision tests by @NielsRogge in #17024
[Trainer] Move logic for checkpoint loading into separate methods for easy overriding by @calpt in #17043
Update no_trainer examples to use new logger by @muellerzr in #17044
Fix no_trainer examples to properly calculate the number of sampl

More information

DOI: 10.5281/zenodo.6543388

Dates

Publication date: 2020
Issued: October 01, 2020

Notes

Other: If you use this software, please cite it using these metadata.

Rights

info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

Description	Item type	Relationship	Uri
		IsSupplementTo	https://github.com/huggingface/transformers/tree/v4.19.0
		IsVersionOf	https://doi.org/10.5281/zenodo.3385997
		IsPartOf	https://zenodo.org/communities/zenodo
