Title: Transformers: State-of-the-Art Natural Language Processing
Type: Software
Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M. (2020): Transformers: State-of-the-Art Natural Language Processing. Zenodo. Software. https://zenodo.org/record/7183183
Links
- Item record in Zenodo: https://zenodo.org/record/7183183
- Digital object URL: https://doi.org/10.5281/zenodo.7183183
Summary
Whisper
The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
The abstract from the paper is the following:
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.
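A minimal transcription sketch follows; the openai/whisper-tiny checkpoint and the silent one-second waveform are illustrative stand-ins, not part of this release note:

```python
import numpy as np
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Stand-in for one second of real 16 kHz mono speech.
audio = np.zeros(16000, dtype=np.float32)

# Convert raw audio to the log-Mel input features the encoder expects.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Whisper is an encoder-decoder model, so transcription goes through generate().
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```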
- Add WhisperModel to transformers by @ArthurZucker in #19166
- Add TF whisper by @amyeroberts in #19378
Time series
The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.
:warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs, and slight breaking changes may be needed to fix them in the future. If you see something strange, file a GitHub issue.
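Until the API stabilizes, a minimal sketch of instantiating the model from a configuration; the hyperparameter values below are illustrative, not defaults from the release:

```python
from transformers import (
    TimeSeriesTransformerConfig,
    TimeSeriesTransformerForPrediction,
)

# Illustrative settings: forecast 24 future time steps from 48 past steps.
config = TimeSeriesTransformerConfig(prediction_length=24, context_length=48)
model = TimeSeriesTransformerForPrediction(config)
```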
- time series forecasting model by @kashif in #17965
Conditional DETR
The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.
The abstract from the paper is the following:
The recently-developed DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergence, and present a conditional cross-attention mechanism for fast DETR training. Our approach is motivated by that the cross-attention in DETR relies highly on the content embeddings for localizing the four extremities and predicting the box, which increases the need for high-quality content embeddings and thus the training difficulty. Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention. The benefit is that through the conditional spatial query, each cross-attention head is able to attend to a band containing a distinct region, e.g., one object extremity or a region inside the object box. This narrows down the spatial range for localizing the distinct regions for object classification and box regression, thus relaxing the dependence on the content embeddings and easing the training. Empirical results show that conditional DETR converges 6.7× faster for the backbones R50 and R101 and 10× faster for stronger backbones DC5-R50 and DC5-R101.
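A minimal inference sketch; the microsoft/conditional-detr-resnet-50 checkpoint and the COCO test image are illustrative assumptions:

```python
import requests
from PIL import Image
from transformers import (
    ConditionalDetrFeatureExtractor,
    ConditionalDetrForObjectDetection,
)

feature_extractor = ConditionalDetrFeatureExtractor.from_pretrained(
    "microsoft/conditional-detr-resnet-50"
)
model = ConditionalDetrForObjectDetection.from_pretrained(
    "microsoft/conditional-detr-resnet-50"
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)  # class logits and predicted boxes per object query
```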
- Add support for conditional detr by @DeppMeng in #18948
- Improve conditional detr docs by @NielsRogge in #19154
Masked Siamese Networks
The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. The paper presents a joint-embedding architecture that matches the prototypes of masked patches with those of the unmasked patches. With this setup, their method yields excellent performance in the low-shot and extreme low-shot regimes.
The abstract from the paper is the following:
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark.
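A minimal classification sketch; the facebook/vit-msn-small checkpoint is an illustrative assumption:

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, ViTMSNForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/vit-msn-small")
model = ViTMSNForImageClassification.from_pretrained("facebook/vit-msn-small")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```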
- MSN (Masked Siamese Networks) for ViT by @sayakpaul in #18815
MarkupLM
The MarkupLM model was proposed in MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to LayoutLM.
The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on 2 important benchmarks:
- WebSRC, a dataset for Web-Based Structural Reading Comprehension (a bit like SQuAD but for web pages)
- SWDE, a dataset for information extraction from web pages (basically named-entity recognition on web pages)
The abstract from the paper is the following:
Multimodal pre-training with text, layout, and image has made significant progress for Visually-rich Document Understanding (VrDU), especially the fixed-layout documents such as scanned document images. While, there are still a large number of digital documents where the layout information is not fixed and needs to be interactively and dynamically rendered for visualization, making existing layout-based pre-training approaches not easy to apply. In this paper, we propose MarkupLM for document understanding tasks with markup languages as the backbone such as HTML/XML-based documents, where text and markup information is jointly pre-trained. Experiment results show that the pre-trained MarkupLM significantly outperforms the existing strong baseline models on several document understanding tasks. The pre-trained model and code will be publicly available.
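A minimal feature-extraction sketch; the microsoft/markuplm-base checkpoint and the toy HTML string are illustrative assumptions:

```python
from transformers import MarkupLMProcessor, MarkupLMModel

processor = MarkupLMProcessor.from_pretrained("microsoft/markuplm-base")
model = MarkupLMModel.from_pretrained("microsoft/markuplm-base")

html = "<html><body><h1>Welcome</h1><p>MarkupLM reads web pages.</p></body></html>"

# The processor parses the HTML, extracts text nodes plus their XPaths,
# and tokenizes everything into model-ready tensors.
encoding = processor(html, return_tensors="pt")
outputs = model(**encoding)
print(outputs.last_hidden_state.shape)
```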
- Add MarkupLM by @NielsRogge in #19198
Security & safety
We explore a new serialization format that we can leverage in all three frameworks we support: PyTorch, TensorFlow, and JAX. We leverage the safetensors library for that.
Support for this is still experimental.
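A minimal sketch of the format itself, independent of the transformers integration (which, as noted, is still experimental); the tensor names are arbitrary examples:

```python
import torch
from safetensors.torch import save_file, load_file

# A plain dict of named tensors goes in...
tensors = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}
save_file(tensors, "model.safetensors")  # no pickle involved, safe to load

# ...and a plain dict of named tensors comes back out.
restored = load_file("model.safetensors")
```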
- Poc to use safetensors by @sgugger in #19175
Computer vision post-processing methods overhaul
The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments, and outputs.
:warning: The existing methods superseded by the newly introduced methods post_process_object_detection, post_process_semantic_segmentation, post_process_instance_segmentation, and post_process_panoptic_segmentation are now deprecated.
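A minimal sketch of the unified API using DETR; the facebook/detr-resnet-50 checkpoint, image, and threshold are illustrative:

```python
import requests
import torch
from PIL import Image
from transformers import DetrFeatureExtractor, DetrForObjectDetection

feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Rescale predictions to the original image size and keep confident ones.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = feature_extractor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]
print(results["scores"], results["labels"], results["boxes"])
```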
- Improve DETR post-processing methods by @alaradirik in #19205
- Beit postprocessing by @alaradirik in #19099
- Fix BeitFeatureExtractor postprocessing by @alaradirik in #19119
- Add post_process_semantic_segmentation method to SegFormer by @alaradirik in #19072
- Add post_process_semantic_segmentation method to DPTFeatureExtractor by @alaradirik in #19107
- Add semantic segmentation post-processing method to MobileViT by @alaradirik in #19105
- Detr preprocessor fix by @alaradirik in #19007
- Improve and fix ImageSegmentationPipeline by @alaradirik in #19367
- Restructure DETR post-processing, return prediction scores by @alaradirik in #19262
- Maskformer post-processing fixes and improvements by @alaradirik in #19172
- Fix MaskFormer failing postprocess tests by @alaradirik in #19354
- Fix DETR segmentation postprocessing output by @alaradirik in #19363
- fix docs example, add object_detection to DETR docs by @alaradirik in #19377
🚨 Breaking changes
The following changes are bugfixes that we have chosen to fix even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.
Breaking change for ViT parameter initialization
- 🚨🚨🚨 Fix ViT parameter initialization by @alaradirik in #19341
Breaking change for the top_p argument of the TopPLogitsWarper of the generate method
- 🚨🚨🚨 Optimize Top P Sampler and fix edge case by @ekagra-ranjan in #18984
Model head additions
OPT and BLOOM now have question answering heads available.
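A minimal sketch for OPT; facebook/opt-350m is an illustrative checkpoint, and its QA head is randomly initialized until fine-tuned:

```python
from transformers import AutoTokenizer, OPTForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = OPTForQuestionAnswering.from_pretrained("facebook/opt-350m")

question = "Who wrote the library?"
context = "Transformers is written by Hugging Face."

inputs = tokenizer(question, context, return_tensors="pt")
outputs = model(**inputs)  # start_logits / end_logits over the context tokens
```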
- Add OPTForQuestionAnswering by @clementapa in #19402
- Add BloomForQuestionAnswering by @younesbelkada in #19310
Pipelines
There is now a zero-shot object detection pipeline.
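A minimal sketch; the google/owlvit-base-patch32 checkpoint is an assumption, and the keyword for the label queries follows the current pipeline docs (it may differ in older versions):

```python
from transformers import pipeline

detector = pipeline(
    "zero-shot-object-detection", model="google/owlvit-base-patch32"
)
predictions = detector(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    candidate_labels=["cat", "remote control", "couch"],
)
print(predictions)  # list of dicts with score, label, and box
```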
- Add ZeroShotObjectDetectionPipeline by @sahamrit in #18445
TensorFlow architectures
The GroupViT model is now available in TensorFlow.
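A minimal zero-shot image classification sketch with the TensorFlow port; the nvidia/groupvit-gcc-yfcc checkpoint and the prompts are illustrative assumptions:

```python
import requests
import tensorflow as tf
from PIL import Image
from transformers import CLIPProcessor, TFGroupViTModel

model = TFGroupViTModel.from_pretrained("nvidia/groupvit-gcc-yfcc")
processor = CLIPProcessor.from_pretrained("nvidia/groupvit-gcc-yfcc")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="tf",
    padding=True,
)
outputs = model(**inputs)
# Image-text similarity scores, normalized into label probabilities.
probs = tf.nn.softmax(outputs.logits_per_image, axis=-1)
```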
- [TensorFlow] Adding GroupViT by @ariG23498 in #18020
Bugfixes and improvements
- Fix a broken link for deepspeed ZeRO inference in the docs by @nijkah in #19001
- [doc] debug: fix import by @stas00 in #19042
- [bnb] Small improvements on utils by @younesbelkada in #18646
- Update image segmentation pipeline test by @amyeroberts in #18731
- Fix test_save_load for TFViTMAEModelTest by @ydshieh in #19040
- Pin minimum PyTorch version for BLOOM ONNX export by @lewtun in #19046
- Update serving signatures and make sure we actually use them by @Rocketknight1 in #19034
- Move cache: expand error message by @sgugger in #19051
- Fixing OPT fast tokenizer option. by @Narsil in #18753
- Fix custom tokenizers test by @sgugger in #19052
- Run torchdynamo tests by @ydshieh in #19056
- [fix] Add DeformableDetrFeatureExtractor by @NielsRogge in #19140
- fix arg name in BLOOM testing and remove unused arg document by @shijie-wu in #18843
- Adds package and requirement spec output to version check exception by @colindean in #18702
- fix use_cache by @younesbelkada in #19060
- FX support for ConvNext, Wav2Vec2 and ResNet by @michaelbenayoun in #19053
- [doc] Fix link in PreTrainedModel documentation by @tomaarsen in #19065
- Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with FP16 input by @jimypbr in #18746
- Organize test jobs by @sgugger in #19058
- Automatically tag CLIP repos as zero-shot-image-classification by @osanseviero in #19064
- Fix LeViT checkpoint by @ydshieh in #19069
- TF: tests for (de)serializable models with resized tokens by @gante in #19013
- Add type hints for PyTorch UniSpeech, MPNet and Nystromformer by @daspartho in #19039
- replace logger.warn by logger.warning by @fxmarty in #19068
- Fix tokenizer load from one file by @sgugger in #19073
- Note about developer mode by @LysandreJik in #19075
- german autoclass by @flozi00 in #19049
- Add tests for legacy load by url and fix bugs by @sgugger in #19078
- Add runner availability check by @ydshieh in #19054
- fix working dir by @ydshieh in #19101
- Added type hints for TFConvBertModel by @kishore-s-15 in #19088
- Added Type hints for VIT MAE by @kishore-s-15 in #19085
- Add type hints for TF MPNet models by @kishore-s-15 in #19089
- Added type hints to ResNetForImageClassification by @kishore-s-15 in #19084
- added type hints by @daspartho in #19076
- Improve vision models docs by @NielsRogge in #19103
- correct spelling in README by @flozi00 in #19092
- Don't warn of move if cache is empty by @sgugger in #19109
- HPO: keep the original logic if there's only one process, pass the trial to trainer by @sywangyi in #19096
- Add documentation of Trainer.create_model_card by @sgugger in #19110
- Added type hints for YolosForObjectDetection by @kishore-s-15 in #19086
- Fix the wrong schedule by @ydshieh in #19117
- Change document question answering pipeline to always return an array by @ankrgyl in #19071
- german processing by @flozi00 in #19121
- Fix: update ltp word segmentation call in mlm_wwm by @xyh1756 in #19047
- Add a missing space in a script arg documentation by @bryant1410 in #19113
- Skip test_export_to_onnx for LongT5 if torch < 1.11 by @ydshieh in #19122
- Fix GLUE MNLI when using max_eval_samples by @lvwerra in #18722
- [BugFix] Fix fsdp option on shard_grad_op. by @ZHUI in #19131
- Fix FlaxPretTrainedModel pt weights check by @mishig25 in #19133
- suppoer deps from github by @lhoestq in #19141
- Fix dummy creation for multi-frameworks objects by @sgugger in #19144
- Allowing users to use the latest tokenizers release ! by @Narsil in #19139
- Add some tests for check_dummies by @sgugger in #19146
- Fixed typo in generation_utils.py by @nbalepur in #19145
- Add accelerate support for ViLT by @younesbelkada in #18683
- TF: check embeddings range by @gante in #19102
- Reduce LR for TF MLM example test by @Rocketknight1 in #19156
- update perf_train_cpu_many doc by @sywangyi in #19151
- fix: ckpt paths. by @sayakpaul in #19159
- Fix TrainingArguments documentation by @sgugger in #19162
- fix HPO DDP GPU problem by @sywangyi in #19168
- [WIP] Trainer supporting evaluation on multiple datasets by @timbmg in #19158
- Add doctests to Perceiver examples by @stevenmanton in #19129
- Add offline runners info in the Slack report by @ydshieh in #19169
- Fix incorrect comments about atten mask for pytorch backend by @lygztq in #18728
- Fixed type hint for pipelines/check_task by @Fei-Wang in #19150
- Update run_clip.py by @enze5088 in #19130
- german training, accelerate and model sharing by @flozi00 in #19171
- Separate Push CI images from Scheduled CI by @ydshieh in #19170
- Remove pos arg from Perceiver's Pre/Postprocessors by @aielawady in #18602
- Use assertAlmostEqual in BloomEmbeddingTest.test_logits by @ydshieh in #19200
- Move the model type check by @ankrgyl in #19027
- Use repo_type instead of deprecated datasets repo IDs by @sgugger in #19202
- Updated hf_argparser.py by @IMvision12 in #19188
- Add warning for torchaudio <= 0.10 in MCTCTFeatureExtractor by @ydshieh in #19203
- Fix cached_file in offline mode for cached non-existing files by @sgugger in #19206
- Remove unused cur_len in generation_utils.py by @ekagra-ranjan in #18874
- add wav2vec2_alignment by @arijitx in #16782
- add doc for hyperparameter search by @sywangyi in #19192
- Add a use_parallel_residual argument to control the residual computing way by @NinedayWang in #18695
- translated add_new_pipeline by @nickprock in #19215
- More tests for regression in cached non existence by @sgugger in #19216
- Use math.pi instead of torch.pi in MaskFormer by @ydshieh in #19201
- Added tests for yaml and json parser by @IMvision12 in #19219
- Fix small use_cache typo in the docs by @ankrgyl in #19191
- Generate: add warning when left padding should be used by @gante in #19067
- Fix deprecation warning for return_all_scores by @ogabrielluiz in #19217
- Fix doctest for TFDeiTForImageClassification by @ydshieh in #19173
- Document and validate typical_p in generation by @mapmeld in #19128
- Fix trainer seq2seq qa.py evaluate log and ft script by @iamtatsuki05 in #19208
- Fix cache names in CircleCI jobs by @ydshieh in #19223
- Move AutoClasses under Main Classes by @stevhliu in #19163
- Focus doc around preprocessing classes by @stevhliu in #18768
- Fix confusing working directory in Push CI by @ydshieh in #19234
- XGLM - Fix Softmax NaNs when using FP16 by @gsarti in #18057
- Add a getattr method, which replaces _module_getattr in torch.fx.Tracer from PyTorch 1.13+ by @michaelbenayoun in #19233
- Fix m2m_100.mdx doc example missing labels by @Mustapha-AJEGHRIR in #19149
- Fix opt softmax small nit by @younesbelkada in #19243
- Use hf_raise_for_status instead of deprecated _raise_for_status by @Wauplin in #19244
- Fix TrainingArgs argument serialization by @atturaioe in #19239
- Fix test fetching for examples by @sgugger in #19237
- Cast TF generate() inputs by @Rocketknight1 in #19232
- Skip pipeline tests by @sgugger in #19248
- Add job names in Past CI artifacts by @ydshieh in #19235
- Update Past CI report script by @ydshieh in #19228
- [Wav2Vec2] Fix None loss in doc examples by @rbsteinm in #19218
- Catch HFValidationError in TrainingSummary by @ydshieh in #19252
- Add expected output to the sample code for ViTMSNForImageClassification by @sayakpaul in #19183
- Add stop sequence to text generation pipeline by @KMFODA in #18444
- Add notebooks by @JingyaHuang in #19259
- Add beautifulsoup4 to the dependency list by @ydshieh in #19253
- Fix Encoder-Decoder testing issue about repo. names by @ydshieh in #19250
- Fix cached lookup filepath on windows for hub by @kjerk in #19178
- Docs - Guide to add a new TensorFlow model by @gante in #19256
- Update no_trainer script for summarization by @divyanshugit in #19277
- Don't automatically add bug label by @sgugger in #19302
- Breakup export guide by @stevhliu in #19271
- Update Protobuf dependency version to fix known vulnerability by @qthequartermasterman in #19247
- Update README.md by @ShubhamJagtap2000 in #19309
- [Docs] Fix link by @patrickvonplaten in #19313
- Fix for sequence regression fit() in TF by @Rocketknight1 in #19316
- Added Type hints for LED TF by @IMvision12 in #19315
- Added type hints for TF: rag model by @debjit-bw in #19284
- alter retrived to retrieved by @gouqi666 in #18863
- ci(stale.yml): upgrade actions/setup-python to v4 by @oscard0m in #19281
- ci(workflows): update actions/checko
More information
- DOI: 10.5281/zenodo.7183183
Dates
- Publication date: 2020
- Issued: October 01, 2020
Notes
Other: If you use this software, please cite it using these metadata.
Rights
- Open Access (info:eu-repo/semantics/openAccess)
Format
electronic resource
Related items
| Description | Item type | Relationship | URI |
|---|---|---|---|
| | | IsSupplementTo | https://github.com/huggingface/transformers/tree/v4.23.0 |
| | | IsVersionOf | https://doi.org/10.5281/zenodo.3385997 |
| | | IsPartOf | https://zenodo.org/communities/zenodo |