Title: Transformers: State-of-the-Art Natural Language Processing

Type: Software

Citation: Wolf, Thomas, Debut, Lysandre, Sanh, Victor, Chaumond, Julien, Delangue, Clement, Moi, Anthony, Cistac, Perric, Ma, Clara, Jernite, Yacine, Plu, Julien, Xu, Canwen, Le Scao, Teven, Gugger, Sylvain, Drame, Mariama, Lhoest, Quentin, Rush, Alexander M. (2020): Transformers: State-of-the-Art Natural Language Processing. Zenodo. Software. https://zenodo.org/record/7391177

Authors: Wolf, Thomas ; Debut, Lysandre ; Sanh, Victor ; Chaumond, Julien ; Delangue, Clement ; Moi, Anthony ; Cistac, Perric ; Ma, Clara ; Jernite, Yacine ; Plu, Julien ; Xu, Canwen ; Le Scao, Teven ; Gugger, Sylvain ; Drame, Mariama ; Lhoest, Quentin ; Rush, Alexander M.

Summary

PyTorch 2.0 stack support

We are very excited by the newly announced PyTorch 2.0 stack. You can enable torch.compile on any of our models, and get support with the Trainer (and in all our PyTorch examples) by using the torchdynamo training argument. For instance, just add --torchdynamo inductor when launching those examples from the command line.

This API is still experimental and may be subject to changes as the PyTorch 2.0 stack matures.

Note that to get the best performance, we recommend:

using an Ampere GPU (or more recent)

sticking to fixed shapes for now (so use --pad_to_max_length in our examples)
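
As a rough illustration of the command-line usage described above, the following sketch assembles the flags for one of the PyTorch example scripts. The helper build_example_command, the script name run_glue.py, and the model, task, and output values are assumptions chosen for illustration; only --torchdynamo inductor and --pad_to_max_length come from these notes.

```python
import shlex


def build_example_command(script, script_args=(), backend="inductor", fixed_shapes=True):
    """Assemble an example-script command line (hypothetical helper)."""
    command = ["python", script, *script_args, "--torchdynamo", backend]
    if fixed_shapes:
        # Fixed shapes are the recommendation in these notes for best performance.
        command.append("--pad_to_max_length")
    return command


# Script name and argument values below are illustrative assumptions.
command = build_example_command(
    "run_glue.py",
    [
        "--model_name_or_path", "bert-base-cased",
        "--task_name", "mrpc",
        "--do_train",
        "--output_dir", "/tmp/mrpc",
    ],
)
print(shlex.join(command))
```

Since the API is described as experimental, the flag names may change in later releases.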

Repurpose torchdynamo training args towards torch._dynamo by @sgugger in #20498

Audio Spectrogram Transformer

The Audio Spectrogram Transformer model was proposed in AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass. The Audio Spectrogram Transformer applies a Vision Transformer to audio, by turning audio into an image (spectrogram). The model obtains state-of-the-art results for audio classification.

Add Audio Spectogram Transformer by @NielsRogge in #19981

Jukebox

The Jukebox model was proposed in Jukebox: A generative model for music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. It introduces a generative music model that can produce minute-long samples conditioned on an artist, genre, and lyrics.

Add Jukebox model (replaces #16875) by @ArthurZucker in #17826

Switch Transformers

The SwitchTransformers model was proposed in Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.

It is the first MoE (Mixture-of-Experts) model supported in transformers, with the largest checkpoint currently available containing 1T parameters.
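
For readers unfamiliar with MoE layers, here is a minimal sketch of the top-1 ("switch") routing idea, in which a router sends each token to a single expert and scales that expert's output by the router probability. It is a toy in plain Python, not the transformers implementation; switch_route, the router weights, and the two toy experts are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(token, router_weights, experts):
    """Route one token to its single highest-probability expert (top-1)."""
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    expert_output = experts[best](token)
    # Scale the chosen expert's output by its router probability.
    return best, [probs[best] * value for value in expert_output]


# Two toy "experts" standing in for feed-forward blocks (illustrative only).
experts = [
    lambda t: [2.0 * v for v in t],
    lambda t: [-v for v in t],
]
router_weights = [[2.0, 0.0], [0.0, 2.0]]
chosen, output = switch_route([1.0, 0.0], router_weights, experts)
print(chosen, output)
```

Only the selected expert runs for a given token, which is why the parameter count can grow with the number of experts while per-token compute stays roughly constant.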

Add Switch transformers by @younesbelkada and @ArthurZucker in #19323

RoCBert

The RoCBert model was proposed in RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou. It's a pretrained Chinese language model that is robust under various forms of adversarial attacks.

Add RocBert by @sww9370 in #20013

CLIPSeg

The CLIPSeg model was proposed in Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker. CLIPSeg adds a minimal decoder on top of a frozen CLIP model for zero- and one-shot image segmentation.

Add CLIPSeg by @NielsRogge in #20066

NAT and DiNAT

NAT

NAT was proposed in Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.

It is a hierarchical vision transformer based on Neighborhood Attention, a sliding-window self-attention pattern.

DiNAT

DiNAT was proposed in Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.

It extends NAT by adding a Dilated Neighborhood Attention pattern to capture global context, and shows significant performance improvements over it.
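
To make the two attention patterns concrete, here is a simplified one-dimensional sketch of which positions a query token attends to. The models operate on 2D feature maps; the function name neighborhood, the 1D setting, and the edge handling (shifting the window so every token keeps kernel_size neighbors) are assumptions made for illustration, not the library's implementation.

```python
def neighborhood(index, length, kernel_size, dilation=1):
    """Return the positions that token `index` attends to in a 1D sequence."""
    # Tokens are split into `dilation` interleaved groups; a token only
    # attends within its own group (dilation=1 gives a single group).
    offset = index % dilation
    group_size = (length - offset + dilation - 1) // dilation
    if kernel_size > group_size:
        raise ValueError("window does not fit: reduce kernel_size or dilation")
    position_in_group = index // dilation
    # Center the window on the token, then clamp it inside the group.
    start = position_in_group - kernel_size // 2
    start = max(0, min(start, group_size - kernel_size))
    return [offset + (start + step) * dilation for step in range(kernel_size)]


print(neighborhood(4, 8, 3))              # [3, 4, 5]
print(neighborhood(0, 8, 3))              # [0, 1, 2]
print(neighborhood(4, 8, 3, dilation=2))  # [2, 4, 6]
```

With dilation greater than 1, the same number of neighbors spans a wider stretch of the sequence, which is the sense in which the dilated pattern reaches more global context.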

Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models by @alihassanijr in #20219

MobileNetV2

The MobileNetV2 model was proposed in MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.

add MobileNetV2 model by @hollance in #17845

MobileNetV1

The MobileNetV1 model was proposed in MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.

add MobileNetV1 model by @hollance in #17799

Image processors

Image processors replace feature extractors as the processing class for computer vision models.

Important changes:

size parameter is now a dictionary of {"height": h, "width": w}, {"shortest_edge": s}, or {"shortest_edge": s, "longest_edge": l} instead of int or tuple.

Addition of data_format flag. You can now specify if you want your images to be returned in "channels_first" (NCHW) or "channels_last" (NHWC) format.

Processing flags, e.g. do_resize, can be passed directly to the preprocess method instead of modifying the class attribute: image_processor([image_1, image_2], do_resize=False, return_tensors="pt", data_format="channels_last")

Leaving return_tensors unset will return a list of numpy arrays.

The classes are backwards compatible and can be created using existing feature extractor configurations - with the size parameter converted.
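
The size conversion mentioned above can be pictured with a small sketch. The helper to_size_dict and its default_to_square flag are hypothetical and only illustrate how a legacy int or tuple might map onto the new dictionary form; the tuple is assumed to be in (height, width) order, and the library's own conversion logic may differ.

```python
def to_size_dict(size, default_to_square=True):
    """Convert a legacy int/tuple size into the dictionary form (hypothetical helper)."""
    if isinstance(size, dict):
        # Already in the new format: return a copy unchanged.
        return dict(size)
    if isinstance(size, int):
        if default_to_square:
            return {"height": size, "width": size}
        # Otherwise treat the int as the target for the shorter image side.
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        height, width = size  # assumed (height, width) order
        return {"height": height, "width": width}
    raise ValueError(f"Unsupported size value: {size!r}")


print(to_size_dict(224))                           # {'height': 224, 'width': 224}
print(to_size_dict(256, default_to_square=False))  # {'shortest_edge': 256}
print(to_size_dict((480, 640)))                    # {'height': 480, 'width': 640}
```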

Add Image Processors by @amyeroberts in #19796
Add Donut image processor by @amyeroberts in #20425
Add segmentation + object detection image processors by @amyeroberts in #20160
AutoImageProcessor by @amyeroberts in #20111

Backbone for computer vision models

We're adding support for a general AutoBackbone class, which turns any vision model (like ConvNeXt, Swin Transformer) into a backbone to be used with frameworks like DETR and Mask R-CNN. The design is in early stages and we welcome feedback.

Add AutoBackbone + ResNetBackbone by @NielsRogge in #20229
Improve backbone by @NielsRogge in #20380
[AutoBackbone] Improve API by @NielsRogge in #20407

Support for safetensors offloading

If the model you are using has a safetensors checkpoint and you have the library installed, offload to disk will take advantage of this to be more memory efficient and roughly 33% faster.

Safetensors offload by @sgugger in #20321

Contrastive search in the generate method

Generate: TF contrastive search with XLA support by @gante in #20050
Generate: contrastive search with full optional outputs by @gante in #19963

Breaking changes 🚨 🚨 🚨

Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string by @beneyal in #15775

Bugfixes and improvements

add dataset by @stevhliu in #20005
Add BERT resources by @stevhliu in #19852
Add LayoutLMv3 resource by @stevhliu in #19932
fix typo by @stevhliu in #20006
Update object detection pipeline to use post_process_object_detection methods by @alaradirik in #20004
clean up vision/text config dict arguments by @ydshieh in #19954
make sentencepiece import conditional in bertjapanesetokenizer by @ripose-jp in #20012
Fix gradient checkpoint test in encoder-decoder by @ydshieh in #20017
Quality by @sgugger in #20002
Update auto processor to check image processor created by @amyeroberts in #20021
[Doctest] Add configuration_deberta_v2.py by @Saad135 in #19995
Improve model tester by @ydshieh in #19984
Fix doctest by @ydshieh in #20023
Show installed libraries and their versions in CI jobs by @ydshieh in #20026
reorganize glossary by @stevhliu in #20010
Now supporting pathlike in pipelines too. by @Narsil in #20030
Add **kwargs by @amyeroberts in #20037
Fix some doctests after PR 15775 by @ydshieh in #20036
[Doctest] Add configuration_camembert.py by @Saad135 in #20039
[Whisper Tokenizer] Make more user-friendly by @sanchit-gandhi in #19921
[FuturWarning] Add futur warning for LEDForSequenceClassification by @ArthurZucker in #19066
fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc by @sywangyi in #19891
Update esmfold conversion script by @Rocketknight1 in #20028
Fixed torch.finfo issue with torch.fx by @michaelbenayoun in #20040
Only resize embeddings when necessary by @sgugger in #20043
Speed up TF token classification postprocessing by converting complete tensors to numpy by @deutschmn in #19976
Fix ESM LM head test by @Rocketknight1 in #20045
Update README.md by @bofenghuang in #20063
fix tokenizer_type to avoid error when loading checkpoint back by @pacman100 in #20062
[Trainer] Fix model name in push_to_hub by @sanchit-gandhi in #20064
PoolformerImageProcessor defaults to match previous FE by @amyeroberts in #20048
change constant torch.tensor to torch.full by @MerHS in #20061
Update READMEs for ESMFold and add notebooks by @Rocketknight1 in #20067
Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 by @jordiclive in #20068
Allow passing arguments to model testers for CLIP-like models by @ydshieh in #20044
Show installed libraries and their versions in GA jobs by @ydshieh in #20069
Update defaults and logic to match old FE by @amyeroberts in #20065
Update modeling_tf_utils.py by @cakiki in #20076
Update hub.py by @cakiki in #20075
[Doctest] Add configuration_dpr.py by @Saad135 in #20080
Removing RobertaConfig inheritance from CamembertConfig by @Saad135 in #20059
Skip 2 tests in VisionTextDualEncoderProcessorTest by @ydshieh in #20098
Replace unsupported facebookresearch/bitsandbytes by @tomaarsen in #20093
docs: Resolve many typos in the English docs by @tomaarsen in #20088
use huggingface_hub.model_inifo() to get pipline_tag by @y-tag in #20077
Fix generate_dummy_inputs for ImageGPTOnnxConfig by @ydshieh in #20103
docs: Fixed variables in f-strings by @tomaarsen in #20087
Add new terms to the glossary by @stevhliu in #20051
Replace awkward timm link with the expected one by @tomaarsen in #20109
Fix AutoTokenizer with subfolder passed by @sgugger in #20110
[Audio Processor] Only pass sr to feat extractor by @sanchit-gandhi in #20022
Update github pr docs actions by @mishig25 in #20125
Adapt has_labels test when no labels were found by @sgugger in #20113
Improve tiny model creation script by @ydshieh in #20119
Remove BertConfig inheritance from RobertaConfig by @Saad135 in #20124
[Swin] Add Swin SimMIM checkpoints by @NielsRogge in #20034
Update CLIPSegModelTester by @ydshieh in #20134
Update SwinForMaskedImageModeling doctest values by @amyeroberts in #20139
Attempting to test automatically the _keys_to_ignore. by @Narsil in #20042
Generate: move generation_.py src files into generation/.py by @gante in #20096
add cv + audio labels by @stevhliu in #20114
Update VisionEncoderDecoder to use an image processor by @amyeroberts in #20137
[CLIPSeg] Add resources by @NielsRogge in #20118
Make DummyObject more robust by @mariosasko in #20146
Add RoCBertTokenizer to TOKENIZER_MAPPING_NAMES by @ydshieh in #20141
Adding support for LayoutLMvX variants for object-detection. by @Narsil in #20143
Add doc tests by @NielsRogge in #20158
doc comment fix: Args was in wrong place by @hollance in #20164
Update OnnxConfig.generate_dummy_inputs to check ImageProcessingMixin by @ydshieh in #20157
Generate: fix TF doctests by @gante in #20159
Fix arg names for our models by @Rocketknight1 in #20166
[processor] Add 'model input names' property by @sanchit-gandhi in #20117
Fix object-detection bug (height, width inversion). by @Narsil in #20167
[OWL-ViT] Make model consistent with CLIP by @NielsRogge in #20144
Fix type - update any PIL.Image.Resampling by @amyeroberts in #20172
Fix tapas scatter by @Bearnardd in #20149
Update README.md by @code-with-rajeev in #19530
Proposal Remove the weird inspect in ASR pipeline and make WhisperEncoder just nice to use. by @Narsil in #19571
Pytorch type hints by @IMvision12 in #20112
Generate: TF sample doctest result update by @gante in #20208
[ROC_BERT] Make CI happy by @younesbelkada in #20175
add _keys_to_ignore_on_load_unexpected = [r"pooler"] by @ArthurZucker in #20210
docs: translated index page to korean by @wonhyeongseo in #20180
feat: add i18n issue template by @wonhyeongseo in #20199
[Examples] Generalise Seq2Seq ASR to handle Whisper by @sanchit-gandhi in #19519
mark test_save_load_fast_init_from_base as is_flaky by @ydshieh in #20200
Update README.md by @Nietism in #20188
Downgrade log warning -> info by @amyeroberts in #20202
Generate: add Bloom fixes for contrastive search by @gante in #20213
Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. by @Narsil in #20104
[docs] set overflowing image width to auto-scale by @wonhyeongseo in #20197
Update tokenizer_summary.mdx by @bofenghuang in #20135
Make ImageSegmentationPipelineTests less flaky by @ydshieh in #20147
update relative positional embedding by @ArthurZucker in #20203
[WHISPER] Update modeling tests by @ArthurZucker in #20162
Add accelerate support for ViT family by @younesbelkada in #20174
Add param_name to size_dict logs & tidy by @amyeroberts in #20205
Add object detection + segmentation transforms by @amyeroberts in #20003
Typo on doctring in ElectraTokenizer by @FacerAin in #20192
Remove authorized_missing_keysin favor of _keys_to_ignore_on_load_missing by @ArthurZucker in #20228
Add missing ESM autoclass by @Rocketknight1 in #20177
fix device issue by @ydshieh in #20227
fixed spelling error in testing.mdx by @kasmith11 in #20220
Fix run_clip.py by @ydshieh in #20234
Fix docstring of CLIPTokenizer(Fast) by @TilmannR in #20233
Fix MaskformerFeatureExtractor by @NielsRogge in #20100
New logging support to "Trainer" Class (ClearML Logger) by @skinan in #20184
Enable PyTorch 1.13 by @sgugger in #20168
[CLIP] allow loading projection layer in vision and text model by @patil-suraj in #18962
Slightly alter Keras dummy loss by @Rocketknight1 in #20232
Add to DeBERTa resources by @Saad135 in #20155
Add clip resources to the transformers documentation by @ambujpawar in #20190
Update reqs to include min gather_for_metrics Accelerate version by @muellerzr in #20242
Allow trainer to return eval. loss for CLIP-like models by @ydshieh in #20214
Adds image-guided object detection support to OWL-ViT by @alaradirik in #20136
Adding audio-classification example in the doc. by @Narsil in #20235
Updating the doctest for conversational. by @Narsil in #20236
Adding doctest for fill-mask pipeline. by @Narsil in #20241
Adding doctest for feature-extraction. by @Narsil in #20240
Adding ASR pipeline example. by @Narsil in #20226
Adding doctest for document-question-answering by @Narsil in #20239
Adding an example for depth-estimation pipeline. by @Narsil in #20237
Complete doc migration by @mishig25 in #20267
Fix result saving errors of pytorch examples by @li-plus in #20276
Adding a doctest for table-question-answering pipeline. by @Narsil in #20260
Adding doctest for image-segmentation pipeline. by @Narsil in #20256
Adding doctest for text2text-generation pipeline. by @Narsil in #20261
Adding doctest for text-generation pipeline. by @Narsil in #20264
Add TF protein notebook to notebooks doc by @Rocketknight1 in #20271
Rephrasing the link. by @Narsil in #20253
Add Chinese-CLIP implementation by @yangapku in #20368
Adding doctest example for image-classification pipeline. by @Narsil in #20254
Adding doctest for zero-shot-image-classification pipeline. by @Narsil in #20272
Adding doctest for zero-shot-classification pipeline. by @Narsil in #20268
Adding doctest for visual-question-answering pipeline. by @Narsil in #20266
Adding doctest for text-classification pipeline. by @Narsil in #20262
Adding doctest for question-answering pipeline. by @Narsil in #20259
[Docs] Add resources of OpenAI GPT by @shogohida in #20084
Adding doctest for image-to-text pipeline. by @Narsil in #20257
Adding doctest for token-classification pipeline. by @Narsil in #20265
remaining pytorch type hints by @IMvision12 in #20217
Data collator for token classification pads labels column when receives pytorch tensors by @markovalexander in #20244
[Doctest] Add configuration_deformable_detr.py by @Saad135 in #20273
Fix summarization script by @muellerzr in #20286
[DOCTEST] Fix the documentation of RoCBert by @ArthurZucker in #20142
[bnb] Let's warn users when saving 8-bit models by @younesbelkada in #20282
Adding zero-shot-object-detection pipeline doctest. by @Narsil in #20274
Adding doctest for object-detection pipeline. by @Narsil in #20258
Image transforms functionality used instead by @amyeroberts in #20278
TF: add test for PushToHubCallback by @gante in #20231
Generate: general TF XLA constrastive search are now slow tests by @gante in #20277
Fixing the doctests failures. by @Narsil in #20294
set the default cache_enable to True, aligned with the default value in pytorch cpu/cuda amp autocast by @sywangyi in #20289
Add docstrings for canine model by @raghavanone in #19457
Add missing report button for Example test by @ydshieh in #20293
refactor test by @younesbelkada in #20300
[Tiny model creation] deal with ImageProcessor by @ydshieh in #20298
Fix blender bot missleading doc by @ArthurZucker in #20301
remove two tokens that should not be suppressed by @ArthurZucker in #20302
[ASR

More information

DOI: 10.5281/zenodo.7391177

Dates

Publication date: 2020
Issued: October 01, 2020

Notes

Other: If you use this software, please cite it using these metadata.

Rights

info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

Description	Item type	Relationship	Uri
		IsSupplementTo	https://github.com/huggingface/transformers/tree/v4.25.1
		IsVersionOf	https://doi.org/10.5281/zenodo.3385997
		IsPartOf	https://zenodo.org/communities/zenodo
