This is a limited proof of concept to search for research data, not a production system.


Title: Transformers: State-of-the-Art Natural Language Processing

Type: Software

Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M. (2020): Transformers: State-of-the-Art Natural Language Processing. Zenodo. Software. https://zenodo.org/record/7080024

Authors: Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M.


Summary

Swin Transformer v2

The Swin Transformer V2 model was proposed in Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

Add swin transformer v2 by @nandwalritik in #17469

VideoMAE

The VideoMAE model was proposed in VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked autoencoders (MAE) to video, claiming state-of-the-art performance on several video classification benchmarks.

Add VideoMAE by @NielsRogge in #17821

Donut

The Donut model was proposed in OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.

Add Donut by @NielsRogge in #18488

Pegasus-X

The PEGASUS-X model was proposed in Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao and Peter J. Liu.

PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models to long-input summarization through additional long-input pretraining and by using staggered block-local attention with global tokens in the encoder.

PEGASUS-X by @zphang in #18551

X-CLIP

The X-CLIP model was proposed in Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. X-CLIP is a minimal extension of CLIP for video. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

Add X-CLIP by @NielsRogge in #18852

ERNIE

ERNIE is a series of powerful models proposed by Baidu, particularly strong on Chinese tasks, including ERNIE 1.0, ERNIE 2.0, ERNIE 3.0, ERNIE-Gram, ERNIE-health, etc. These models were contributed by @nghuyong, and the official code can be found in PaddleNLP (in PaddlePaddle).
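A minimal sketch of using one of these checkpoints; "nghuyong/ernie-3.0-base-zh" is an assumed example from the contributor's Hub namespace, not prescribed by the release notes:

    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-3.0-base-zh")
    model = AutoModel.from_pretrained("nghuyong/ernie-3.0-base-zh")
    # Encode a Chinese sentence and run it through the encoder.
    outputs = model(**tokenizer("你好，世界", return_tensors="pt"))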

ERNIE-2.0 and ERNIE-3.0 models by @nghuyong in #18686

TensorFlow models

MobileViT and LayoutLMv3 are now available in TensorFlow.
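A minimal sketch of loading the new TensorFlow ports; the checkpoint names are illustrative assumptions:

    from transformers import TFLayoutLMv3Model, TFMobileViTForImageClassification

    # Both models now load natively in TensorFlow, no PyTorch conversion needed.
    mobilevit = TFMobileViTForImageClassification.from_pretrained("apple/mobilevit-small")
    layoutlmv3 = TFLayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")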

TensorFlow MobileViT by @sayakpaul in #18555
[LayoutLMv3] Add TensorFlow implementation by @ChrisFugl in #18678

New task-specific architectures

A new question answering head was added for the LayoutLM model.
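A minimal sketch of the new head; "impira/layoutlm-document-qa" is an assumed example checkpoint from the PR author's namespace:

    from transformers import AutoTokenizer, LayoutLMForQuestionAnswering

    tokenizer = AutoTokenizer.from_pretrained("impira/layoutlm-document-qa")
    model = LayoutLMForQuestionAnswering.from_pretrained("impira/layoutlm-document-qa")
    # Like other *ForQuestionAnswering heads, it predicts extractive start/end
    # span logits over the (layout-aware) token sequence.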

Add LayoutLMForQuestionAnswering model by @ankrgyl in #18407

New pipelines

Two new pipelines are available in transformers: a document question answering pipeline and an image-to-text generation pipeline.
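A minimal sketch of both pipelines; the checkpoints and image file names below are illustrative assumptions, not prescribed by the release notes:

    from transformers import pipeline

    # Document question answering over a document image.
    doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
    print(doc_qa(image="invoice.png", question="What is the invoice number?"))

    # Image-to-text generation (captioning).
    captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
    print(captioner("photo.jpg"))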

Add DocumentQuestionAnswering pipeline by @ankrgyl in #18414
Add Image To Text Generation pipeline by @OlivierDehaene in #18821

M1 support

Transformers now supports PyTorch on Apple M1 ("mps") devices, both in pipelines and in the Trainer.
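A minimal sketch, assuming an Apple-silicon Mac and a PyTorch build (1.12 or newer) with the "mps" backend available:

    from transformers import pipeline

    # The device argument now accepts a device string such as "mps".
    classifier = pipeline("text-classification", device="mps")
    print(classifier("Apple silicon is now supported."))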

pipeline support for device="mps" (or any other string) by @julien-c in #18494
mac m1 mps integration by @pacman100 in #18598

Backend version compatibility

Starting from version v4.22.0, we officially support PyTorch and TensorFlow versions released up to two years ago. Versions more than two years old will no longer be supported.

We're making this change as we begin actively testing transformers compatibility on older versions. This project can be followed here.

PyTorch >= 1.7.0 and TensorFlow >= 2.4.0 by @sgugger in #19016

Generate method updates

The generate method now enforces stricter validation of its arguments, to catch incorrect usage early.
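For example, a misspelled generation argument that was previously ignored silently now raises an error (a minimal sketch; "gpt2" is an illustrative checkpoint):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tokenizer("Hello", return_tensors="pt")

    model.generate(**inputs, max_new_tokens=5)  # valid argument: works
    model.generate(**inputs, max_new_tokns=5)   # typo: now raises a ValueError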

Generate: validate model_kwargs (and catch typos in generate arguments) by @gante in #18261
Generate: validate model_kwargs on TF (and catch typos in generate arguments) by @gante in #18651
Generate: add model class validation by @gante in #18902

API changes

The as_target_tokenizer and as_target_processor context managers have been deprecated. The new API is to call the tokenizer/processor directly with keyword arguments. For instance:

    with tokenizer.as_target_tokenizer():
        encoded_labels = tokenizer(labels, padding=True)

becomes

    encoded_labels = tokenizer(text_target=labels, padding=True)

Replace as_target context managers by direct calls by @sgugger in #18325
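A fuller sketch of the new call-based API ("t5-small" is an illustrative checkpoint): inputs and targets are encoded in a single call, and the encoded targets come back under the labels key.

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    batch = tokenizer(
        ["translate English to German: Hello"],  # encoder inputs
        text_target=["Hallo"],                   # decoder targets
        padding=True,
        return_tensors="pt",
    )
    # batch["input_ids"] holds the inputs; batch["labels"] holds the encoded targets.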

Bits and bytes integration

bitsandbytes is now integrated within transformers. This feature can reduce the size of large models by up to a factor of 2, with little loss in precision.
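A minimal sketch of 8-bit loading; it assumes the bitsandbytes and accelerate packages plus a CUDA GPU, and the checkpoint is illustrative:

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-1b7",  # illustrative checkpoint
        device_map="auto",
        load_in_8bit=True,       # store Linear layer weights in int8 (Linear8bitLt)
    )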

Supporting seq2seq models for bitsandbytes integration by @younesbelkada in #18579
bitsandbytes - Linear8bitLt integration into transformers models by @younesbelkada in #17901

Large model support

Models that have sharded checkpoints in PyTorch can be loaded in Flax.
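A minimal sketch; "facebook/opt-6.7b" is an assumed example of a checkpoint whose PyTorch weights are sharded across several files, and from_pt=True converts them on the fly:

    from transformers import FlaxAutoModelForCausalLM

    # Loads the (sharded) PyTorch checkpoint directly into Flax parameters.
    model = FlaxAutoModelForCausalLM.from_pretrained("facebook/opt-6.7b", from_pt=True)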

Load sharded pt to flax by @ArthurZucker in #18419

TensorFlow improvements

The TensorFlow examples have been rewritten to support all recent features developed in the past months.

TF Examples Rewrite by @Rocketknight1 in #18451

DeBERTa-v2 is now trainable with XLA.
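A sketch of what this enables through the standard Keras jit_compile flag (requires a reasonably recent TensorFlow; the checkpoint is illustrative):

    import tensorflow as tf
    from transformers import TFDebertaV2ForSequenceClassification

    model = TFDebertaV2ForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-small"  # illustrative checkpoint (DeBERTa-v2 architecture)
    )
    # Ask Keras to compile the train/predict steps with XLA.
    model.compile(optimizer=tf.keras.optimizers.Adam(3e-5), jit_compile=True)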

TF: XLA-trainable DeBERTa v2 by @gante in #18546

Documentation changes

Split model list on modality by @stevhliu in #18328

Improvements and bugfixes

sentencepiece shouldn't be required for the fast LayoutXLM tokenizer by @LysandreJik in #18320
Fix sacremoses soft dependency for Transformers XL by @sgugger in #18321
Owlvit test fixes by @alaradirik in #18303
[Flax] Fix incomplete batches in example scripts by @sanchit-gandhi in #17863
start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch … by @sywangyi in #18229
Update feature extractor docs by @stevhliu in #18324
fixed typo by @banda-larga in #18331
updated translation by @banda-larga in #18333
Updated _toctree.yml by @nickprock in #18337
Update automatic_speech_recognition.py by @bofenghuang in #18339
Fix codeparrot deduplication - ignore whitespaces by @loubnabnl in #18023
Remove Flax OPT from doctest for now by @ydshieh in #18338
Include tensorflow-aarch64 as a candidate by @ankrgyl in #18345
[BLOOM] Deprecate position_ids by @thomasw21 in #18342
Migrate metric to Evaluate library for tensorflow examples by @VijayKalmath in #18327
Migrate metrics used in flax examples to Evaluate by @VijayKalmath in #18348
[Docs] Fix Speech Encoder Decoder doc sample by @sanchit-gandhi in #18346
Fix OwlViT torchscript tests by @ydshieh in #18347
Fix some doctests by @ydshieh in #18359
[FX] Symbolic trace for Bloom by @michaelbenayoun in #18356
Fix TFSegformerForSemanticSegmentation doctest by @ydshieh in #18362
fix FSDP ShardedGradScaler by @pacman100 in #18358
Migrate metric to Evaluate in Pytorch examples by @atturaioe in #18369
Correct the spelling of bleu metric by @ToluClassics in #18375
Remove pt-like calls on tf tensor by @amyeroberts in #18393
Fix from_pretrained kwargs passing by @YouJiacheng in #18387
Add a check regarding the number of occurrences of ``` by @ydshieh in #18389
Add evaluate to test dependencies by @sgugger in #18396
Fix OPT doc tests by @ArthurZucker in #18365
Fix doc tests by @NielsRogge in #18397
Add balanced strategies for device_map in from_pretrained by @sgugger in #18349
Fix docs by @NielsRogge in #18399
Adding fine-tuning models to LUKE by @ikuyamada in #18353
Fix ROUGE add example check and update README by @sgugger in #18398
Add Flax BART pretraining script by @duongna21 in #18297
Rewrite push_to_hub to use upload_files by @sgugger in #18366
Layoutlmv2 tesseractconfig by @kelvinAI in #17733
fix: create a copy for tokenizer object by @YBooks in #18408
Fix uninitialized parameter in conformer relative attention. by @PiotrDabkowski in #18368
Fix the hub user name in a longformer doctest checkpoint by @ydshieh in #18418
Change audio kwarg to images in TROCR processor by @ydshieh in #18421
update maskformer docs by @alaradirik in #18423
Fix test_load_default_pipelines_tf test error by @ydshieh in #18422
fix run_clip README by @ydshieh in #18332
Improve generate docstring by @JoaoLages in #18198
Accept trust_remote_code and ignore it in PreTrainedModel.from_pretrained by @ydshieh in #18428
Update pipeline word heuristic to work with whitespace in token offsets by @davidbenton in #18402
Add programming languages by @cakiki in #18434
fixing error when using sharded ddp by @pacman100 in #18435
Update _toctree.yml by @stevhliu in #18440
support ONNX export of XDropout in deberta{,_v2} and sew_d by @garymm in #17502
Add Spanish translation of run_scripts.mdx by @donelianc in #18415
Update no trainer scripts for language modeling and image classification examples by @nandwalritik in #18443
Update pinned hub version by @osanseviero in #18448
Fix failing tests for XLA generation in TF by @dsuess in #18298
add zero-shot obj detection notebook to docs by @alaradirik in #18453
fix: keras fit tests for segformer tf and minor refactors. by @sayakpaul in #18412
Fix torch version comparisons by @LSinev in #18460
[BLOOM] Clean modeling code by @thomasw21 in #18344
change shape to support dynamic batch input in tf.function XLA generate for tf serving by @nlpcat in #18372
HFTracer.trace can now take callables and torch.nn.Module by @michaelbenayoun in #18457
Update no trainer scripts for multiple-choice by @kiansierra in #18468
Fix load of model checkpoints in the Trainer by @sgugger in #18470
Add FX support for torch.baddbmm and torch.Tensor.baddbmm by @thomasw21 in #18363
Add machine type in the artifact of Examples directory job by @ydshieh in #18459
Update no trainer examples for QA and Semantic Segmentation by @kiansierra in #18474
Add TF_MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING by @ydshieh in #18469
Fixing issue where generic model types wouldn't load properly with the pipeline by @Narsil in #18392
Fix TFSwinSelfAttention to have relative position index as non-trainable weight by @harrydrippin in #18226
Refactor TFSwinLayer to increase serving compatibility by @harrydrippin in #18352
Add TF prefix to TF-Res test class by @ydshieh in #18481
Remove py.typed by @sgugger in #18485
Fix pipeline tests by @sgugger in #18487
Use new huggingface_hub tools for download models by @sgugger in #18438
Fix test_dbmdz_english by updating expected values by @ydshieh in #18482
Move cache folder to huggingface/hub for consistency with hf_hub by @sgugger in #18492
Update some expected values in quicktour.mdx for resampy 0.3.0 by @ydshieh in #18484
disable Onnx test for google/long-t5-tglobal-base by @ydshieh in #18454
Typo reported by Joel Grus on TWTR by @julien-c in #18493
Just re-reading the whole doc every couple of months 😬 by @julien-c in #18489
transformers-cli login => huggingface-cli login by @julien-c in #18490
Add seed setting to image classification example by @regisss in #18519
[DX fix] Fixing QA pipeline streaming a dataset. by @Narsil in #18516
Clean up hub by @sgugger in #18497
update fsdp docs by @pacman100 in #18521
Fix compatibility with 1.12 by @sgugger in #17925
Specify en in doc-builder README example by @ankrgyl in #18526
New cache fixes: add safeguard before looking in folders by @sgugger in #18522
unpin resampy by @ydshieh in #18527
✨ update to use interlibrary links instead of Markdown by @stevhliu in #18500
Add example of multimodal usage to pipeline tutorial by @stevhliu in #18498
[VideoMAE] Add model to doc tests by @NielsRogge in #18523
Update perf_train_gpu_one.mdx by @mishig25 in #18532
Update no_trainer.py scripts to include accelerate gradient accumulation wrapper by @Rasmusafj in #18473
Add Spanish translation of converting_tensorflow_models.mdx by @donelianc in #18512
Spanish translation of summarization.mdx by @AguilaCudicio in #15947
Let's not cast them all by @younesbelkada in #18471
fix: data2vec-vision Onnx ready-made configuration. by @NikeNano in #18427
Add mt5 onnx config by @ChainYo in #18394
Minor update of run_call_with_unpacked_inputs by @ydshieh in #18541
BART - Fix attention mask device issue on copied models by @younesbelkada in #18540
Adding a new align_to_words param to qa pipeline. by @Narsil in #18010
📝 update metric with evaluate by @stevhliu in #18535
Restore _init_weights value in no_init_weights by @YouJiacheng in #18504
📝 update documentation build section by @stevhliu in #18548
Preserve hub-related kwargs in AutoModel.from_pretrained by @sgugger in #18545
Use commit hash to look in cache instead of calling head by @sgugger in #18534
Update philosophy to include other preprocessing classes by @stevhliu in #18550
Properly move cache when it is not in default path by @sgugger in #18563
Adds CLIP to models exportable with ONNX by @unography in #18515
raise atol for MT5OnnxConfig by @ydshieh in #18560
fix string by @mrwyattii in #18568
Segformer TF: fix output size in documentation by @joihn in #18572
Fix resizing bug in OWL-ViT by @alaradirik in #18573
Fix LayoutLMv3 documentation by @pocca2048 in #17932
Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training by @donebydan in #18486
German docs translation by @flozi00 in #18544
Deberta V2: Fix critical trace warnings to allow ONNX export by @iiLaurens in #18272
[FX] _generate_dummy_input supports audio-classification models for labels by @michaelbenayoun in #18580
Fix docstrings with last version of hf-doc-builder styler by @sgugger in #18581
fix owlvit tests, update docstring examples by @alaradirik in #18586
Return the permuted hidden states if return_dict=True by @amyeroberts in #18578
Add type hints for ViLT models by @donelianc in #18577
update doc for perf_train_cpu_many, add intel mpi introduction by @sywangyi in #18576
typos by @stas00 in #18594
FSDP bug fix for load_state_dict by @pacman100 in #18596
Add TFAutoModelForSemanticSegmentation to the main __init__.py by @ydshieh in #18600
Fix URLs by @NielsRogge in #18604
Update BLOOM parameter counts by @Muennighoff in #18531
[doc] fix anchors by @stas00 in #18591
[fsmt] deal with -100 indices in decoder ids by @stas00 in #18592
small change by @younesbelkada in #18584
Flax Remat for LongT5 by @KMFODA in #17994
Change scheduled CIs to use torch 1.12.1 by @ydshieh in #18644
Add checks for some workflow jobs by @ydshieh in #18583
TF: Fix generation repetition penalty with XLA by @gante in #18648
Update longt5.mdx by @flozi00 in #18634
Update run_translation_no_trainer.py by @zhoutang776 in #18637
[bnb] Minor modifications by @younesbelkada in #18631
Examples: add Bloom support for token classification by @stefan-it in #18632
Fix Yolos ONNX export test by @ydshieh in #18606
Fix matmul inputs dtype by @JingyaHuang in #18585
Update feature extractor methods to enable type cast before normalize by @amyeroberts in #18499
Allow users to force TF availability by @Rocketknight1 in #18650
[LongT5] Correct docs long t5 by @patrickvonplaten in #18669
Generate: validate model_kwargs on FLAX (and catch typos in generate arguments) by @gante in #18653
Ping detectron2 for CircleCI tests by @ydshieh in #18680
Rename method to avoid clash with property by @amyeroberts in #18677
Rename second input dimension from "sequence" to "num_channels" for CV models by @regisss in #17976
Fix repo consistency by @lewtun in #18682
Fix breaking change in onnxruntime for ONNX quantization by @severinsimmler in #18336
Add evaluate to examples requirements by @muellerzr in #18666
[bnb] Move documentation by @younesbelkada in #18671
Add an examples folder for code downstream tasks by @loubnabnl in #18679
model.tie_weights() should be applied after accelerator.prepare() by @Gladiator07 in #18676
Generate: add missing **model_kwargs in sample tests by @gante in #18696
Temp fix for broken detectron2 import by @patrickvonplaten in #18699
[Hotfix] pin detectron2 5aeb252 to avoid test fix by @ydshieh in #18701
Fix Data2VecVision ONNX test by @ydshieh in #18587
Add missing tokenizer tests - Longformer by @tgadeliya in #17677
remove check for main process for trackers initialization by @Gladiator07 in #18706
Unpin detectron2 by @ydshieh in #18727
Removing warning of model type for microsoft/tapex-base-finetuned-wtq by @Narsil in #18711
improve add_tokens docstring by @SaulLu in #18687
CLI: Don't check the model head when there is no model head by @gante in #18733
Update perf_infer_gpu_many.mdx by @mishig25 in #18744
Add minor doc-string change to include hp_name param in hyperparameter_search by @constantin-huetterer in #18700
fix pipeline_tutorial.mdx doctest by @ydshieh in #18717
Add TF implementation of XGLMModel by @stancld in #16543
fixed docstring typos by @JadeKim042386 in #18739
add warning to let the user know that the

More information

  • DOI: 10.5281/zenodo.7080024

Dates

  • Publication date: 2020
  • Issued: October 01, 2020

Notes

Other: If you use this software, please cite it using these metadata.

Rights

  • Open Access (info:eu-repo/semantics/openAccess)

We don't yet have good examples for much of the data past this point. Please share in the #rdi Slack channel if you have good examples of anything that appears below. Thanks!

Format

electronic resource

Related items

  • IsSupplementTo: https://github.com/huggingface/transformers/tree/v4.22.0
  • IsVersionOf: https://doi.org/10.5281/zenodo.3385997
  • IsPartOf: https://zenodo.org/communities/zenodo