Title: huggingface/transformers: Trainer, TFTrainer, Multilingual BART, Encoder-decoder improvements, Generation Pipeline
Type: Software
Thomas Wolf, Lysandre Debut, Julien Chaumond, Victor SANH, Patrick von Platen, Aymeric Augustin, Rémi Louf, Funtowicz Morgan, Sam Shleifer, Stefan Schweter, Manuel Romero, Denis, erenup, Matt, Piero Molino, Grégory Châtel, Bram Vanroy, Tim Rault, Gunnlaugur Thor Briem, Anthony MOI, Malte Pietsch, Catalin Voss, Bilal Khan, Fei Wang, Louis Martin, Davide Fiocco, Martin Malmsten, Lorenzo Ampil, HUSEIN ZOLKEPLI, Clement (2020): huggingface/transformers: Trainer, TFTrainer, Multilingual BART, Encoder-decoder improvements, Generation Pipeline. Zenodo. Software. https://zenodo.org/record/3813846
Summary
Trainer & TFTrainer
Version 2.9 introduces a new Trainer class for PyTorch, and its equivalent TFTrainer for TF 2.
This allowed us to completely reorganize the example scripts, resulting in a cleaner codebase.
The main features of the Trainer are:
- Same user-facing API for PyTorch and TF 2
- Support for CPU, GPU, Multi-GPU, and TPU
- Easier than ever to share your fine-tuned models
The TFTrainer was largely contributed by awesome community member @jplu! 🔥 🔥 A minimal usage sketch follows.
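The sketch below shows the PyTorch Trainer end to end on a toy dataset. The dataset, checkpoint, and hyperparameters are purely illustrative, and the argument name per_gpu_train_batch_size reflects this release (it was renamed in later versions):

```python
# A minimal, illustrative Trainer setup: a BERT classifier fine-tuned on a
# two-example toy dataset. Checkpoint names and hyperparameters are arbitrary.
import torch
from torch.utils.data import Dataset

from transformers import (
    BertForSequenceClassification,
    BertTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")


class ToyDataset(Dataset):
    """Two labelled sentences, just enough to exercise the training loop."""

    def __init__(self):
        self.examples = [("Great movie!", 1), ("Terrible movie.", 0)]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        text, label = self.examples[i]
        enc = tokenizer.encode_plus(text, max_length=16, pad_to_max_length=True)
        item = {k: torch.tensor(v) for k, v in enc.items()}
        item["labels"] = torch.tensor(label)
        return item


training_args = TrainingArguments(
    output_dir="./results",      # checkpoints and the final model go here
    num_train_epochs=1,
    per_gpu_train_batch_size=2,  # renamed per_device_* in later releases
    logging_dir="./logs",        # TensorBoard event files
)

trainer = Trainer(model=model, args=training_args, train_dataset=ToyDataset())
trainer.train()
trainer.save_model()             # writes a directory that is easy to share
```

The same structure carries over to TFTrainer, which is the point of the shared user-facing API.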
A few additional features of the example scripts are:
- Generate argparsers from type hints on dataclasses (see the sketch below)
- Can load arguments from JSON files
- Logging through TensorBoard and wandb
Documentation for the Trainer is still a work in progress; please consider contributing improvements.
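The argparser generation is driven by dataclasses. A minimal sketch; ModelArguments is a hypothetical user-defined dataclass, while HfArgumentParser and TrainingArguments ship with this release:

```python
# Sketch of the dataclass-driven argument parsing behind the example scripts.
# HfArgumentParser builds an argparse parser from the dataclass type hints.
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser, TrainingArguments


@dataclass
class ModelArguments:
    # Hypothetical user-defined arguments; the help strings become --help text.
    model_name_or_path: str = field(
        metadata={"help": "Path to a pretrained model or a model identifier"}
    )
    cache_dir: Optional[str] = field(
        default=None, metadata={"help": "Where to store downloaded models"}
    )


parser = HfArgumentParser((ModelArguments, TrainingArguments))

# Parse from the command line...
model_args, training_args = parser.parse_args_into_dataclasses()

# ...or load the same arguments from a JSON file:
# model_args, training_args = parser.parse_json_file("args.json")
```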
TPU Support
Both the TensorFlow and PyTorch trainers now support TPUs (@jplu, @LysandreJik, @julien-c). An additional utility lets the TPU scripts be launched in a manner similar to torch.distributed. This was built with the support of @jysohn23, a member of the Google TPU team.
Multilingual BART (@sshleifer)
A new BART checkpoint has been converted: this adds the mbart-en-ro model, a BART variant fine-tuned on English-Romanian translation.
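Usage should look like any other seq2seq checkpoint. A hedged sketch, assuming the checkpoint is published under the identifier facebook/mbart-large-en-ro and loads through the Auto classes:

```python
# Hypothetical English-to-Romanian translation with the converted mBART
# checkpoint. The identifier "facebook/mbart-large-en-ro" is an assumption;
# substitute whatever name the checkpoint is actually published under.
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-en-ro")
model = AutoModelWithLMHead.from_pretrained("facebook/mbart-large-en-ro")

input_ids = tokenizer.encode(
    "The translation quality is surprisingly good.", return_tensors="pt"
)
translated = model.generate(input_ids)
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```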
Improved support for huggingface/tokenizers
Additional tests and support have been added for the huggingface/tokenizers fast tokenizers (@mfuntowicz, @thomwolf). TensorFlow models now work out-of-the-box with the new tokenizers (@LysandreJik).
Decoder caching for T5 (@patrickvonplaten)
Auto-regressive decoding for T5 has been greatly sped up by storing past key/value states, so attention over previously generated tokens is not recomputed at every decoding step. The work covers both PyTorch and TensorFlow.
Breaking change
This introduces a breaking change: the default output length of T5Model and T5ForConditionalGeneration increases from 4 to 5, the extra element being the past_key_value_states.
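A minimal sketch of the change, assuming the t5-small checkpoint; with caching on, the forward pass also returns the cached states that generate() feeds back in:

```python
# Illustration of T5 decoder caching. The checkpoint and prompt are arbitrary.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)

# Per the breaking change above, the default output now carries one extra
# element holding the cached key/value states.
outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)
print(len(outputs))

# generate() reuses the cache internally, which is where the speedup shows up.
generated = model.generate(input_ids, max_length=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```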
Encoder-Decoder enhancements
- Apply the Encoder-Decoder 1.5GB memory savings to TF as well (@patrickvonplaten, a translation of the same work on PyTorch models by @sshleifer)
- The BART summarization fine-tuning script now works for T5 as well (@sshleifer)
- Clean Encoder-Decoder models with a Bart/T5-like API and add the ability to generate (@patrickvonplaten); a short sketch follows this list
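A hedged sketch of the cleaned-up API; the from_encoder_decoder_pretrained classmethod and the decoder_start_token_id argument to generate() are assumed to be available in this release:

```python
# Sketch of warm-starting a BERT-to-BERT encoder-decoder and generating from
# it. from_encoder_decoder_pretrained is assumed available in this release.
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"  # encoder, then decoder
)

input_ids = tokenizer.encode("A short input sequence.", return_tensors="pt")

# With the Bart/T5-like API, generation works on the combined model; an
# untrained decoder will of course produce noise until it is fine-tuned.
generated = model.generate(input_ids, decoder_start_token_id=tokenizer.cls_token_id)
print(tokenizer.decode(generated[0]))
```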
Additional model architectures
Question Answering support for Albert (TFAlbertForQuestionAnswering) and Roberta in TF (@Pierrci).
Pipelines
- The question answering pipeline now handles impossible answers (@bryant1410)
- Remove tqdm logging (@mfuntowicz)
- The sentiment analysis pipeline can now handle more than two sequences (@xxbidiao)
- Rewritten batch support in pipelines (@mfuntowicz)
- Text generation pipeline (@enzoampil): implements a text generation pipeline, GenerationPipeline, which works with any ModelWithLMHead (see the sketch below)
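A minimal sketch of the new pipeline; the "text-generation" task name and the gpt2 checkpoint are assumptions here, as the release notes only name the GenerationPipeline class:

```python
# Hypothetical usage of the new text generation pipeline. Any model with a
# language modeling head should work; gpt2 is just a convenient default.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# generate()-style keyword arguments are forwarded to the underlying model.
print(generator("Once upon a time,", max_length=30))
```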
Fixes and improvements
- Clean the generate testing functions (@patrickvonplaten)
- Notebooks updated in the documentation (@LysandreJik)
- Fix RoBERTa/XLNet pad token in run_multiple_choice.py (@ethanjperez)
- Fixed RoBERTa conversion script (@myleott)
- Speed up torch summarization tests (@sshleifer)
- Optimize causal mask using torch.where (@Akababa)
- Improved benchmarking utils (@patrickvonplaten)
- Fixed edge case for BERT tokenization (@patrickvonplaten)
- SummarizationDataset cleanup (@sshleifer)
- BART: replace config.output_past with a use_cache kwarg (@sshleifer)
- Better documentation for the Summarization and Translation pipelines (@julien-c)
- Additional documentation for model cards (@julien-c)
- Fix force_download of files on Windows (@calpt)
- Fix shuffling issue for distributed training (@elk-cloner)
- Shift labels internally within TransfoXLLMHeadModel when called with labels (@TevenLeScao)
- Remove output_past everywhere and replace it with a use_cache argument (@patrickvonplaten)
- Added unit test for run_bart_sum (@sshleifer)
- Cleaner code by factoring a few methods back into PreTrainedModel (@sshleifer)
- [Bert] Remove hard-coded pad token id (@patrickvonplaten)
- Clean pipelines test and remove unnecessary code (@patrickvonplaten)
- JITting is not compatible with PyTorch/XLA or any other framework that requires serialization; the JITted methods were removed (@LysandreJik)
- Change newstest2013 to newstest2014 and clean up (@patrickvonplaten)
- Factor out the tensor conversion method in PreTrainedTokenizer (@sshleifer)
- Remove tanh torch warnings (@aryanshomray)
- Fix token_type_id in the BERT question-answering example (@siboehm)
- Add CircleCI workflow to build docs for preview (@harupy)
- Higher tolerance for past testing in T5 and TF T5 (@patrickvonplaten)
- XLM tokenizer should encode with bos token (@LysandreJik, @patrickvonplaten)
- Fix summarization do_predict (@sshleifer)
- Encode to the max length of the input, not the max length of the tokenizer, for batch input (@patrickvonplaten)
- Add qas_id to SquadResult and SquadExample (@jarednielsen)
- Fix bug in run_*.py scripts: double wrap into DataParallel during eval (@and-kul)
- Fix torchhub integration (@julien-c)
- Fix TFAlbertForSequenceClassification classifier dropout probability (@jarednielsen)
- Change uses of pow(x, 3) to pow(x, 3.0) (@mneilly-et)
- Shuffle the train subset for the summarization example (@Colanim)
- Removed the boto3 dependency (@julien-c)
- Add DialoGPT training tips (@patrickvonplaten)
- Generation can now start with an empty prompt (@patrickvonplaten)
- GPT-2 is now traceable (@jazzcook15)
- Add known third-party packages to setup.cfg; removes the local/CircleCI isort discrepancy (@sshleifer)
- Allow a more backward-compatible behavior of max_len_single_sentence and max_len_sentences_pair (@thomwolf)
- Now using CDN URLs for weights (@julien-c)
- [Fix common tests on GPU] Send model, ids to torch_device (@sshleifer)
- Fix TF input docstrings to refer to tf.Tensor rather than torch.Float (@jarednielsen)
- Additional metadata in training arguments (@parmarsuraj99)
- [ci] Load pretrained models into the default (long-lived) cache (@julien-c)
- Add timeout_decorator to tests (@sshleifer)
- Added XLM-R to the multilingual section in the documentation (@stefan-it)
- Better num_labels in configuration objects
- Updated PyTorch Lightning scripts (@williamFalcon)
- Tests now pass with torch 1.5.0 (@LysandreJik)
- Ensure the fast tokenizer can construct a single-element tensor without a pad token (@mfuntowicz)
More information
- DOI: 10.5281/zenodo.3813846
Dates
- Publication date: 2020
- Issued: May 07, 2020
Rights
- Open Access
Format
electronic resource
Related items
| Relationship | URI |
|---|---|
| IsSupplementTo | https://github.com/huggingface/transformers/tree/v2.9.0 |
| IsVersionOf | https://doi.org/10.5281/zenodo.3385997 |
| IsPartOf | https://zenodo.org/communities/zenodo |