Title: huggingface/transformers: T5 Model, BART summarization example and reduced memory, translation pipeline
Type: Software
Thomas Wolf, Lysandre Debut, Julien Chaumond, Victor SANH, Patrick von Platen, Aymeric Augustin, Rémi Louf, Funtowicz Morgan, Stefan Schweter, Denis, Sam Shleifer, erenup, Manuel Romero, Matt, Piero Molino, Grégory Châtel, Bram Vanroy, Tim Rault, Gunnlaugur Thor Briem, Anthony MOI, Malte Pietsch, Julien Plu, Catalin Voss, Bilal Khan, Fei Wang, Martin Malmsten, Louis Martin, Davide Fiocco, Clement, Ananya Harsh Jha (2020): huggingface/transformers: T5 Model, BART summarization example and reduced memory, translation pipeline. Zenodo. Software. https://zenodo.org/record/3733180
Summary
T5 Model (@patrickvonplaten, @thomwolf)
T5 is a powerful encoder-decoder model that casts every NLP problem into a text-to-text format. It achieves state-of-the-art results on a variety of NLP tasks (summarization, question answering, ...).
Five sets of pre-trained weights (trained on a multi-task mixture of unsupervised and supervised tasks) are released, in ascending order from 60 million to 11 billion parameters:
t5-small, t5-base, t5-large, t5-3b, t5-11b
T5 can now be used with the translation and summarization pipelines.
Related: paper, official code, models available in Hugging Face's community models, docs.
Big thanks to the original authors, especially @craffel, who helped answer our questions, reviewed PRs, and tested T5 extensively.
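As a rough illustration of the text-to-text setup, the sketch below loads one of the released checkpoints and runs a translation-style prompt. It is a minimal sketch using current transformers class names (T5Tokenizer, T5ForConditionalGeneration), which may differ from the exact API of the release described here.
```python
# Minimal sketch: load a released T5 checkpoint and run a text-to-text prompt.
# Class names follow the current transformers API and may differ in older releases.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 frames every task as text-to-text; the task is selected by a textual prefix.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```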
New BART checkpoint: bart-large-xsum (@sshleifer)
These weights come from BART fine-tuned on the XSum abstractive summarization task, which encourages shorter (more abstractive) summaries. The checkpoint achieves state-of-the-art results on this benchmark.
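A minimal sketch of summarizing with this checkpoint follows. The identifier "facebook/bart-large-xsum" is the current hub name and is an assumption here; at the time of this release the checkpoint was listed simply as bart-large-xsum.
```python
# Minimal sketch: abstractive summarization with the XSum-finetuned BART checkpoint.
# "facebook/bart-large-xsum" is the current hub identifier; older releases used
# the shorter name "bart-large-xsum".
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-xsum")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-xsum")

article = "..."  # a long news article goes here
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```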
BART summarization example with pytorch-lightning (@acarrera94)
A new example shows how to fine-tune BART for summarization with PyTorch Lightning; it trains on CNN/DailyMail and runs evaluation.
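The sketch below is not the repository's example script; it is a minimal illustration, under current transformers and PyTorch Lightning APIs, of wrapping BART in a LightningModule for summarization fine-tuning, with data loading omitted.
```python
# Minimal sketch (not the repository's example script): fine-tuning BART for
# summarization inside a PyTorch Lightning module. Data loading is omitted;
# batches are assumed to be pre-tokenized dicts of input_ids/attention_mask/labels.
import pytorch_lightning as pl
import torch
from transformers import BartForConditionalGeneration

class BartSummarizer(pl.LightningModule):
    def __init__(self, model_name="facebook/bart-large"):
        super().__init__()
        self.model = BartForConditionalGeneration.from_pretrained(model_name)

    def training_step(self, batch, batch_idx):
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],  # target summaries; shifted internally by the model
        )
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=3e-5)

# Training would then be launched with a CNN/DailyMail DataLoader, e.g.:
# pl.Trainer(max_epochs=1).fit(BartSummarizer(), train_dataloaders=train_loader)
```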
Translation pipeline (@patrickvonplaten)
A new translation pipeline is available, leveraging the T5 model. T5 was also added to the summarization pipeline.
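A minimal sketch of using both pipelines is below. The task string names the language pair (e.g. "translation_en_to_de"); the default models picked by pipeline() may differ across transformers versions.
```python
# Minimal sketch: translation and summarization through the pipeline API.
# Default models chosen by pipeline() can differ across transformers versions.
from transformers import pipeline

translator = pipeline("translation_en_to_de")
print(translator("Hugging Face is a company based in New York."))

summarizer = pipeline("summarization")
long_article = "A very long article to summarize goes here ..."
print(summarizer(long_article, max_length=40, min_length=10))
```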
Memory improvements with BART (@sshleifer)
To reduce the memory footprint and compute required to run inference with BART, several improvements were made to the model:
- Remove the LM head and use the embedding matrix instead (~200MB)
- Call the encoder before expanding input_ids (~1GB)
- SelfAttention only returns weights if config.output_attentions (~500MB)
- Two separate, smaller decoder attention masks (~500MB)
- Drop columns that are exclusively pad_token_id from input_ids in the evaluate_cnn example
New model: XLMForTokenClassification (@sakares)
A new token classification head was added to XLM: XLMForTokenClassification.
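A minimal sketch of the new head is below; the checkpoint name and number of labels are illustrative assumptions, and since the base XLM checkpoint ships without a fine-tuned token classification head, the predictions here come from an untrained classifier.
```python
# Minimal sketch: the new XLMForTokenClassification head. The base checkpoint is
# not fine-tuned for token classification, so the classifier weights are freshly
# initialized; in practice the head is fine-tuned on a tagging dataset (e.g. NER).
import torch
from transformers import XLMForTokenClassification, XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-en-2048")
model = XLMForTokenClassification.from_pretrained("xlm-mlm-en-2048", num_labels=9)

inputs = tokenizer("Hugging Face is based in New York.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, num_labels)
predictions = logits.argmax(dim=-1)  # one label id per input token
print(predictions)
```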
More information
- DOI: 10.5281/zenodo.3733180
Dates
- Publication date: 2020
- Issued: March 30, 2020
Rights
- Open Access (info:eu-repo/semantics/openAccess)
Format
electronic resource
Related items
| Description | Item type | Relationship | Uri |
|---|---|---|---|
| | | IsSupplementTo | https://github.com/huggingface/transformers/tree/v2.7.0 |
| | | IsVersionOf | https://doi.org/10.5281/zenodo.3385997 |
| | | IsPartOf | https://zenodo.org/communities/zenodo |