
Title: huggingface/transformers: ELECTRA, Bad word filters, bugfixes & improvements

Type: Software

Citation: Thomas Wolf, Lysandre Debut, Julien Chaumond, Victor SANH, Patrick von Platen, Aymeric Augustin, Rémi Louf, Funtowicz Morgan, Stefan Schweter, Denis, Sam Shleifer, erenup, Manuel Romero, Matt, Piero Molino, Grégory Châtel, Bram Vanroy, Tim Rault, Gunnlaugur Thor Briem, Julien Plu, Anthony MOI, Malte Pietsch, Catalin Voss, Bilal Khan, Fei Wang, Martin Malmsten, Louis Martin, Davide Fiocco, Clement, Ananya Harsh Jha (2020): huggingface/transformers: ELECTRA, Bad word filters, bugfixes & improvements. Zenodo. Software. https://zenodo.org/record/3741842

Authors: Thomas Wolf (@huggingface) ; Lysandre Debut (Hugging Face) ; Julien Chaumond (Hugging Face) ; Victor SANH (@huggingface) ; Patrick von Platen ; Aymeric Augustin (@canalplus) ; Rémi Louf ; Funtowicz Morgan (HuggingFace) ; Stefan Schweter ; Denis ; Sam Shleifer (Huggingface) ; erenup ; Manuel Romero ; Matt ; Piero Molino ; Grégory Châtel (DisAItek & Intel AI Innovators) ; Bram Vanroy (@UGent) ; Tim Rault (@huggingface) ; Gunnlaugur Thor Briem (Qlik) ; Julien Plu (Leboncoin Lab) ; Anthony MOI (Hugging Face) ; Malte Pietsch (deepset) ; Catalin Voss (Stanford University) ; Bilal Khan ; Fei Wang (University of Southern California) ; Martin Malmsten ; Louis Martin ; Davide Fiocco ; Clement (@huggingface) ; Ananya Harsh Jha ;

Summary

ELECTRA Model (@LysandreJik)

ELECTRA is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.

This release comes with 6 ELECTRA checkpoints:

  • google/electra-small-discriminator
  • google/electra-small-generator
  • google/electra-base-discriminator
  • google/electra-base-generator
  • google/electra-large-discriminator
  • google/electra-large-generator
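As a minimal sketch (not part of the release notes) of how one of these checkpoints can be used, the snippet below loads google/electra-small-discriminator with the ElectraForPreTraining head and prints, for each token, whether the discriminator scores it as original or replaced. The example sentence is an assumption.

```python
# Minimal sketch: run the ELECTRA discriminator over a sentence and flag
# which tokens it considers "replaced" (logit > 0) versus "original".
import torch
from transformers import ElectraTokenizer, ElectraForPreTraining

tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

sentence = "The quick brown fox jumps over the lazy dog"
input_ids = tokenizer.encode(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids)[0]  # one logit per token

predictions = (logits > 0).long().squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(input_ids.squeeze().tolist())
for token, label in zip(tokens, predictions):
    print(f"{token}\t{'replaced' if label else 'original'}")
```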

Related:

  • Paper
  • Official code
  • Models available in the community models
  • Docs

Thanks to the author @clarkkev for his help during the implementation.

Bad word filters in generate (@patrickvonplaten)

The generate method now has a bad word filter.
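A minimal sketch of the filter follows, assuming the gpt2 checkpoint and the bad_words_ids argument of generate(), which takes a list of token-id sequences that must not appear in the generated text; the banned words and the prompt are illustrative.

```python
# Minimal sketch: greedy generation with the bad-word filter added in this
# release. Each entry of bad_words_ids is a sequence of token ids that the
# model is not allowed to produce.
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode each banned word with a leading space so it matches GPT-2's
# in-sentence byte-pair tokens.
bad_words_ids = [tokenizer.encode(" ugly"), tokenizer.encode(" boring")]

input_ids = tokenizer.encode("The movie last night was", return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=30,
    do_sample=False,
    bad_words_ids=bad_words_ids,  # the filter introduced in this release
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```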

Fixes and improvements

  • Decoder input ids are no longer necessary for T5 training (@patrickvonplaten); see the sketch after this list
  • Update encoder and decoder on set_input_embedding for BART (@sshleifer)
  • Use the loaded checkpoint with --do_predict (instead of random init) in the PyTorch Lightning scripts (@ethanjperez)
  • Clean summarization and translation example testing files for T5 and Bart (@patrickvonplaten)
  • Cleaner examples (@julien-c)
  • Extensive testing for the T5 model (@patrickvonplaten)
  • Force model outputs to always have batch_size as their first dim (@patrickvonplaten)
  • Fix for continuing training in some scripts (@xeb)
  • Resize the embedding matrix before sending it to the optimizer (@ngarneau)
  • BertJapaneseTokenizer now accepts options for mecab (@tamuhey)
  • Speed up GELU computation with torch.jit (@mryab)
  • Fix argument order of the update_mems fn in the TF version (@patrickvonplaten, @dmytyar)
  • Split the generate test function into beam search and no beam search (@patrickvonplaten)
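The first item above is illustrated by the minimal sketch below: passing only the target ids is enough, and the model builds the decoder inputs internally by shifting them right. The t5-small checkpoint and the example sentences are assumptions, and recent transformers releases name the target argument labels (releases around v2.8.0 used lm_labels).

```python
# Minimal sketch: T5 training step without explicitly providing
# decoder_input_ids; the model derives them from the target ids.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is small.", return_tensors="pt"
)
labels = tokenizer.encode("Das Haus ist klein.", return_tensors="pt")

# No decoder_input_ids argument needed; the first returned element is the loss.
loss = model(input_ids=input_ids, labels=labels)[0]
print(float(loss))
```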

More information

  • DOI: 10.5281/zenodo.3741842

Dates

  • Publication date: 2020
  • Issued: April 06, 2020

Rights

  • Open Access (info:eu-repo/semantics/openAccess)


Format

electronic resource

Related items

  • IsSupplementTo: https://github.com/huggingface/transformers/tree/v2.8.0
  • IsVersionOf: https://doi.org/10.5281/zenodo.3385997
  • IsPartOf: https://zenodo.org/communities/zenodo