
Title: huggingface/transformers: ELECTRA, Bad word filters, bugfixes & improvements

Type: Software

Citation: Thomas Wolf, Lysandre Debut, Julien Chaumond, Victor SANH, Patrick von Platen, Aymeric Augustin, Rémi Louf, Funtowicz Morgan, Stefan Schweter, Denis, Sam Shleifer, erenup, Manuel Romero, Matt, Piero Molino, Grégory Châtel, Bram Vanroy, Tim Rault, Gunnlaugur Thor Briem, Julien Plu, Anthony MOI, Malte Pietsch, Catalin Voss, Bilal Khan, Fei Wang, Martin Malmsten, Louis Martin, Davide Fiocco, Clement, Ananya Harsh Jha (2020): huggingface/transformers: ELECTRA, Bad word filters, bugfixes & improvements. Zenodo. Software. https://zenodo.org/record/3741842

Authors: Thomas Wolf (@huggingface) ; Lysandre Debut (Hugging Face) ; Julien Chaumond (Hugging Face) ; Victor SANH (@huggingface) ; Patrick von Platen ; Aymeric Augustin (@canalplus) ; Rémi Louf ; Funtowicz Morgan (HuggingFace) ; Stefan Schweter ; Denis ; Sam Shleifer (Huggingface) ; erenup ; Manuel Romero ; Matt ; Piero Molino ; Grégory Châtel (DisAItek & Intel AI Innovators) ; Bram Vanroy (@UGent) ; Tim Rault (@huggingface) ; Gunnlaugur Thor Briem (Qlik) ; Julien Plu (Leboncoin Lab) ; Anthony MOI (Hugging Face) ; Malte Pietsch (deepset) ; Catalin Voss (Stanford University) ; Bilal Khan ; Fei Wang (University of Southern California) ; Martin Malmsten ; Louis Martin ; Davide Fiocco ; Clement (@huggingface) ; Ananya Harsh Jha ;

Summary

ELECTRA Model (@LysandreJik)

ELECTRA is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.

This release comes with 6 ELECTRA checkpoints:

  • google/electra-small-discriminator
  • google/electra-small-generator
  • google/electra-base-discriminator
  • google/electra-base-generator
  • google/electra-large-discriminator
  • google/electra-large-generator
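As a minimal sketch (not part of the release notes) of how one of these checkpoints can be used, the snippet below loads google/electra-small-discriminator with the ElectraForPreTraining head and prints, for each token, whether the discriminator scores it as original or replaced. The example sentence is an assumption.

```python
# Minimal sketch: run the ELECTRA discriminator over a sentence and flag
# which tokens it considers "replaced" (logit > 0) versus "original".
import torch
from transformers import ElectraTokenizer, ElectraForPreTraining

tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

sentence = "The quick brown fox jumps over the lazy dog"
input_ids = tokenizer.encode(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids)[0]  # one logit per token

predictions = (logits > 0).long().squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(input_ids.squeeze().tolist())
for token, label in zip(tokens, predictions):
    print(f"{token}\t{'replaced' if label else 'original'}")
```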

Related:

  • Paper
  • Official code
  • Models available in the community models
  • Docs

Thanks to the author @clarkkev for his help during the implementation.

Bad word filters in generate (@patrickvonplaten)

The generate method now has a bad word filter.
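A minimal sketch of the filter follows, assuming the gpt2 checkpoint and the bad_words_ids argument of generate(), which takes a list of token-id sequences that must not appear in the generated text; the banned words and the prompt are illustrative.

```python
# Minimal sketch: greedy generation with the bad-word filter added in this
# release. Each entry of bad_words_ids is a sequence of token ids that the
# model is not allowed to produce.
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode each banned word with a leading space so it matches GPT-2's
# in-sentence byte-pair tokens.
bad_words_ids = [tokenizer.encode(" ugly"), tokenizer.encode(" boring")]

input_ids = tokenizer.encode("The movie last night was", return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=30,
    do_sample=False,
    bad_words_ids=bad_words_ids,  # the filter introduced in this release
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```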

Fixes and improvements

  • Decoder input ids are no longer necessary for T5 training (@patrickvonplaten); see the sketch after this list
  • Update encoder and decoder on set_input_embedding for BART (@sshleifer)
  • Use the loaded checkpoint with --do_predict (instead of random init) in the PyTorch Lightning scripts (@ethanjperez)
  • Clean summarization and translation example testing files for T5 and Bart (@patrickvonplaten)
  • Cleaner examples (@julien-c)
  • Extensive testing for the T5 model (@patrickvonplaten)
  • Force model outputs to always have batch_size as their first dim (@patrickvonplaten)
  • Fix for continuing training in some scripts (@xeb)
  • Resize the embedding matrix before sending it to the optimizer (@ngarneau)
  • BertJapaneseTokenizer now accepts options for mecab (@tamuhey)
  • Speed up GELU computation with torch.jit (@mryab)
  • Fix argument order of the update_mems fn in the TF version (@patrickvonplaten, @dmytyar)
  • Split the generate test function into beam search and no beam search (@patrickvonplaten)
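The first item above is illustrated by the minimal sketch below: passing only the target ids is enough, and the model builds the decoder inputs internally by shifting them right. The t5-small checkpoint and the example sentences are assumptions, and recent transformers releases name the target argument labels (releases around v2.8.0 used lm_labels).

```python
# Minimal sketch: T5 training step without explicitly providing
# decoder_input_ids; the model derives them from the target ids.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is small.", return_tensors="pt"
)
labels = tokenizer.encode("Das Haus ist klein.", return_tensors="pt")

# No decoder_input_ids argument needed; the first returned element is the loss.
loss = model(input_ids=input_ids, labels=labels)[0]
print(float(loss))
```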

More information

  • DOI: 10.5281/zenodo.3741842

Dates

  • Publication date: 2020
  • Issued: April 06, 2020

Rights

  • Open Access (info:eu-repo/semantics/openAccess)


Format

electronic resource

Related items

  • IsSupplementTo: https://github.com/huggingface/transformers/tree/v2.8.0
  • IsVersionOf: https://doi.org/10.5281/zenodo.3385997
  • IsPartOf: https://zenodo.org/communities/zenodo