Title: huggingface/transformers: ELECTRA, Bad word filters, bugfixes & improvements
Type: Software
Thomas Wolf, Lysandre Debut, Julien Chaumond, Victor SANH, Patrick von Platen, Aymeric Augustin, Rémi Louf, Funtowicz Morgan, Stefan Schweter, Denis, Sam Shleifer, erenup, Manuel Romero, Matt, Piero Molino, Grégory Châtel, Bram Vanroy, Tim Rault, Gunnlaugur Thor Briem, Julien Plu, Anthony MOI, Malte Pietsch, Catalin Voss, Bilal Khan, Fei Wang, Martin Malmsten, Louis Martin, Davide Fiocco, Clement, Ananya Harsh Jha (2020): huggingface/transformers: ELECTRA, Bad word filters, bugfixes & improvements. Zenodo. Software. https://zenodo.org/record/3741842
Summary
ELECTRA Model (@LysandreJik)
ELECTRA is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens from "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.
This release comes with 6 ELECTRA checkpoints:
- google/electra-small-discriminator
- google/electra-small-generator
- google/electra-base-discriminator
- google/electra-base-generator
- google/electra-large-discriminator
- google/electra-large-generator

Related:
- Paper
- Official code
- Models available in the community models
- Docs

Thanks to the author @clarkkev for his help during the implementation.
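As a quick orientation, here is a minimal usage sketch for one of these checkpoints. It assumes transformers v2.8.0 or later with PyTorch installed, and that the ELECTRA discriminator is exposed through the ElectraForPreTraining and ElectraTokenizer classes; the example sentence and the thresholding at zero are illustrative only.

```python
# Minimal sketch, assuming transformers >= 2.8.0 with PyTorch installed.
# Loads the small discriminator and scores which tokens it considers "replaced".
import torch
from transformers import ElectraForPreTraining, ElectraTokenizer

tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

input_ids = tokenizer.encode("The quick brown fox jumps over the lazy dog",
                             return_tensors="pt")
with torch.no_grad():
    scores = model(input_ids)[0]  # per-token logits; higher means "likely replaced"

print((scores > 0).long().tolist())
```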
Bad word filters in generate (@patrickvonplaten)

The generate method now has a bad word filter, which lets callers specify words that must not appear in the generated output.
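For illustration, a hedged sketch of how the filter can be used, assuming it is exposed through generate's bad_words_ids argument (a list of token-id sequences to exclude); the GPT-2 checkpoint, the banned words, and the prompt are illustrative choices, not part of the release notes above.

```python
# Sketch: ban specific words from generated text via bad_words_ids (assumed argument name).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode each banned word with a leading space so the ids match mid-sentence usage.
bad_words_ids = [tokenizer.encode(" " + word) for word in ["terrible", "awful"]]

input_ids = tokenizer.encode("The movie was", return_tensors="pt")
output = model.generate(input_ids, max_length=20, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```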
Fixes and improvements

- Decoder input ids are no longer necessary for T5 training (@patrickvonplaten)
- Update encoder and decoder on set_input_embedding for BART (@sshleifer)
- Use the loaded checkpoint with --do_predict (instead of random init) for PyTorch Lightning scripts (@ethanjperez)
- Clean summarization and translation example testing files for T5 and Bart (@patrickvonplaten)
- Cleaner examples (@julien-c)
- Extensive testing for the T5 model (@patrickvonplaten)
- Force model outputs to always have batch_size as their first dim (@patrickvonplaten)
- Fix for continuing training in some scripts (@xeb)
- Resize the embedding matrix before sending it to the optimizer (@ngarneau)
- BertJapaneseTokenizer now accepts options for mecab (@tamuhey)
- Speed up GELU computation with torch.jit (@mryab); a sketch follows this list
- Fix the argument order of the update_mems fn in the TF version (@patrickvonplaten, @dmytyar)
- Split the generate test function into beam search and no beam search (@patrickvonplaten)
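As a rough illustration of the GELU speed-up item above, a TorchScript-compiled GELU might look like the following; this is a sketch of the general technique, not necessarily the exact code that was merged.

```python
# Sketch of a TorchScript-compiled GELU; the merged implementation may differ.
import math
import torch

@torch.jit.script
def gelu(x):
    # Exact GELU using the error function: x * Phi(x).
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

print(gelu(torch.randn(4)))
```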
More information

- DOI: 10.5281/zenodo.3741842
Dates
- Publication date: 2020
- Issued: April 06, 2020
Rights
- Open Access (info:eu-repo/semantics/openAccess)
Format
- electronic resource
Related items

| Relationship | URI |
|---|---|
| IsSupplementTo | https://github.com/huggingface/transformers/tree/v2.8.0 |
| IsVersionOf | https://doi.org/10.5281/zenodo.3385997 |
| IsPartOf | https://zenodo.org/communities/zenodo |