Title: huggingface/transformers: Patch v3.0.1: Better backward compatibility for tokenizers

Type: Software

Thomas Wolf, Lysandre Debut, Julien Chaumond, Patrick von Platen, Victor SANH, Sam Shleifer, Funtowicz Morgan, Aymeric Augustin, Rémi Louf, Manuel Romero, Sylvain Gugger, Stefan Schweter, Denis, erenup, Matt, Grégory Châtel, Piero Molino, Bram Vanroy, Anthony MOI, Suraj Patil, Gunnlaugur Thor Briem, Tim Rault, Bilal Khan, Catalin Voss, Malte Pietsch, Julien Plu, Lorenzo Ampil, Davide Fiocco, Louis Martin, Fei Wang (2020): huggingface/transformers: Patch v3.0.1: Better backward compatibility for tokenizers. Zenodo. Software. https://zenodo.org/record/3929645

Authors: Thomas Wolf (@huggingface) ; Lysandre Debut (Hugging Face) ; Julien Chaumond (Hugging Face) ; Patrick von Platen ; Victor SANH (@huggingface) ; Sam Shleifer (Huggingface) ; Funtowicz Morgan (HuggingFace) ; Aymeric Augustin (@qonto) ; Rémi Louf ; Manuel Romero ; Sylvain Gugger ; Stefan Schweter ; Denis ; erenup ; Matt ; Grégory Châtel (DisAItek & Intel AI Innovators) ; Piero Molino ; Bram Vanroy (@UGent) ; Anthony MOI (Hugging Face) ; Suraj Patil (Wynum) ; Gunnlaugur Thor Briem (Qlik) ; Tim Rault (@huggingface) ; Bilal Khan ; Catalin Voss (Stanford University) ; Malte Pietsch (deepset) ; Julien Plu (Leboncoin Lab) ; Lorenzo Ampil (@thinkingmachines) ; Davide Fiocco (@frontiersin) ; Louis Martin ; Fei Wang (University of Southern California)

Summary

Better backward-compatibility for tokenizers following v3.0.0 refactoring

Version v3.0.0 included a refactoring of the tokenizers' backend to allow a simpler and more flexible user-facing API.

This refactoring was conducted with a particular focus on keeping backward compatibility for the v2.X encoding, truncation and padding API, but it still led to two breaking changes that could have been avoided.

This patch aims to restore better backward compatibility by implementing the following updates:

  • The prepare_for_model method is now publicly exposed again for both slow and fast tokenizers, with an API compatible with both the v2.X truncation/padding API and the v3.0 recommended API.
  • The truncation strategy now defaults again to longest_first instead of only_first.

Bug fixes and improvements:

  • Better support for TransfoXL tokenizer when using TextGenerationPipeline https://github.com/huggingface/transformers/pull/5465 (@TevenLeScao)
  • Fix use of mems in Transformer-XL generation https://github.com/huggingface/transformers/pull/4826 (@tommccoy1)
  • Fixing a bug in the NER pipeline which led to discarding the last identified entity https://github.com/huggingface/transformers/pull/5439 (@mfuntowicz and @enzoampil)
  • Better QAPipelines https://github.com/huggingface/transformers/pull/5429 (@mfuntowicz)
  • Add Question-Answering and MLM heads to the Reformer model https://github.com/huggingface/transformers/pull/5433 (@patrickvonplaten)
  • Refactoring the LongFormer https://github.com/huggingface/transformers/pull/5219 (@patrickvonplaten)
  • Various fixes on tokenizers and tests (@sshleifer)
  • Many improvements to the doc and tutorials (@sgugger)
  • Fix TensorFlow dataset generator in run_glue https://github.com/huggingface/transformers/pull/4881 (@jplu)
  • Update Bertabs example to work again https://github.com/huggingface/transformers/pull/5355 (@MichaelJanz)
  • Move GenerationMixin to separate file https://github.com/huggingface/transformers/pull/5254 (@yjernite)
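
The change in the default truncation strategy mentioned in the summary above matters mostly for sentence pairs. The following is a minimal, library-free sketch of how the two strategies could differ. It is not the transformers implementation: the function name truncate_pair is hypothetical, special tokens are ignored, and raising an error when the first sequence is too short to absorb the overflow is an assumption made here for simplicity.

```python
from typing import List, Sequence, Tuple


def truncate_pair(
    first: Sequence[str],
    second: Sequence[str],
    max_length: int,
    strategy: str = "longest_first",
) -> Tuple[List[str], List[str]]:
    """Illustrative pair truncation (hypothetical helper, not the library API)."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")

    first, second = list(first), list(second)
    overflow = len(first) + len(second) - max_length
    if overflow <= 0:
        # Nothing to remove: the pair already fits.
        return first, second

    if strategy == "only_first":
        # Remove tokens from the end of the first sequence only.
        if overflow >= len(first):
            # Assumption of this sketch: fail loudly rather than return an empty sequence.
            raise ValueError("first sequence is too short to absorb the overflow")
        return first[: len(first) - overflow], second

    if strategy == "longest_first":
        # Remove one token at a time from whichever sequence is currently longer;
        # on a tie, the second sequence loses the token.
        for _ in range(overflow):
            if len(first) > len(second):
                first.pop()
            else:
                second.pop()
        return first, second

    raise ValueError(f"unknown strategy: {strategy!r}")


question = ["a", "b", "c", "d"]
context = ["u", "v", "w", "x", "y", "z"]

balanced = truncate_pair(question, context, max_length=7, strategy="longest_first")
first_only = truncate_pair(question, context, max_length=7, strategy="only_first")
```

With these inputs, longest_first keeps the whole four-token first sequence and shortens the second one to three tokens, while only_first cuts the first sequence down to a single token and leaves the second untouched. Code that relied on the v2.X default would therefore see different inputs under v3.0.0, which is the kind of silent change this patch is described as reverting.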

More information

  • DOI: 10.5281/zenodo.3929645

Dates

  • Publication date: 2020
  • Issued: July 03, 2020

Rights

  • info:eu-repo/semantics/openAccess Open Access

Format

electronic resource

Related items

  • IsSupplementTo: https://github.com/huggingface/transformers/tree/v3.0.1
  • IsVersionOf: https://doi.org/10.5281/zenodo.3385997
  • IsPartOf: https://zenodo.org/communities/zenodo