Title: huggingface/transformers: Patch v3.0.1: Better backward compatibility for tokenizers
Type: Software
Thomas Wolf, Lysandre Debut, Julien Chaumond, Patrick von Platen, Victor SANH, Sam Shleifer, Funtowicz Morgan, Aymeric Augustin, Rémi Louf, Manuel Romero, Sylvain Gugger, Stefan Schweter, Denis, erenup, Matt, Grégory Châtel, Piero Molino, Bram Vanroy, Anthony MOI, Suraj Patil, Gunnlaugur Thor Briem, Tim Rault, Bilal Khan, Catalin Voss, Malte Pietsch, Julien Plu, Lorenzo Ampil, Davide Fiocco, Louis Martin, Fei Wang (2020): huggingface/transformers: Patch v3.0.1: Better backward compatibility for tokenizers. Zenodo. Software. https://zenodo.org/record/3929645
Links
- Item record in Zenodo
- Digital object URL
Summary
Better backward-compatibility for tokenizers following v3.0.0 refactoring
Version v3.0.0 included a refactoring of the tokenizers' backend to allow a simpler and more flexible user-facing API.
This refactoring was conducted with a particular focus on keeping backward compatibility for the v2.X encoding, truncation and padding API but still led to two breaking changes that could have been avoided.
This patch aims to bring back better backward compatibility, by implementing the following updates:
- The prepare_for_model method is now publicly exposed again for both slow and fast tokenizers, with an API compatible with both the v2.X truncation/padding API and the v3.0 recommended API (see the usage sketch after this list).
- The truncation strategy now defaults again to longest_first instead of only_first.

Bug fixes and improvements:
- Better support for the TransfoXL tokenizer when using TextGenerationPipeline https://github.com/huggingface/transformers/pull/5465 (@TevenLeScao)
- Fix use of mems in Transformer-XL generations https://github.com/huggingface/transformers/pull/4826 (@tommccoy1)
- Fix a bug in the NER pipeline which led to discarding the last identified entity https://github.com/huggingface/transformers/pull/5439 (@mfuntowicz and @enzoampil)
- Better QA pipelines https://github.com/huggingface/transformers/pull/5429 (@mfuntowicz)
- Add question-answering and MLM heads to the Reformer model https://github.com/huggingface/transformers/pull/5433 (@patrickvonplaten)
- Refactoring of Longformer https://github.com/huggingface/transformers/pull/5219 (@patrickvonplaten)
- Various fixes on tokenizers and tests (@sshleifer)
- Many improvements to the documentation and tutorials (@sgugger)
- Fix the TensorFlow dataset generator in run_glue https://github.com/huggingface/transformers/pull/4881 (@jplu)
- Update the Bertabs example so it works again https://github.com/huggingface/transformers/pull/5355 (@MichaelJanz)
- Move GenerationMixin to a separate file https://github.com/huggingface/transformers/pull/5254 (@yjernite)
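The following is a minimal sketch of how the re-exposed prepare_for_model method can be called with either style of arguments. It assumes a transformers v3.0.1 install and the bert-base-cased checkpoint; exact keyword names and defaults may differ slightly between versions.

```python
# Sketch only: assumes transformers v3.0.1 and the bert-base-cased checkpoint.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

# prepare_for_model operates on token ids, not raw strings.
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Hello, world!"))
pair_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("A second, longer sequence."))

# v3.0 recommended style: explicit truncation/padding arguments.
encoded = tokenizer.prepare_for_model(
    ids,
    pair_ids=pair_ids,
    truncation="longest_first",  # the default strategy again as of v3.0.1
    padding="max_length",
    max_length=16,
)
print(encoded["input_ids"])

# v2.X style arguments are still accepted for backward compatibility
# (they may emit a deprecation warning).
encoded_legacy = tokenizer.prepare_for_model(
    ids,
    pair_ids=pair_ids,
    max_length=16,
    pad_to_max_length=True,
)
print(encoded_legacy["input_ids"])
```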
More information
- DOI: 10.5281/zenodo.3929645
Dates
- Publication date: 2020
- Issued: July 03, 2020
Rights
- info:eu-repo/semantics/openAccess Open Access
Format
electronic resource
Related items
| Description | Item type | Relationship | Uri |
|---|---|---|---|
| | | IsSupplementTo | https://github.com/huggingface/transformers/tree/v3.0.1 |
| | | IsVersionOf | https://doi.org/10.5281/zenodo.3385997 |
| | | IsPartOf | https://zenodo.org/communities/zenodo |