  1. May 24, 2021
  2. May 18, 2021
    • create vocabulary (#349) · e3ec9802
      Daniel Ecer authored
      * initial create vocabulary utility
      * extract vocabulary from embeddings
      * renamed to --output-word-count-file
      * added main call
      * extracted iter_tokenized_tokens
      * avoid empty tokens
      * using tokenizer from delft
      * optionally sort by count
      * added file list support
      * added support for remote files
      * added limit argument
      * added fsspec dependency
      * optionally use multi threading or processing
      * included full github link
      * renamed to create_vocabulary
      * moved to tools vocabulary
      * filter embeddings
      * renamed to embeddings
      * using fsspec to open embeddings file when extracting
      * use fsspec when filtering embeddings
      * document tools
      * added link to tools.md
  3. May 13, 2021
  4. May 12, 2021
  5. May 06, 2021
  6. May 05, 2021
  7. May 03, 2021
  8. Apr 26, 2021
  9. Apr 22, 2021
  10. Apr 14, 2021
  11. Apr 08, 2021
  12. Apr 06, 2021
  13. Apr 05, 2021
  14. Mar 22, 2021
  15. Mar 11, 2021
  16. Mar 08, 2021
  17. Feb 26, 2021
  18. Feb 23, 2021
  19. Feb 22, 2021
  20. Feb 18, 2021
  21. Feb 11, 2021
  22. Feb 10, 2021
  23. Feb 01, 2021
  24. Jan 26, 2021
  25. Jan 25, 2021
  26. Jan 21, 2021
  27. Jan 19, 2021
  28. Jan 18, 2021
  29. Jan 12, 2021