  1. Jun 23, 2021
  2. Jun 14, 2021
  3. Jun 09, 2021
  4. Jun 01, 2021
  5. May 25, 2021
  6. May 24, 2021
  7. May 18, 2021
    • create vocabulary (#349) · e3ec9802
      Daniel Ecer authored
      * initial create vocabulary utility
      
      * extract vocabulary from embeddings
      
      * renamed to --output-word-count-file
      
      * added main call
      
      * extracted iter_tokenized_tokens
      
      * avoid empty tokens
      
      * using tokenizer from delft
      
      * optionally sort by count
      
      * added file list support
      
      * added support for remote files
      
      * added limit argument
      
      * added fsspec dependency
      
      * optionally use multithreading or multiprocessing
      
      * included full GitHub link
      
      * renamed to create_vocabulary
      
      * moved to tools vocabulary
      
      * filter embeddings
      
      * renamed to embeddings
      
      * using fsspec to open embeddings file when extracting
      
      * use fsspec when filtering embeddings
      
      * document tools
      
      * added link to tools.md
  8. May 13, 2021
  9. May 12, 2021
  10. May 06, 2021
  11. May 05, 2021
  12. May 03, 2021
  13. Apr 26, 2021
  14. Apr 22, 2021
  15. Apr 14, 2021
  16. Apr 08, 2021
  17. Apr 06, 2021
  18. Apr 05, 2021
  19. Mar 22, 2021
  20. Mar 11, 2021
  21. Mar 08, 2021
  22. Feb 26, 2021
  23. Feb 23, 2021
  24. Feb 22, 2021
  25. Feb 18, 2021
  26. Feb 11, 2021
  27. Feb 10, 2021
  28. Feb 01, 2021
  29. Jan 26, 2021