  1. Aug 03, 2021
  2. Aug 02, 2021
  3. Jul 26, 2021
  4. Jul 19, 2021
  5. Jul 14, 2021
  6. Jul 09, 2021
  7. Jul 07, 2021
  8. Jun 28, 2021
  9. Jun 23, 2021
  10. Jun 14, 2021
  11. Jun 09, 2021
  12. Jun 01, 2021
  13. May 25, 2021
  14. May 24, 2021
  15. May 18, 2021
    • create vocabulary (#349) · e3ec9802
      Daniel Ecer authored
      * initial create vocabulary utility
      
      * extract vocabulary from embeddings
      
      * renamed to --output-word-count-file
      
      * added main call
      
      * extracted iter_tokenized_tokens
      
      * avoid empty tokens
      
      * using tokenizer from delft
      
      * optionally sort by count
      
      * added file list support
      
      * added support for remote files
      
      * added limit argument
      
      * added fsspec dependency
      
      * optionally use multi threading or processing
      
      * included full github link
      
      * renamed to create_vocabulary
      
      * moved to tools vocabulary
      
      * filter embeddings
      
      * renamed to embeddings
      
      * using fsspec to open embeddings file when extracting
      
      * use fsspec when filtering embeddings
      
      * document tools
      
      * added link to tools.md
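
The commit message above describes two steps: counting tokens to build a vocabulary, then filtering an embeddings file down to that vocabulary. The following is a minimal sketch of those two steps, not the code from the commit. Only `iter_tokenized_tokens` is a name taken from the commit message; its signature here, and every other name, is hypothetical. The regex tokenizer is a stand-in for the delft tokenizer, input is taken as plain iterables of lines rather than files opened through fsspec, `limit` is assumed to cap the number of input lines, and the embeddings are assumed to be in a text format with one `word v1 v2 ...` entry per line.

```python
import re
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Set, Tuple

# Stand-in tokenizer (assumption): words, or single punctuation characters.
# The commit uses the tokenizer from delft instead.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def iter_tokenized_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty tokens from each line."""
    for line in lines:
        for token in TOKEN_PATTERN.findall(line):
            if token:  # avoid empty tokens
                yield token


def get_word_counts(
    lines: Iterable[str],
    sort_by_count: bool = False,
    limit: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Count tokens in at most `limit` lines (all lines if limit is None)."""
    counter = Counter(iter_tokenized_tokens(islice(lines, limit)))
    if sort_by_count:
        # Highest count first.
        return counter.most_common()
    # Otherwise keep first-seen order.
    return list(counter.items())


def iter_filtered_embedding_lines(
    embedding_lines: Iterable[str],
    vocabulary: Set[str],
) -> Iterator[str]:
    """Yield only the embedding lines whose word is in the vocabulary."""
    for line in embedding_lines:
        word = line.split(" ", 1)[0]
        if word in vocabulary:
            yield line


if __name__ == "__main__":
    text_lines = ["the cat sat", "the cat", "the"]
    word_counts = get_word_counts(text_lines, sort_by_count=True)
    vocabulary = {word for word, _ in word_counts}
    embedding_lines = ["the 0.1 0.2", "dog 0.3 0.4", "cat 0.5 0.6"]
    for line in iter_filtered_embedding_lines(embedding_lines, vocabulary):
        print(line)
```

Because both functions take iterables of lines, the same sketch would work on file objects, including ones returned by `fsspec.open(path, "rt")` for remote files, without loading everything into memory.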
  16. May 13, 2021
  17. May 12, 2021
  18. May 06, 2021
  19. May 05, 2021
  20. May 03, 2021
  21. Apr 26, 2021
  22. Apr 22, 2021
  23. Apr 14, 2021
  24. Apr 08, 2021
  25. Apr 06, 2021
  26. Apr 05, 2021
  27. Mar 22, 2021
  28. Mar 11, 2021
  29. Mar 08, 2021