  1. May 18, 2021
    • create vocabulary (#349) · e3ec9802
      Daniel Ecer authored
      * initial create vocabulary utility
      
      * extract vocabulary from embeddings
      
      * renamed to --output-word-count-file
      
      * added main call
      
      * extracted iter_tokenized_tokens
      
      * avoid empty tokens
      
      * using tokenizer from delft
      
      * optionally sort by count
      
      * added file list support
      
      * added support for remote files
      
      * added limit argument
      
      * added fsspec dependency
      
      * optionally use multi threading or processing
      
      * included full github link
      
      * renamed to create_vocabulary
      
      * moved to tools vocabulary
      
      * filter embeddings
      
      * renamed to embeddings
      
      * using fsspec to open embeddings file when extracting
      
      * use fsspec when filtering embeddings
      
      * document tools
      
      * added link to tools.md
  2. May 13, 2021
  3. Jan 12, 2021
  4. Sep 09, 2019
  5. Jun 05, 2019
    • removed cython dependency (#110) · eb23da3f
      Daniel Ecer authored
    • added autocut model (#106) · 85754f2d
      Daniel Ecer authored
      * added dev-venv target
      
      * added subextract model
      
      * added nltk dependency
      
      * flake8 ignore line break before binary operator
      
      * moved dev dependencies up
      
      * added nltk punkt download
      
      * added nltk download to dev-venv; pytest and pytest-not-slow target
      
      * added subextract training pipeline
      
      * added optional xpath namespaces
      
      * log failed xml file
      
      * use recover parser option
      
      * added subextract app
      
      * start subextract server
      
      * renamed to autocut
      
      * declare slow and very_slow pytest markers
      
      * mark autocut main test as slow
      
      * fixed post data
      
      * updated README
      
      * also build non-dev image as part of ci
      
      * added pytest.ini to dev image
  6. Aug 24, 2018
  7. Jul 06, 2018
  8. Apr 19, 2018
    • Fix versioning for google-cloud packages. (#28) · 851eb577
      Peter Hooper authored
      * Renamed files to be consistent with 'sciencebeam' repo
      
      * Update Dockerfile and README after filename change
      
      * Update version of tensorflow-transform to 0.6
      
      * Pin versions of oauth2client and httplib2 to prevent errors with google-cloud. Update apache_beam to 2.4 as used by new version of tensorflow-transform
  9. Mar 20, 2018
  10. Jan 10, 2018
  11. Jan 09, 2018
  12. Dec 19, 2017
  13. Dec 18, 2017
  14. Dec 07, 2017
  15. Dec 05, 2017
  16. Dec 04, 2017
  17. Nov 28, 2017
  18. Aug 22, 2017
  19. Aug 04, 2017
  20. Aug 01, 2017
  21. Jul 24, 2017
  22. Jul 21, 2017
  23. Jul 19, 2017