  1. Oct 13, 2021
  2. Oct 11, 2021
  3. Sep 20, 2021
  4. Sep 15, 2021
  5. Sep 14, 2021
    • added bounding box pipeline (#403) · 603edca9
      Daniel Ecer authored
      * process pdf and xml file lists
      
      * allow sub directories in output path
      
      * make output_annotated_images_path relative to output
      
      * make sure that output directory is created
      
      * use write_bytes in favour of explicit makedirs (cloud ready)
      
      * make --pdf-base-path required
      
      * added pipeline
      
      * not extending ABC due to serialization errors
      
      * log pipeline options
      
      * added test to check serialization
      
      * changed super import to avoid one of the serialization errors in Dataflow
      
      * moved most functionality to separate module
      
      * added libgl1 to setup.py
      
      * added PreventFusion
      
      * minor import grouping
      
      * added TransformAndCount
      
      * reverted super __init__ call
      
      * use parse args, not ignoring unknown args
      
      * expose all of the worker arguments
  6. Sep 09, 2021
  7. Sep 08, 2021
  8. Sep 07, 2021
    • check bounding box; fixed image cache key (#400) · f69743c2
      Daniel Ecer authored
      * calculate structural similarity using skimage
      
      * fixed crop_image_to_bounding_box
      
      * output score in JSON
      
      * optionally output images with bounding boxes
      
      * display bbox label inside if not enough space above
      
      * sort by score, then key points
      
      * fixed cache issue by using explicit cache key prefix
      
      (otherwise ids may have been reused after memory being freed)
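
The cache fix in #400 can be illustrated with a minimal sketch. In CPython, `id()` values are only unique among objects that are alive at the same time, so a cache keyed on `id(image)` may return stale entries once an image has been freed and its id reused. The names below (`PrefixedFeatureCache`, `key_prefix`, `item_key`) are hypothetical and not taken from the repository; the sketch only shows the general idea of an explicit cache key prefix.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class PrefixedFeatureCache:
    """Hypothetical cache keyed by (explicit prefix, item key) instead of id(obj)."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Hashable], Any] = {}
        self.miss_count = 0

    def get_or_compute(
        self,
        key_prefix: str,
        item_key: Hashable,
        compute_fn: Callable[[], Any],
    ) -> Any:
        # The prefix names the source of the item (assumed values, e.g. a file name),
        # so the key stays stable even if the underlying object is freed.
        key = (key_prefix, item_key)
        if key not in self._cache:
            self.miss_count += 1
            self._cache[key] = compute_fn()
        return self._cache[key]
```

Two items that share the same `item_key` but come from different sources no longer collide, because the prefix is part of the key.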
  9. Sep 03, 2021
    • raise error when figure bounding box could not be found (#399) · 46e14b3c
      Daniel Ecer authored
      * raise GraphicImageNotFoundError
      
      * allow skipping errors
    • figure image bounding box annotation for single document (#389) · 0bf9e780
      Daniel Ecer authored
      * enable debug logging for tests
      
      * added cli scaffolding
      
      * extract images from pdf
      
      * fixed type hint
      
      * added bounding box to_list
      
      * converted bounding box to named tuple
      
      * added tests for validate
      
      * implemented bounding box intersection
      
      * implemented finding bounding boxes of single image
      
      * added test for smaller partial image
      
      * added libgl1 for open cv
      
      * linting: use with statement for Popen
      
      * added support for multiple image files
      
      * added support for xml files
      
      * join graphic href with xml dirname
      
      * renamed cv2 to cv
      
      * using ObjectDetectorMatcher
      
      * moved functions to image object matching module
      
      * added TestGetObjectMatch
      
      * added ImageObjectMarchResult
      
      * added test_should_match_smaller_image
      
      * added test_should_match_smaller_rotated_90_image
      
      * fixed typo ImageObjectMatchResult
      
      * moved object_detector_matcher parameter down
      
      * added get_image_list_object_match
      
      * added support for gzipped files
      
      * indent output json
      
      * added annotation file_name
      
      * using sample image as fixture
      
      * reduce size of sample image
      
      * added save_images_as_pdf
      
      * added test using multiple images
      
      * prefer better match based on keyword match count
      
      * added --debug cli arg
      
      * handle case where no homography can be found
      
      * use category based on parent element
      
      * added formula type
      
      * added related_element_id
      
      * added logging tqdm
      
      * added info
      
      * resize larger image to make finding bounding boxes faster
      
      * convert to gray scale to make finding boxes even faster
      
      * made max width height configurable
      
      * cache image features
      
      * increased default max width height
      
      * added tests for get_image_array_with_max_resolution
      
      * allow max width height to be zero
      
      * make grayscale conversion optional
      
      * disabled max resolution by default
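
Several items in #389 concern the bounding box type (named tuple, `to_list`, intersection). A minimal sketch of what such a type could look like follows. The field names, the `[x, y, width, height]` list order, and the method signatures are assumptions made for illustration, not the repository's actual definitions.

```python
from typing import List, NamedTuple


class BoundingBox(NamedTuple):
    # Assumed fields: top-left corner plus size.
    x: float
    y: float
    width: float
    height: float

    def to_list(self) -> List[float]:
        # Assumed order: [x, y, width, height]
        return [self.x, self.y, self.width, self.height]

    def intersection(self, other: 'BoundingBox') -> 'BoundingBox':
        # The overlap starts at the larger of the two origins
        # and ends at the smaller of the two far edges.
        x = max(self.x, other.x)
        y = max(self.y, other.y)
        right = min(self.x + self.width, other.x + other.width)
        bottom = min(self.y + self.height, other.y + other.height)
        # Clamp to zero so that disjoint boxes yield an empty box.
        return BoundingBox(x, y, max(0.0, right - x), max(0.0, bottom - y))
```

Using a named tuple keeps the type immutable and hashable, and comparing two boxes in tests reduces to plain tuple equality.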
  10. Aug 26, 2021
  11. May 18, 2021
    • create vocabulary (#349) · e3ec9802
      Daniel Ecer authored
      * initial create vocabulary utility
      
      * extract vocabulary from embeddings
      
      * renamed to --output-word-count-file
      
      * added main call
      
      * extracted iter_tokenized_tokens
      
      * avoid empty tokens
      
      * using tokenizer from delft
      
      * optionally sort by count
      
      * added file list support
      
      * added support for remote files
      
      * added limit argument
      
      * added fsspec dependency
      
      * optionally use multi threading or processing
      
      * included full github link
      
      * renamed to create_vocabulary
      
      * moved to tools vocabulary
      
      * filter embeddings
      
      * renamed to embeddings
      
      * using fsspec to open embeddings file when extracting
      
      * use fsspec when filtering embeddings
      
      * document tools
      
      * added link to tools.md
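
The word counting described in #349 (tokenize, avoid empty tokens, optionally sort by count, apply a limit) can be sketched roughly as follows. This is an illustration under stated assumptions: it uses a simple regular expression in place of the delft tokenizer mentioned in the commit, and the function signatures are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    # Stand-in tokenizer (assumption): runs of word characters.
    # The pattern never yields empty tokens.
    for line in lines:
        yield from re.findall(r'\w+', line)


def create_vocabulary(
    lines: Iterable[str],
    sort_by_count: bool = False,
    limit: Optional[int] = None,
) -> List[Tuple[str, int]]:
    counts = Counter(iter_tokens(lines))
    if sort_by_count:
        # Most frequent first; ties are broken alphabetically.
        items = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    else:
        items = sorted(counts.items())
    # A limit of None keeps all entries.
    return items[:limit]
```

For example, `create_vocabulary(['the cat', 'the dog'], sort_by_count=True, limit=1)` returns `[('the', 2)]`.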
  12. May 13, 2021
  13. Jan 17, 2020
  14. Sep 09, 2019
    • switched to python3 (#145) · b3473e4c
      Daniel Ecer authored
      * upgraded to python 3
      
      * upgrade pylint and pytest
      
      * replaced StandardError
      
      * exclude useless-object-inheritance
      
      * python3 compatibilities uncovered by linting
      
      * fixed tests
      
      * fixed more python3 test incompatibilities
  15. Jun 03, 2019
  16. Nov 02, 2018
    • pylint and flake8 checking (#39) · 91e1c0d0
      Daniel Ecer authored
      * added pylint check
      
      * added pylintrc to docker image
      
      * reduced excessive apache beam debug logging
      
      * configured pylint, addressed linting
      
      * enabled flake8 checks
      
      * downgrade pycodestyle to 2.3.1 due to error
      
      * switch to 4 spaces indent
      
      * autopep8
      
      * more flake8
      
      * added new line to .flake8
  17. Aug 24, 2018
    • Use sciencebeam-utils and sciencebeam-alignment (#35) · bd100f4d
      Daniel Ecer authored
      * added sciencebeam-utils and sciencebeam-alignment dependency
      
      * replaced local alignment module with sciencebeam-alignment
      
      * replaced local beam_utils with sciencebeam-utils
      
      * replaced local utils with sciencebeam-utils for modules that have moved
      
      * removed tools that have been moved to sciencebeam-utils
      
      * removed utility functions that have moved to sciencebeam-utils
      
      * updated readme
  18. Jan 29, 2018
  19. Dec 19, 2017
  20. Dec 08, 2017
  21. Dec 07, 2017
  22. Dec 06, 2017
  23. Nov 29, 2017
  24. Aug 04, 2017
  25. Jul 19, 2017