    added segmentation model training data generator (#472) · 906b2c07
    Daniel Ecer authored
    * added simple segmentation tei generator
    
    * added test_should_add_line_feeds
    
    * add lb element before line feeds
    
    * added get_training_tei_xml_for_model_data_iterable
    
    * replaced get_training_tei_children_for_layout_line
    
    * refactored iter_tei_child_for_model_data_iterable
    
    * refactored get_training_tei_xml_for_layout_document
    
    * added training documentation
    
    * added dummy generate training data cli
    
    * moved to training.cli package
    
    * generate output path
    
    * convert to layout document
    
    * generated segmentation tei xml
    
    * removed model arg
    
    * added get_parsed_layout_document to Parser
    
    * added get_layout_document
    
    * write raw training data
    
    * added support for glob file patterns
    
    * extracted generate_training_data_for_source_filename
    
    * extracted generate_training_data_for_layout_document
    
    * pass xml tag name as positional argument
    
    * extracted _get_training_tei_xml_for_children
README.md 11.3 KB