rhiza

PDF to KG Basic

01KJTB0DBA7RNX8NV90W2B1CJZ

Properties

description
End-to-end pipeline for raw PDF files. It scatters multiple PDFs, extracts text directly from digital PDFs or converts scanned PDFs to JPEG and runs OCR on them, then extracts knowledge graph entities. All file processing is handled automatically; use this instead of KG-only workflows when the input is PDF.
entry
scatter
flow
  describe_image
    input
      update_label: true
    then
      pass: kg_process
  kg_process
    then
      done: true
  ocr
    then
      route
        • target: describe_image
          where
            equals: extracted_image
            property: entity_class
      scatter: kg_process
  pdf_convert
    then
      route
        • target: kg_process
          where
            equals: false
            property: needs_ocr
      scatter: ocr
  scatter
    then
      scatter: pdf_convert
status
active
version
3.0
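
The routing in the flow above can be read as a small state machine. The following Python sketch is illustrative only and is not the rhiza runtime. It rests on three assumptions inferred from the listing: the nesting of the flow keys, that the first matching `route` rule wins, and that the plain `scatter` or `pass` target is the fallback when no rule matches. The names `FLOW`, `next_step`, and `trace` are hypothetical, and an entity is modeled as a static dict even though a real run would set properties such as `needs_ocr` and `entity_class` along the way.

```python
from typing import Any, Optional

# The flow from this entry as a nested dict.
# The nesting is inferred from the listing and is an assumption.
FLOW: dict[str, dict[str, Any]] = {
    "scatter": {"then": {"scatter": "pdf_convert"}},
    "pdf_convert": {
        "then": {
            "route": [
                {"target": "kg_process",
                 "where": {"property": "needs_ocr", "equals": False}},
            ],
            "scatter": "ocr",
        }
    },
    "ocr": {
        "then": {
            "route": [
                {"target": "describe_image",
                 "where": {"property": "entity_class",
                           "equals": "extracted_image"}},
            ],
            "scatter": "kg_process",
        }
    },
    "describe_image": {
        "input": {"update_label": True},
        "then": {"pass": "kg_process"},
    },
    "kg_process": {"then": {"done": True}},
}
ENTRY = "scatter"


def next_step(step: str, entity: dict[str, Any]) -> Optional[str]:
    """Return the step that follows `step` for `entity`, or None when done."""
    then = FLOW[step]["then"]
    if then.get("done"):
        return None
    # Assumed semantics: the first route rule whose condition holds wins.
    for rule in then.get("route", []):
        where = rule["where"]
        if entity.get(where["property"]) == where["equals"]:
            return rule["target"]
    # Assumed fallback: the plain scatter/pass target.
    return then.get("scatter") or then.get("pass")


def trace(entity: dict[str, Any]) -> list[str]:
    """Walk the flow from the entry step and collect the steps visited."""
    path: list[str] = []
    step: Optional[str] = ENTRY
    while step is not None:
        path.append(step)
        step = next_step(step, entity)
    return path


if __name__ == "__main__":
    print(trace({"needs_ocr": False}))  # digital PDF
    print(trace({"needs_ocr": True}))   # scanned PDF
    print(trace({"needs_ocr": True, "entity_class": "extracted_image"}))
```

Under these assumptions a digital PDF skips `ocr` entirely, a scanned PDF passes through `ocr`, and only entities classed as `extracted_image` take the detour through `describe_image` before reaching `kg_process`.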