- actions_required
- file:view
- file:create
- file:update
- file:download
- entity:view
- entity:update
- relationship:create
- description
- Processes JPEG images: resize, OCR, assemble text, extract structure, describe extracted images
- endpoint
- https://image-workflow.arke.institute
- endpoint_verified_at
- 2026-01-21T05:34:14.846Z
- input_schema
- properties
- file_ids
- description
- Ordered array of JPEG file entity IDs to process
- type
- array
- options
- properties
- custom_prompt
- description
- Custom prompt to pass to downstream services (text-workflow, image-description)
- type
- string
- label
- description
- Label for the combined text file
- type
- string
- skip_image_descriptions
- description
- Skip descriptions of images extracted during OCR
- type
- boolean
- skip_text_workflow
- description
- Skip structure extraction and description (assembly only)
- type
- boolean
- type
- object
- type
- object
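A minimal sketch of building an input object that matches the schema above. The helper name `build_input` is hypothetical, and the check that each entry of `file_ids` is a string is an assumption: the schema only states that `file_ids` is an array. How the object is submitted to the endpoint is not described here, so the sketch stops at constructing and validating it.

```python
# Option names and types as listed under input_schema.options.properties.
OPTION_TYPES = {
    "custom_prompt": str,
    "label": str,
    "skip_image_descriptions": bool,
    "skip_text_workflow": bool,
}


def build_input(file_ids, **options):
    """Return a dict shaped like input_schema, or raise ValueError."""
    if not isinstance(file_ids, list):
        raise ValueError("file_ids must be an array")
    # Assumption: entity IDs are strings (the schema does not say).
    if not all(isinstance(file_id, str) for file_id in file_ids):
        raise ValueError("every file ID must be a string")

    for name, value in options.items():
        expected = OPTION_TYPES.get(name)
        if expected is None:
            raise ValueError(f"unknown option: {name}")
        if not isinstance(value, expected):
            raise ValueError(f"option {name} must be of type {expected.__name__}")

    payload = {"file_ids": list(file_ids)}  # order is preserved
    if options:
        payload["options"] = dict(options)
    return payload
```

For example, `build_input(["id_page_1", "id_page_2"], label="Combined text", skip_image_descriptions=True)` yields an object with an ordered `file_ids` array and an `options` object containing only the two supplied keys. The IDs shown are placeholders, not real entity IDs.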
- output_description
- After the workflow completes, each input JPEG has been processed into an interconnected entity graph. Each source JPEG receives up to three resized derivative files (large at 2400px, medium at 1288px, thumbnail at 256px), OCR-extracted text written to its 'text' property, and metadata such as 'text_source', 'text_has_content', and 'text_images_count'. If OCR detects embedded images (figures, charts, diagrams) within a page, they are extracted as new JPEG file entities with their own binary content. All per-page OCR text is then assembled into a single combined text file entity that references every source page. The combined text is passed through structure extraction, which creates a hierarchical tree of entities representing the document's logical organization: a root entity (e.g. 'book' or 'report'), intermediate structural divisions (parts, chapters, sections), and leaf-level chunk entities (~1024 tokens each) that carry the actual text content. Every structural entity receives a generated description with overview, context, and contents sections. Finally, images extracted from pages during OCR receive contextual descriptions and updated labels from the image description service. The result is a searchable, browsable document graph: resized images for display, structured text for navigation and retrieval, and descriptions on every structural entity and extracted image.
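The phrase "up to three resized derivative files" can be illustrated with a small sketch. It rests on two assumptions that the description does not confirm: that each pixel figure bounds the longest edge of the derivative, and that a derivative is skipped when the source is not larger than the target. The function name `derivative_sizes` is hypothetical.

```python
# Target sizes taken from the output description.
TARGETS = {"large": 2400, "medium": 1288, "thumbnail": 256}


def derivative_sizes(width, height):
    """Return {name: (width, height)} for each derivative that would be made.

    Assumption: the target bounds the longest edge, aspect ratio is kept,
    and no derivative is produced by upscaling.
    """
    longest = max(width, height)
    sizes = {}
    for name, target in TARGETS.items():
        if target >= longest:
            continue  # assumed: skip sizes the source cannot fill
        # Integer arithmetic with round-half-up avoids float surprises.
        new_w = (width * target + longest // 2) // longest
        new_h = (height * target + longest // 2) // longest
        sizes[name] = (new_w, new_h)
    return sizes
```

Under these assumptions a 4800x3600 source would get all three derivatives, while a 1000x800 source would get only a thumbnail.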
- output_relationships
- source JPEG --[has_derivative]--> resized JPEG: each input image links to its large, medium, and thumbnail versions
- resized JPEG --[derived_from]--> source JPEG: each resized version links back to the original it was created from
- source JPEG --[has_extracted]--> extracted image: pages with embedded figures link to the images pulled out during OCR
- extracted image --[extracted_from]--> source JPEG: each extracted image links back to the page it came from
- combined text --[assembled_from]--> source JPEG: the assembled text file links to every page whose OCR text it contains
- source JPEG --[has_assembly]--> combined text: each source page links forward to the combined text file it contributed to
- structural entity --[extractedFrom]--> combined text: every entity in the structure tree links back to the source text file
- parent entity --[contains]--> child entity: structural parents link down to their children (book to chapters, chapters to sections, etc.)
- child entity --[in]--> parent entity: structural children link up to their parent
- sibling entity --[prev/next]--> sibling entity: structural siblings are linked in document order
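Most of the relationships listed above come in forward and backward pairs. The sketch below records those pairs and emits both directions for a single link. The `link` helper and the triple representation are illustrative assumptions, as is treating `prev` as the inverse of `next`. `extractedFrom` is listed without an inverse, so it is emitted one way only.

```python
# Forward predicate -> inverse predicate, as paired in output_relationships.
INVERSE = {
    "has_derivative": "derived_from",
    "has_extracted": "extracted_from",
    "has_assembly": "assembled_from",
    "contains": "in",
    "next": "prev",  # assumption: siblings are linked in both directions
}


def link(source, predicate, target):
    """Return the (subject, predicate, object) triples for one relationship."""
    edges = [(source, predicate, target)]
    inverse = INVERSE.get(predicate)
    if inverse is not None:
        edges.append((target, inverse, source))
    return edges


edges = link("page-001.jpeg", "has_derivative", "page-001_large.jpeg")
edges += link("chunk 1", "extractedFrom", "combined-text.txt")
for edge in edges:
    print(edge)
```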
- output_tree_example
- Collection (target)
├── page-001.jpeg (input, updated with OCR text + metadata)
│   ├── [has_derivative] ──► page-001_large.jpeg
│   ├── [has_derivative] ──► page-001_medium.jpeg
│   ├── [has_derivative] ──► page-001_thumbnail.jpeg
│   └── [has_extracted] ──► page-001_img-0.jpeg (figure extracted by OCR, described)
│
├── page-002.jpeg (input, updated with OCR text + metadata)
│   ├── [has_derivative] ──► page-002_large.jpeg
│   ├── [has_derivative] ──► page-002_medium.jpeg
│   └── [has_derivative] ──► page-002_thumbnail.jpeg
│
├── combined-text.txt (assembled from all pages)
│   ├── [assembled_from] ──► page-001.jpeg
│   └── [assembled_from] ──► page-002.jpeg
│
└── Structure tree (extracted from combined-text.txt):
    └── book 'Document Title' (root entity, has description)
        ├── chapter 'Chapter 1' (has description)
        │   ├── section 'Introduction' (has description)
        │   │   ├── chunk 1 (has text, lines 1-30)
        │   │   └── chunk 2 (has text, lines 31-55)
        │   └── section 'Background' (has description)
        │       └── chunk 3 (has text, lines 56-90)
        └── chapter 'Chapter 2' (has description)
            └── ...
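To show how the structure tree above might be consumed, here is a sketch that walks `contains` links depth-first and collects the leaf chunks in document order. The `Node` class is a hypothetical in-memory stand-in for the entities, not the service's data model. The contents of 'Chapter 2' are elided in the example, so that node is left without children here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str   # e.g. 'book', 'chapter', 'section', 'chunk'
    label: str
    children: List["Node"] = field(default_factory=list)


def leaf_chunks(node):
    """Return chunk labels in document order via a depth-first walk."""
    if node.kind == "chunk":
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(leaf_chunks(child))
    return labels


book = Node("book", "Document Title", [
    Node("chapter", "Chapter 1", [
        Node("section", "Introduction", [
            Node("chunk", "chunk 1"),
            Node("chunk", "chunk 2"),
        ]),
        Node("section", "Background", [Node("chunk", "chunk 3")]),
    ]),
    Node("chapter", "Chapter 2"),  # children elided in the example
])

print(leaf_chunks(book))  # ['chunk 1', 'chunk 2', 'chunk 3']
```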
- status
- active
- uses_agents
- label
- Image Processor
- label
- OCR Service
- label
- Text Assembler
- label
- Text Workflow
- label
- Image Description
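The agent list above, read together with the two skip options in the input schema, suggests a simple stage plan. The sketch below makes several assumptions: that the agents run in the order listed, that each skip flag defaults to false, and that each flag removes only its own stage. The function name `planned_stages` is hypothetical.

```python
def planned_stages(options=None):
    """Return the agent labels expected to run for the given options.

    Assumptions: agents run in listed order, both flags default to False,
    and each flag drops exactly one stage.
    """
    options = options or {}
    stages = ["Image Processor", "OCR Service", "Text Assembler"]
    if not options.get("skip_text_workflow", False):
        stages.append("Text Workflow")
    if not options.get("skip_image_descriptions", False):
        stages.append("Image Description")
    return stages


print(planned_stages({"skip_text_workflow": True}))
```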