agent

Image Workflow

01KFFGQJC874G1S3SCSWT60W5Y

Properties

actions_required
  • file:view
  • file:create
  • file:update
  • file:download
  • entity:view
  • entity:update
  • relationship:create
description
Processes JPEG images: resize, OCR, assemble text, extract structure, describe derivative images
endpoint
https://image-workflow.arke.institute
endpoint_verified_at
2026-01-21T05:34:14.846Z
input_schema
  properties
    file_ids
      description
        Ordered array of JPEG file entity IDs to process
      items
        type
          string
      type
        array
    options
      properties
        custom_prompt
          description
            Custom prompt to pass to downstream services (text-workflow, image-description)
          type
            string
        label
          description
            Label for the combined text file
          type
            string
        skip_image_descriptions
          description
            Skip derivative image descriptions
          type
            boolean
        skip_text_workflow
          description
            Skip structure extraction and description (assembly only)
          type
            boolean
      type
        object
  required
    • file_ids
  type
    object
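The input schema above can be checked client-side before a request is sent. The following is a minimal sketch in Python, not the service's own validation: the function name `validate_input`, the `OPTION_TYPES` table, and the placeholder file IDs are hypothetical, and only the field names and types come from the schema.

```python
from typing import Any

# Option names and types as listed under input_schema.options.properties.
OPTION_TYPES = {
    "custom_prompt": str,
    "label": str,
    "skip_image_descriptions": bool,
    "skip_text_workflow": bool,
}


def validate_input(payload: Any) -> list[str]:
    """Return a list of problems; an empty list means the payload fits the schema."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    errors: list[str] = []

    # file_ids is the only required property: an array of strings.
    if "file_ids" not in payload:
        errors.append("file_ids is required")
    else:
        file_ids = payload["file_ids"]
        if not isinstance(file_ids, list) or not all(isinstance(f, str) for f in file_ids):
            errors.append("file_ids must be an array of strings")

    # options is optional; when present it must be an object.
    options = payload.get("options", {})
    if not isinstance(options, dict):
        errors.append("options must be an object")
    else:
        for key, value in options.items():
            expected = OPTION_TYPES.get(key)
            # Unknown keys pass: the schema does not forbid additional properties.
            if expected is not None and not isinstance(value, expected):
                errors.append(f"options.{key} must be of type {expected.__name__}")
    return errors


# Hypothetical payload; the IDs are placeholders, not real entity IDs.
example = {
    "file_ids": ["FILE_ID_PAGE_1", "FILE_ID_PAGE_2"],
    "options": {"label": "Combined text", "skip_image_descriptions": True},
}
print(validate_input(example))
```

The order of `file_ids` matters, since the schema describes it as an ordered array; the sketch checks types only and leaves ordering to the caller.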
output_description
After the workflow completes, every input JPEG has been processed into an interconnected entity graph.
Each source JPEG receives up to three resized derivative files (large at 2400px, medium at 1288px, thumbnail at 256px), OCR-extracted text written to its 'text' property, and metadata such as 'text_source', 'text_has_content', and 'text_images_count'.
If OCR detects embedded images (figures, charts, diagrams) within a page, those are extracted as new JPEG file entities with their own binary content.
All per-page OCR text is then assembled into a single combined text file entity that references every source page.
The combined text is passed through structure extraction, which creates a hierarchical tree of entities representing the document's logical organization: a root entity (e.g. 'book' or 'report'), intermediate structural divisions (parts, chapters, sections), and leaf-level chunk entities (~1024 tokens each) that carry the actual text content. Every structural entity receives a generated description with overview, context, and contents sections.
Finally, any images extracted from pages during OCR receive their own contextual descriptions and updated labels from the image description service.
The end result is a searchable, browsable document graph: resized images for display, structured text for navigation and retrieval, and descriptions on every structural entity and extracted image.
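The derivative sizes in the output description can be illustrated with a small sizing sketch. It rests on assumptions the description does not confirm: that each pixel value bounds the longest edge, that aspect ratio is preserved, and that "up to three" means a derivative is skipped when the source is already no larger than the target. The name `plan_derivatives` is hypothetical.

```python
# Target sizes from the output description; treating them as a bound on the
# longest edge is an assumption.
DERIVATIVE_SIZES = {"large": 2400, "medium": 1288, "thumbnail": 256}


def plan_derivatives(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Return the (width, height) of each derivative that would be produced."""
    longest = max(width, height)
    plan: dict[str, tuple[int, int]] = {}
    for name, target in DERIVATIVE_SIZES.items():
        if longest <= target:
            continue  # assumption: never upscale, so this derivative is skipped
        # Scale both edges by target / longest, keeping the aspect ratio.
        new_width = max(1, round(width * target / longest))
        new_height = max(1, round(height * target / longest))
        plan[name] = (new_width, new_height)
    return plan


print(plan_derivatives(3000, 2000))
print(plan_derivatives(1000, 800))
```

Under these assumptions a 3000x2000 page yields all three derivatives, while a 1000x800 page yields only a thumbnail.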
output_relationships
  • source JPEG --[has_derivative]--> resized JPEG: each input image links to its large, medium, and thumbnail versions
  • resized JPEG --[derived_from]--> source JPEG: each resized version links back to the original it was created from
  • source JPEG --[has_extracted]--> extracted image: pages with embedded figures link to the images pulled out during OCR
  • extracted image --[extracted_from]--> source JPEG: each extracted image links back to the page it came from
  • combined text --[assembled_from]--> source JPEG: the assembled text file links to every page whose OCR text it contains
  • source JPEG --[has_assembly]--> combined text: each source page links forward to the combined text file it contributed to
  • structural entity --[extractedFrom]--> combined text: every entity in the structure tree links back to the source text file
  • parent entity --[contains]--> child entity: structural parents link down to their children (book to chapters, chapters to sections, etc.)
  • child entity --[in]--> parent entity: structural children link up to their parent
  • sibling entity --[prev/next]--> sibling entity: structural siblings are linked in document order
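The structural relationships listed above (contains, in, prev/next) can be derived from a nested tree in a single traversal. This is a minimal sketch, not the workflow's implementation: the `Node` class and `structural_edges` function are hypothetical, siblings are assumed to be children of the same parent, and "prev/next" is assumed to denote two separate predicates.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a structural entity (book, chapter, chunk)."""
    id: str
    children: list["Node"] = field(default_factory=list)


def structural_edges(root: Node) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples for the tree under root."""
    edges: list[tuple[str, str, str]] = []
    stack = [root]
    while stack:
        parent = stack.pop()
        kids = parent.children
        for i, child in enumerate(kids):
            # Parent and child are linked in both directions.
            edges.append((parent.id, "contains", child.id))
            edges.append((child.id, "in", parent.id))
            # Adjacent siblings are linked in document order.
            if i + 1 < len(kids):
                edges.append((child.id, "next", kids[i + 1].id))
                edges.append((kids[i + 1].id, "prev", child.id))
        stack.extend(reversed(kids))  # visit children in document order
    return edges


book = Node("book", [Node("ch1", [Node("s1"), Node("s2")]), Node("ch2")])
edges = structural_edges(book)
for edge in edges:
    print(edge)
```

Emitting both directions of each link mirrors the paired relationships in the list, so a consumer can walk the graph from either end without reverse lookups.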
output_tree_example
Collection (target)
├── page-001.jpeg (input, updated with OCR text + metadata)
│   ├── [has_derivative] ──► page-001_large.jpeg
│   ├── [has_derivative] ──► page-001_medium.jpeg
│   ├── [has_derivative] ──► page-001_thumbnail.jpeg
│   └── [has_extracted] ──► page-001_img-0.jpeg (figure extracted by OCR, described)
│
├── page-002.jpeg (input, updated with OCR text + metadata)
│   ├── [has_derivative] ──► page-002_large.jpeg
│   ├── [has_derivative] ──► page-002_medium.jpeg
│   └── [has_derivative] ──► page-002_thumbnail.jpeg
│
├── combined-text.txt (assembled from all pages)
│   ├── [assembled_from] ──► page-001.jpeg
│   └── [assembled_from] ──► page-002.jpeg
│
└── Structure tree (extracted from combined-text.txt):
    └── book 'Document Title' (root entity, has description)
        ├── chapter 'Chapter 1' (has description)
        │   ├── section 'Introduction' (has description)
        │   │   ├── chunk 1 (has text, lines 1-30)
        │   │   └── chunk 2 (has text, lines 31-55)
        │   └── section 'Background' (has description)
        │       └── chunk 3 (has text, lines 56-90)
        └── chapter 'Chapter 2' (has description)
            └── ...
status
active
uses_agents
Image Workflow | Arke