agent

Image Workflow

01KFFGQJC874G1S3SCSWT60W5Y

Properties

actions_required

file:view
file:create
file:update
file:download
entity:view
entity:update
relationship:create

description

Processes JPEG images: resize, OCR, assemble text, extract structure, describe derivative images

endpoint

https://image-workflow.arke.institute

endpoint_verified_at

2026-01-21T05:34:14.846Z

input_schema

{
  "properties": {
    "file_ids": {
      "description": "Ordered array of JPEG file entity IDs to process",
      "items": {
        "type": "string"
      },
      "type": "array"
    },
    "options": {
      "properties": {
        "custom_prompt": {
          "description": "Custom prompt to pass to downstream services (text-workflow, image-description)",
          "type": "string"
        },
        "label": {
          "description": "Label for the combined text file",
          "type": "string"
        },
        "skip_image_descriptions": {
          "description": "Skip derivative image descriptions",
          "type": "boolean"
        },
        "skip_text_workflow": {
          "description": "Skip structure extraction and description (assembly only)",
          "type": "boolean"
        }
      },
      "type": "object"
    }
  },
  "required": [
    "file_ids"
  ],
  "type": "object"
}
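The input schema above can be exercised with a small shape check before a job is submitted. This is a sketch, assuming a Python client builds the payload as a plain dict; how the payload is actually delivered to https://image-workflow.arke.institute is not documented here, and validate_input and OPTION_TYPES are hypothetical names. It checks only what the schema states (file_ids is required and is an array of strings; each option has a declared type) and cannot confirm that the IDs really refer to JPEG file entities.

```python
from typing import Any

# Option fields from the input schema: (Python type, JSON Schema type name).
OPTION_TYPES: dict[str, tuple[type, str]] = {
    "custom_prompt": (str, "string"),
    "label": (str, "string"),
    "skip_image_descriptions": (bool, "boolean"),
    "skip_text_workflow": (bool, "boolean"),
}


def validate_input(payload: Any) -> list[str]:
    """Hypothetical helper: list schema violations (empty list = payload conforms)."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    errors: list[str] = []

    # file_ids is the only required property: an array of string entity IDs.
    if "file_ids" not in payload:
        errors.append("file_ids is required")
    elif not isinstance(payload["file_ids"], list):
        errors.append("file_ids must be an array")
    else:
        for i, fid in enumerate(payload["file_ids"]):
            if not isinstance(fid, str):
                errors.append(f"file_ids[{i}] must be a string")

    # options is optional; when present, each known field must have its declared type.
    # Unknown option keys are ignored because the schema does not forbid them.
    if "options" in payload:
        options = payload["options"]
        if not isinstance(options, dict):
            errors.append("options must be an object")
        else:
            for key, (py_type, json_name) in OPTION_TYPES.items():
                if key in options and not isinstance(options[key], py_type):
                    errors.append(f"options.{key} must be a {json_name}")
    return errors
```

For instance, with an illustrative ID, validate_input({"file_ids": ["01ABC"], "options": {"skip_text_workflow": "yes"}}) returns ["options.skip_text_workflow must be a boolean"].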

output_description

After the workflow completes, every input JPEG has been fully processed into a rich, interconnected entity graph.

Each source JPEG receives up to three resized derivative files (large at 2400px, medium at 1288px, thumbnail at 256px), OCR-extracted text written onto its 'text' property, and metadata such as 'text_source', 'text_has_content', and 'text_images_count'. If OCR detects embedded images (figures, charts, diagrams) within a page, those are extracted as new JPEG file entities with their own binary content.

All per-page OCR text is then assembled into a single combined text file entity that references every source page.

The combined text is passed through structure extraction, which creates a hierarchical tree of entities representing the document's logical organization: a root entity (e.g. 'book' or 'report'), intermediate structural divisions (parts, chapters, sections), and leaf-level chunk entities (~1024 tokens each) that carry the actual text content. Every structural entity receives a generated description with overview, context, and contents sections.

Finally, any images that were extracted from pages during OCR receive their own contextual descriptions and updated labels from the image description service.

The end result is a fully searchable, browsable document graph: resized images for display, structured text for navigation and retrieval, and rich descriptions on every meaningful entity.
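The resize step can be sketched as a small planning function. The description gives the three target sizes but not which edge they constrain or why a page may receive fewer than three derivatives; this sketch assumes each size caps the longest edge, the aspect ratio is preserved, and images are never upscaled (so small sources skip the larger sizes). TARGETS and plan_derivatives are hypothetical names, not part of the Image Processor's documented interface.

```python
# Target sizes from the output description. Assumed: each value bounds the longest edge.
TARGETS: dict[str, int] = {"large": 2400, "medium": 1288, "thumbnail": 256}


def plan_derivatives(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Return {name: (w, h)} for every derivative smaller than the source.

    Assumes aspect ratio is preserved and images are never upscaled, so a source
    whose longest edge is already <= a target produces no derivative for it.
    """
    longest = max(width, height)
    plan: dict[str, tuple[int, int]] = {}
    for name, target in TARGETS.items():
        if longest <= target:
            continue  # assumed: skip rather than upscale
        scale = target / longest
        plan[name] = (max(1, round(width * scale)), max(1, round(height * scale)))
    return plan
```

Under these assumptions a 3000x2000 scan yields all three derivatives, while a 1000x800 image yields only a thumbnail, which is one possible reading of "up to three".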

output_relationships

source JPEG --[has_derivative]--> resized JPEG: each input image links to its large, medium, and thumbnail versions
resized JPEG --[derived_from]--> source JPEG: each resized version links back to the original it was created from
source JPEG --[has_extracted]--> extracted image: pages with embedded figures link to the images pulled out during OCR
extracted image --[extracted_from]--> source JPEG: each extracted image links back to the page it came from
combined text --[assembled_from]--> source JPEG: the assembled text file links to every page whose OCR text it contains
source JPEG --[has_assembly]--> combined text: each source page links forward to the combined text file it contributed to
structural entity --[extractedFrom]--> combined text: every entity in the structure tree links back to the source text file
parent entity --[contains]--> child entity: structural parents link down to their children (book to chapters, chapters to sections, etc.)
child entity --[in]--> parent entity: structural children link up to their parent
sibling entity --[prev/next]--> sibling entity: structural siblings are linked in document order
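The structural relationships in this list (extractedFrom, contains, in, prev/next) can be derived mechanically from a parent/child tree. A minimal sketch, assuming the tree has already been built; Node, structure_edges, and the (source, predicate, target) tuple format are hypothetical, and reading prev/next as one 'next' edge forward plus one 'prev' edge backward for each adjacent sibling pair is an assumption.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str, str]  # (source id, predicate, target id)


@dataclass
class Node:
    id: str
    children: list["Node"] = field(default_factory=list)


def structure_edges(root: Node, source_text_id: str) -> list[Edge]:
    """Emit structural relationships for every node, walking the tree depth-first."""
    edges: list[Edge] = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Every structural entity links back to the combined text file.
        edges.append((node.id, "extractedFrom", source_text_id))
        # Parent <-> child links in both directions.
        for child in node.children:
            edges.append((node.id, "contains", child.id))
            edges.append((child.id, "in", node.id))
        # Adjacent siblings in document order (assumed: next forward, prev backward).
        for a, b in zip(node.children, node.children[1:]):
            edges.append((a.id, "next", b.id))
            edges.append((b.id, "prev", a.id))
        # Reverse so children are visited in document order.
        stack.extend(reversed(node.children))
    return edges
```

Keeping each link and its inverse in the same loop means neither direction can be emitted without the other.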

output_tree_example

Collection (target)
├── page-001.jpeg (input, updated with OCR text + metadata)
│   ├── [has_derivative] ──► page-001_large.jpeg
│   ├── [has_derivative] ──► page-001_medium.jpeg
│   ├── [has_derivative] ──► page-001_thumbnail.jpeg
│   └── [has_extracted] ──► page-001_img-0.jpeg (figure extracted by OCR, described)
│
├── page-002.jpeg (input, updated with OCR text + metadata)
│   ├── [has_derivative] ──► page-002_large.jpeg
│   ├── [has_derivative] ──► page-002_medium.jpeg
│   └── [has_derivative] ──► page-002_thumbnail.jpeg
│
├── combined-text.txt (assembled from all pages)
│   ├── [assembled_from] ──► page-001.jpeg
│   └── [assembled_from] ──► page-002.jpeg
│
└── Structure tree (extracted from combined-text.txt):
    └── book 'Document Title' (root entity, has description)
        ├── chapter 'Chapter 1' (has description)
        │   ├── section 'Introduction' (has description)
        │   │   ├── chunk 1 (has text, lines 1-30)
        │   │   └── chunk 2 (has text, lines 31-55)
        │   └── section 'Background' (has description)
        │       └── chunk 3 (has text, lines 56-90)
        └── chapter 'Chapter 2' (has description)
            └── ...
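The line ranges on the chunk entities suggest that chunks are built from whole lines of the combined text. Below is a greedy sketch that groups consecutive lines until a ~1024-token budget would be exceeded. It assumes whitespace-separated words as a stand-in for tokens (the actual tokenizer is not specified) and ignores section boundaries, which the real structure extraction presumably respects; chunk_lines is a hypothetical name.

```python
from typing import Optional


def chunk_lines(text: str, max_tokens: int = 1024) -> list[tuple[int, int]]:
    """Greedily group consecutive lines into chunks of roughly max_tokens tokens.

    Returns 1-based, inclusive (start_line, end_line) ranges. Tokens are
    approximated by whitespace-separated words (an assumption). A single line
    longer than the budget becomes a chunk of its own rather than being split.
    """
    ranges: list[tuple[int, int]] = []
    start: Optional[int] = None  # first line of the chunk being built
    count = 0                    # approximate tokens in the current chunk
    last = 0                     # last line number seen
    for lineno, line in enumerate(text.splitlines(), start=1):
        n = len(line.split())
        # Close the current chunk if this line would push it over budget.
        if start is not None and count + n > max_tokens:
            ranges.append((start, lineno - 1))
            start, count = None, 0
        if start is None:
            start = lineno
        count += n
        last = lineno
    if start is not None:
        ranges.append((start, last))
    return ranges
```

For example, five lines of three words each with max_tokens=6 produce [(1, 2), (3, 4), (5, 5)].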

status

active

uses_agents

label: Image Processor
pi: 01KFFC66JTGWW67M3D5ZP8YXEV

label: OCR Service
pi: 01KFFC4ZBD52SY7E4BX6XV8623

label: Text Assembler
pi: 01KFFC4A8W8939TXGEXCK439ZK

label: Text Workflow
pi: 01KFFDRKYYW7M9ZYFZEK4HGBHQ

label: Image Description
pi: 01KFFC3GD2N54ZRYWMDT8XWB0M
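How the input options gate these sub-agents can be sketched as a planning function. The order follows the description (resize, OCR, assemble text, extract structure, describe images), and the rules (skip_text_workflow drops Text Workflow, skip_image_descriptions drops Image Description, custom_prompt is forwarded only to those two) are read from the schema descriptions. Whether stages run strictly in sequence, and whether Image Description still runs when Text Workflow is skipped, are assumptions rather than documented behavior; plan_stages is a hypothetical name, not the workflow's actual scheduler.

```python
from typing import Any, Optional

Stage = tuple[str, Optional[str]]  # (sub-agent label, prompt forwarded to it)


def plan_stages(options: dict[str, Any], extracted_image_count: int) -> list[Stage]:
    """Hypothetical planner: which sub-agents run, in order, for one job."""
    prompt = options.get("custom_prompt")  # forwarded only to the two downstream services
    stages: list[Stage] = [
        ("Image Processor", None),  # resize to large / medium / thumbnail
        ("OCR Service", None),      # per-page text plus embedded-image extraction
        ("Text Assembler", None),   # combined text file
    ]
    if not options.get("skip_text_workflow", False):
        stages.append(("Text Workflow", prompt))  # structure extraction + descriptions
    # Assumed: nothing to describe when OCR extracted no images.
    if extracted_image_count > 0 and not options.get("skip_image_descriptions", False):
        stages.append(("Image Description", prompt))
    return stages
```

With default options and no extracted images, the plan ends at Text Workflow; setting skip_text_workflow gives the "assembly only" run described in the schema.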

Metadata

Version: 6
Created: 1/21/2026
Updated: 1/30/2026
Edited by: ARCHON