- actions_required
- file:view
- file:create
- file:update
- entity:view
- entity:create
- entity:update
- description
- Extracts hierarchical document structure from any text-based file (txt, md, rst, etc.) using LLMs and creates entities with relationships
- endpoint
- https://structure-extraction.arke.institute
- endpoint_verified_at
- 2026-01-21T00:51:07.079Z
- input_schema
- properties
- file_id
- description
- Source text-based file entity to extract structure from (txt, md, rst, etc.)
- type
- string
- options
- description
- Extraction options
- properties
- custom_prompt
- description
- Additional instructions for the LLM
- type
- string
- max_chunk_tokens
- description
- Maximum tokens per chunk (default: 1024)
- type
- number
- model_name
- description
- LLM model to use (default: meta-llama/Llama-3.3-70B-Instruct-Turbo)
- type
- string
- on_behalf_of
- description
- Entity ID to act on behalf of (for permission context)
- type
- string
- type
- object
- type
- object
- output_description
- Creates a tree of entities representing the document's logical structure. The root entity represents the whole document (e.g. a 'book' or 'article'). Below it are structural divisions — parts, chapters, sections — determined by the LLM based on the document's actual organization. The specific types and depth depend on the document. Leaf-level sections are further split into 'chunk' entities (~1024 tokens each) suitable for embedding and search. Only leaf entities (those with no children) carry a 'text' property containing their content. Container entities do not have text — to read the document, collect text from the leaf entities in order. Every entity has a 'source_file' property pointing back to the input file, and 'start_line'/'end_line' properties indicating which lines of the source file it covers.
- output_relationships
- Each child entity has an 'in' relationship pointing to its parent
- Each entity has an 'extractedFrom' relationship pointing to the source file
- Siblings are linked with 'prev' and 'next' relationships in document order
- Each parent has 'contains' relationships pointing to all its direct children
- To traverse the tree: start from the source file, follow 'contains' to find root entities, then recursively follow 'contains' downward
- output_tree_example
- book 'Pride and Prejudice'
├── intro 'Title Page' (lines 1-5)
├── chapter 'Chapter 1' (lines 6-120)
│ ├── section 'Opening Scene' (lines 6-60)
│ │ ├── chunk 1 (lines 6-30, has text)
│ │ └── chunk 2 (lines 31-60, has text)
│ └── section 'First Meeting' (lines 61-120)
│ ├── chunk 3 (lines 61-90, has text)
│ └── chunk 4 (lines 91-120, has text)
└── chapter 'Chapter 2' (lines 121-250)
└── ...
- status
- active