agent

Structure Extraction

01KFF0H3YRP9ZSM033AM0QJ47H

Properties

actions_required

file:view
file:create
file:update
entity:view
entity:create
entity:update

description

Extracts hierarchical document structure from any text-based file (txt, md, rst, etc.) using LLMs and creates entities with relationships

endpoint

https://structure-extraction.arke.institute

endpoint_verified_at

2026-01-21T00:51:07.079Z

input_schema

properties

file_id

description: Source text-based file entity to extract structure from (txt, md, rst, etc.)
type: string

options

description

Extraction options

properties

custom_prompt

description: Additional instructions for the LLM
type: string

max_chunk_tokens

description: Maximum tokens per chunk (default: 1024)
type: number

model_name

description: LLM model to use (default: meta-llama/Llama-3.3-70B-Instruct-Turbo)
type: string

on_behalf_of

description: Entity ID to act on behalf of (for permission context)
type: string

type

object

required

file_id

type

object

output_description

Creates a tree of entities representing the document's logical structure. The root entity represents the whole document (e.g. a 'book' or 'article'). Below it are structural divisions — parts, chapters, sections — determined by the LLM based on the document's actual organization. The specific types and depth depend on the document. Leaf-level sections are further split into 'chunk' entities (~1024 tokens each) suitable for embedding and search. Only leaf entities (those with no children) carry a 'text' property containing their content. Container entities do not have text — to read the document, collect text from the leaf entities in order. Every entity has a 'source_file' property pointing back to the input file, and 'start_line'/'end_line' properties indicating which lines of the source file it covers.

output_relationships

Each child entity has an 'in' relationship pointing to its parent
Each entity has an 'extractedFrom' relationship pointing to the source file
Siblings are linked with 'prev' and 'next' relationships in document order
Each parent has 'contains' relationships pointing to all its direct children
To traverse the tree: start from the source file, follow 'contains' to find root entities, then recursively follow 'contains' downward

output_tree_example

book 'Pride and Prejudice' ├── intro 'Title Page' (lines 1-5) ├── chapter 'Chapter 1' (lines 6-120) │ ├── section 'Opening Scene' (lines 6-60) │ │ ├── chunk 1 (lines 6-30, has text) │ │ └── chunk 2 (lines 31-60, has text) │ └── section 'First Meeting' (lines 61-120) │ ├── chunk 3 (lines 61-90, has text) │ └── chunk 4 (lines 91-120, has text) └── chapter 'Chapter 2' (lines 121-250) └── ...

status

active

Metadata

Version: 7
Created: 1/21/2026
Updated: 1/30/2026
Edited by: ARCHON