Discourse X-Ray: Making Invisible Writing Structure Visible
Discourse X-Ray is an LLM-powered tool that exposes the argumentative skeleton of a draft using Rhetorical Structure Theory, built for writers and students who want to see and fix the shape of their reasoning.
Key Takeaways
- Most writing tools score grammar and tone. Discourse X-Ray scores the underlying structure of an argument.
- It uses Rhetorical Structure Theory: text becomes a graph of relations (evidence, contrast, concession), not just paragraphs.
- The architecture pattern: LLM does the semantic labeling, deterministic code handles segmentation, validation, caching, and metrics.
Most writing tools obsess over the surface — grammar, tone, word choice. Useful, but shallow. If you have ever reread a draft that sounds polished yet still feels weak, you already know the real problem lives a layer deeper: structure.
Every piece of writing has an argumentative skeleton, whether the author sees it or not:
- What is the main claim?
- What supports it?
- Where does it concede, contrast, or pivot?
- Which sentences are floating with nothing holding them up?
Discourse X-Ray exposes that skeleton. It is built for writers and students who are still building the muscle of making reasoning explicit. It will not write your essay. It will show you the shape of the one you already wrote — clearly enough that you can fix it yourself.
What is Rhetorical Structure Theory?
Rhetorical Structure Theory (RST) is one of those frameworks that clicks instantly once you see it: writing is not just paragraphs, it is a graph of relations — evidence, cause, contrast, concession, elaboration. The problem is that mapping structure by hand is slow, and students almost never get that kind of feedback at scale.
So the question that started this project:
What if you could get a structural X-ray of your draft in seconds — something you can explore, critique, and revise against?
The product brief fell out of that:
- Not grade the essay.
- Not rewrite the essay.
- Show me my reasoning shape.
The workflow is four steps:
- Paste or upload a document.
- The backend segments text into clause-like units (EDUs).
- An LLM labels the rhetorical relationships between them.
- The frontend renders an interactive tree, plus metrics and recommendations.
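As a rough illustration of step two, here is a naive segmenter. This is a toy stand-in for the real OpenNLP-based pipeline; the function name and the splitting heuristic are assumptions for the sketch, not the project's actual code:

```typescript
// Naive EDU-like segmentation, a toy stand-in for the OpenNLP pipeline:
// split on sentence boundaries, then before a few discourse connectives.
export function segment(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/) // sentence boundaries
    .flatMap((s) => s.split(/,\s*(?=(?:but|although|because|however)\b)/i))
    .map((u) => u.trim())
    .filter((u) => u.length > 0);
}
```

Even this crude version shows why segmentation lives in deterministic code: the LLM always receives the same units for the same input, so its labels stay comparable across runs.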
The architectural principle underneath: wrap a probabilistic model in deterministic scaffolding so the output stays inspectable and repeatable.
What does Discourse X-Ray show you?
A navigable rhetorical tree
A zoomable D3 tree where edges are colored by relation type, nodes are marked by nuclearity, and clicking any node opens an inspector. You can literally see a paragraph's claims laddering up to a thesis — or failing to.
- Student insight: "Oh — I'm just stacking assertions."
- Writer insight: "My evidence gets thin in the middle section."
Metrics that turn structure into signal
Visualization is good for intuition. Metrics are better for comparison across drafts:
- Claim-to-evidence ratio
- Relation distribution (are you all assertion, no concession?)
- Coherence distance (how far supports sit from the claims they serve)
- Orphan claims (statements connecting to nothing)
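Under one plausible assumption about the data shape (a flat list of typed edges; the names here are illustrative, not the project's API), two of these metrics reduce to a few lines:

```typescript
interface Edge { from: string; to: string; relation: string }

// Claim-to-evidence ratio: claims divided by the edges that support them.
export function claimEvidenceRatio(claims: string[], edges: Edge[]): number {
  const evidence = edges.filter((e) => e.relation === "evidence").length;
  return evidence === 0 ? Infinity : claims.length / evidence;
}

// Orphan claims: claims no edge points at, i.e. assertions with nothing
// holding them up.
export function orphanClaims(claims: string[], edges: Edge[]): string[] {
  const supported = new Set(edges.map((e) => e.to));
  return claims.filter((c) => !supported.has(c));
}
```

The point is that once the LLM has labeled the edges, the metrics themselves are plain, testable code.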
Recommendations tied back to the text
Each recommendation is severity-tagged (ACTION / WARNING / TIP) and jumps you to the exact span or node. Less "AI critique," more structural debugging.
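A minimal sketch of what a severity-tagged recommendation could look like (the field names are assumptions, not the project's schema):

```typescript
type Severity = "ACTION" | "WARNING" | "TIP";

interface Recommendation {
  severity: Severity;
  message: string;
  span: { start: number; end: number }; // character offsets for span-jumping
}

const ORDER: Record<Severity, number> = { ACTION: 0, WARNING: 1, TIP: 2 };

// Surface the most urgent structural problems first.
export function bySeverity(recs: Recommendation[]): Recommendation[] {
  return [...recs].sort((a, b) => ORDER[a.severity] - ORDER[b.severity]);
}
```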
History and shareable links
Writing is iterative. Analyses persist, runs are comparable, and you can send a link to a tutor or peer to discuss structure instead of vibes.
Why does each piece of the stack earn its place?
Frontend — React + TypeScript + Vite + D3
- React fits a workspace UI (tree + panels + history + upload).
- TypeScript keeps the data contract honest: nodes, edges, relations, spans, metrics, recommendations.
- Vite keeps the feedback loop tight.
- D3 handles custom tree rendering, zoom, pan, and interaction.
`pdfjs-dist` handles PDF extraction and `mammoth` handles DOCX, so users are not trapped in a copy-paste workflow.
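The shape of that contract might look roughly like this. This is a sketch; the real field names may differ, and the runtime guard is an illustration of the idea, not the project's code:

```typescript
interface RstNode {
  id: string;
  text: string;
  nucleus: boolean; // nuclearity: is this the central unit of its relation?
}

interface RstEdge {
  source: string; // child node id
  target: string; // parent node id
  relation: string; // e.g. "evidence", "contrast", "concession"
}

interface AnalysisResult {
  nodes: RstNode[];
  edges: RstEdge[];
  metrics: Record<string, number>;
  recommendations: { severity: string; message: string }[];
}

// A runtime guard keeps the backend honest at the API boundary.
export function isAnalysisResult(v: unknown): v is AnalysisResult {
  const r = v as AnalysisResult;
  return !!r && Array.isArray(r.nodes) && Array.isArray(r.edges)
    && typeof r.metrics === "object" && Array.isArray(r.recommendations);
}
```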
Backend — Spring Boot + JPA + NLP + Graphs
This is where the deterministic scaffolding lives.
- Spring Boot for the unglamorous essentials: validation, API shape, config, structured errors.
- OpenNLP for segmentation into EDU-like units.
- JGraphT for representing the rhetorical structure as a graph and computing structural properties.
- JPA + PostgreSQL for persistence in production, H2 for zero-setup dev.
LLM providers — Anthropic / OpenAI / Gemini, swappable
Provider flexibility is not a nice-to-have; it is a hedge:
- Quality, cost, and latency tradeoffs shift monthly.
- Quotas and outages happen.
- Models keep evolving.
The backend treats "LLM" as an interface. Swap providers without rewriting the app.
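In TypeScript terms (the backend is Java, but the shape is the same; the names and the stubbed behavior here are illustrative assumptions):

```typescript
interface LlmProvider {
  name: string;
  // Returns rhetorical structure in a machine-readable format.
  parseRhetoric(edus: string[]): Promise<string>;
}

// Providers differ in SDKs and prompts, not in the contract.
class FakeProvider implements LlmProvider {
  name = "fake";
  async parseRhetoric(edus: string[]): Promise<string> {
    // A stub: chain every unit to the first one as "elaboration".
    const edges = edus.slice(1).map((_, i) => ({
      source: i + 1,
      target: 0,
      relation: "elaboration",
    }));
    return JSON.stringify(edges);
  }
}

// The rest of the app only ever sees the interface.
export async function analyze(provider: LlmProvider, edus: string[]): Promise<string> {
  return provider.parseRhetoric(edus);
}
```

Swapping Anthropic for OpenAI for Gemini then becomes a configuration change, not a refactor.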
Caching — Caffeine (local) or Redis (distributed)
Caching does three jobs simultaneously:
- Cuts cost — do not re-run identical analyses.
- Cuts latency.
- Cuts LLM variance across repeat runs of the same input.
Caffeine covers single-node deployments; Redis unlocks multi-node scale and slots cleanly into the Compose stack.
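The cache key has to capture everything that changes the answer. A hedged sketch (the real implementation may normalize and key differently):

```typescript
import { createHash } from "node:crypto";

// The same text analyzed by the same provider and model should hit the cache;
// anything that changes the answer must be part of the key.
export function cacheKey(text: string, provider: string, model: string): string {
  const normalized = text.trim().replace(/\s+/g, " ");
  return createHash("sha256")
    .update(`${provider}:${model}:${normalized}`)
    .digest("hex");
}
```

Normalizing whitespace before hashing means a re-pasted draft with slightly different spacing still counts as the same input.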
Containers — Docker + Compose + Nginx
One command brings up a realistic stack: Postgres, Redis, backend, and a frontend served via Nginx proxying /api/* to the backend. Clean local DX, production-shaped.
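The proxy piece is only a few lines of Nginx configuration. A sketch under stated assumptions (the upstream name, port, and path handling may differ from the project's actual config):

```nginx
# Forward API calls to the Spring Boot container; serve the SPA for everything else.
location /api/ {
    proxy_pass http://backend:8080;
    proxy_set_header Host $host;
}
```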
The Pipeline, End to End
When a writer clicks Analyze:
- Input arrives (pasted text or extracted from an uploaded file).
- Backend segments it into EDU-like units — this regularizes what the LLM sees.
- Backend calls the selected LLM provider and asks for rhetorical structure in a machine-readable format.
- Backend parses the response and assembles a tree / graph.
- Backend computes metrics, a quality score, and recommendations.
- Result is cached and persisted — history and share links fall out for free.
- Frontend renders the tree and wires up inspection, span-jumping, and recommendation exploration.
The heart of the system is that middle band: segmentation → LLM parse → deterministic assembly and metrics. The LLM is asked to do only the thing it is good at — semantic labeling. Everything else is boring, testable, repeatable code.
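The "deterministic assembly" step is where the scaffolding earns its keep: the LLM's output is never trusted blindly. A sketch of the validation, assuming a JSON edge-list response (this shape is an assumption, not the project's schema):

```typescript
interface RawEdge { source: number; target: number; relation: string }

const KNOWN = new Set(["evidence", "cause", "contrast", "concession", "elaboration"]);

// Parse the model's JSON and reject anything structurally impossible:
// unknown relations, self-loops, indices outside the segmented units.
export function assemble(json: string, eduCount: number): RawEdge[] {
  const edges: RawEdge[] = JSON.parse(json);
  for (const e of edges) {
    if (!KNOWN.has(e.relation)) throw new Error(`unknown relation: ${e.relation}`);
    if (e.source === e.target) throw new Error("self-loop");
    if (e.source < 0 || e.source >= eduCount || e.target < 0 || e.target >= eduCount) {
      throw new Error("edge references a unit that does not exist");
    }
  }
  return edges;
}
```

A malformed model response fails loudly here, in testable code, instead of rendering a silently wrong tree.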
Running It Locally
One-command Docker (recommended)
```bash
cp .env.example .env
# add at least one provider key: ANTHROPIC / OPENAI / GEMINI
docker compose up --build
```
Then:
- Frontend: http://localhost:5173
- Backend: http://localhost:8080
Dev mode (fast iteration)
Backend:

```bash
SPRING_PROFILES_ACTIVE=dev mvn spring-boot:run
```

Frontend:

```bash
cd frontend
npm install
npm run dev
```
What pattern does Discourse X-Ray demonstrate?
Discourse X-Ray is an argument-structure microscope aimed at the exact moment writing becomes learning — when students are figuring out how to support claims, and when writers are shaping persuasion.
Technically, it is also a clean example of a pattern that is going to matter more, not less:
Use LLMs for the hard semantic labeling. Keep the system reliable by doing segmentation, validation, caching, persistence, and analytics in your own code.
The model is the engine. The scaffolding is the car. You need both.
Related
- ScopeGuard: A Scope-Creep Radar for Client Projects — same architectural principle applied to a different domain: LLM does the unstructured reading, deterministic code does everything else.
- cron-human: why I built yet another cron library — wrapping a deterministic tool as an MCP server so LLM agents can call it.
Frequently asked questions
- What is Discourse X-Ray?
- Discourse X-Ray is an LLM-powered tool that visualizes the argumentative skeleton of a written draft using Rhetorical Structure Theory (RST). It segments text into clause-like units (EDUs), labels the rhetorical relationships between them, and renders an interactive D3 tree with metrics like claim-to-evidence ratio and orphan claims.
- What is Rhetorical Structure Theory?
- Rhetorical Structure Theory (RST), introduced by Mann and Thompson in 1988, models a text as a graph of relations — evidence, cause, contrast, concession, elaboration — between clause-like units. It explains writing as a structure of supporting moves, not just paragraphs.
- Does Discourse X-Ray grade or rewrite essays?
- No. It deliberately does neither. The product brief is to show the reasoning shape of an existing draft so the writer can fix it themselves. It does not assign a grade, and it never rewrites the user's text.
- What stack does Discourse X-Ray run on?
- React + TypeScript + Vite + D3 on the frontend, Spring Boot + JPA + OpenNLP + JGraphT on the backend, pluggable LLM providers (Anthropic, OpenAI, Gemini), Caffeine or Redis for caching, and Docker Compose for one-command local setup with Postgres and Nginx.
- Can I run Discourse X-Ray locally?
- Yes. Copy .env.example to .env, add at least one provider key (ANTHROPIC, OPENAI, or GEMINI), then run docker compose up --build. The frontend is at localhost:5173 and the backend at localhost:8080.