# Agentic AI & LLM context — curated reading list
Purpose: Reputable and high-signal references on context windows, long chats, memory, RAG, and IDE agent practices — so humans can reason about what the model actually sees and when to fork a new thread.
Companion (plain language): onboarding/guides/working-with-ai-context.md
Deep dive (technical): ai-engineering/context-engineering.md — synthesized knowledge article with core concepts, principles, and practical takeaways
Disclaimer: Vendor docs change URLs, and arXiv versions may update. Prefer official docs for integration decisions, and papers for understanding why behavior happens.
## A. Foundations — what “context” is
| # | Resource | Why it’s valuable |
|---|---|---|
| 1 | Anthropic — Context windows | Clear definition: context = working memory; notes linear growth, context rot, compaction |
| 2 | Anthropic — Compaction | How long-running threads get summarized when near limits (agentic workflows) |
| 3 | Anthropic — Prompt caching | Efficiency for repeated system/instruction prefixes (not human memory — engineering) |
| 4 | Anthropic — Prompt engineering / Claude best practices | Structure, clarity, agentic prompts — improves use of whatever context is visible |
| 5 | OpenAI — Conversation state | Stateful threads, context window billing, chaining responses |
| 6 | OpenAI — Compaction (API) | Server-side context reduction for long runs |
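The compaction entries above share one idea: when a long-running thread nears its token budget, older turns get replaced by a summary so the working memory stays under the cap. A minimal vendor-agnostic sketch (all names and the ~4-chars-per-token heuristic are illustrative assumptions, not any provider's API):

```python
# Sketch of compaction: when history exceeds a token budget, collapse the
# oldest turns into a summary and keep the most recent turns verbatim.
# Illustrative only — a real system would call an LLM to write the summary.

def rough_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def compact(history: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    """Summarize everything except the most recent turns when over budget."""
    if sum(rough_tokens(t) for t in history) <= budget:
        return history  # still fits — no compaction needed
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = f"[summary of {len(old)} earlier turns]"  # stub for an LLM call
    return [summary] + recent

history = [f"turn {i}: " + "x" * 400 for i in range(10)]
compacted = compact(history, budget=500)  # 1 summary + 4 recent turns
```

The point for humans: after compaction, the model no longer sees the old turns themselves, only the summary — which is why details from early in a long chat can quietly disappear.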
## B. Research — limits and failure modes
| # | Resource | Why it’s valuable |
|---|---|---|
| 7 | Liu et al., Lost in the Middle (arXiv:2307.03172) | Seminal U-shaped attention finding: evidence at the start/end of the context is recovered more reliably than evidence in the middle |
| 8 | Lewis et al., Retrieval-Augmented Generation (arXiv:2005.11401) | Foundational RAG: don’t stuff everything into the window — retrieve what matters |
| 9 | A Survey on the Memory Mechanism of LLM-based Agents (arXiv:2404.13501) | Taxonomy: short-term context vs long-term stores; agent memory design |
| 10 | GWNET / OAJAIML — Maximum Effective Context Window (2024) | Empirical gap: advertised vs usable context; degradation curves |
| 11 | Zylos — LLM context management & long-context strategies (2026) | Practitioner-oriented synthesis: lost-in-middle, caching, tiers (third party — triangulate) |
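One practical mitigation that falls out of the lost-in-the-middle research: order retrieved evidence so the highest-scoring chunks sit at the edges of the prompt, where U-shaped attention favors them. A small sketch under that assumption (the function name and scoring format are hypothetical, not from any of the papers above):

```python
# Sketch of a "lost in the middle" mitigation: interleave ranked chunks so
# the strongest evidence lands at the very start and very end of the prompt.
# Input: (chunk, relevance_score) pairs; output: chunks in prompt order.

def edge_order(chunks_with_scores: list[tuple[str, float]]) -> list[str]:
    ranked = sorted(chunks_with_scores, key=lambda cs: cs[1], reverse=True)
    front, back = [], []
    for i, (chunk, _) in enumerate(ranked):
        # Alternate: best chunk to the front, second best to the back, etc.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]  # weakest chunks end up in the middle

chunks = [("a", 0.9), ("b", 0.7), ("c", 0.5), ("d", 0.3), ("e", 0.1)]
print(edge_order(chunks))  # → ['a', 'c', 'e', 'd', 'b']
```

The mitigations benchmark in entry 18 is a reminder that tricks like this help unevenly across models and tasks — triangulate before relying on one.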
## C. Practice — managing human ↔ agent collaboration
| # | Resource | Why it’s valuable |
|---|---|---|
| 12 | Cursor — Rules (official) | Models don’t retain memory between completions; rules = persistent prompt-level context |
| 13 | Cursor — llms.txt / doc index | Entry point to the current Cursor docs (rules, context, product behavior) |
| 14 | Field Guide to AI — Context management | Rolling windows, summarization, memory systems — vendor-agnostic framing |
| 15 | Simon Willison — Long context in LLM 0.24 (fragments) | Practitioner lens: long context is powerful but you still engineer what gets fed in |
| 16 | Simon Willison — long-context tag | Ongoing notes on model + tool ecosystem |
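The rolling-window strategy described in entries 14 and 22 is the simplest of the bunch: evict the oldest turns until the history fits, always keeping the system prompt. A hedged sketch (names and the token heuristic are illustrative, not any framework's API):

```python
# Sketch of rolling-window trimming: drop the oldest turns until the
# conversation fits a token budget; the system prompt is never evicted.

def rough_tokens(text: str) -> int:
    # Crude ~4-chars-per-token stand-in for a real tokenizer.
    return max(1, len(text) // 4)

def trim(system: str, turns: list[str], budget: int) -> list[str]:
    kept = list(turns)
    while kept and rough_tokens(system) + sum(map(rough_tokens, kept)) > budget:
        kept.pop(0)  # recency bias made explicit: oldest turns go first
    return [system] + kept

history = [f"user/assistant turn {i} " + "." * 20 for i in range(6)]
window = trim("system rules", history, budget=50)
```

This is why Cursor-style rules matter: anything that must survive trimming belongs in persistent prompt-level context, not in old chat turns.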
## D. Extended reading — agents, memory, mitigations
| # | Resource | Why it’s valuable |
|---|---|---|
| 17 | Memory in the Age of AI Agents (arXiv:2512.13564) | Recent survey-style framing of agent memory |
| 18 | What Works for ‘Lost-in-the-Middle’? (arXiv:2511.13900) | Mitigations benchmark — shows not all fixes work uniformly |
| 19 | Lost in the Middle follow-up (arXiv:2510.10276) | Emergent IR perspective on the effect |
| 20 | Industrial Logic — INVEST model (user stories) | Adjacent: small, testable units of work ↔ smaller context per task (process, not LLM theory) |
| 21 | Neo4j — GraphRAG introduction | Ecosystem: graph + retrieval as alternative to giant flat prompts (vendor blog — concept clear) |
| 22 | LangChain — Short-term memory | Agents: threads, history, trimming when the window fills (framework docs) |
## E. Optional — product & narrative
| # | Resource | Why it’s valuable |
|---|---|---|
| 23 | Anthropic — Introducing prompt caching (news) | Why caching exists; cost/latency story |
| 24 | Towards AI — Context Window Paradox | Trade-offs: bigger window ≠ uniformly better (editorial — verify claims) |
| 25 | Towards AI — Context engineering for AI coding agents | Coding-agent framing: selective context vs “dump the repo” |
| 26 | Medium — Claude’s 1M context… until it isn’t | Cautionary lens on huge windows (opinion piece) |
## Quick numbers (rule of thumb, not a promise)
- Recency bias: Recent turns are usually “hotter” than very old turns unless summarized or re-injected.
- Effective vs. advertised: Research and benchmarks often show degradation before the published token cap.
- Middle of dump: Key facts buried between lots of other text can be missed — design docs and prompts accordingly (“lost in the middle”).
- Your repo beats your chat: Git + `activeContext.md` + rules are durable context; chat is an ephemeral working set.
Last curated: 2026-03-25 — 26 entries (A–E); extend with org-specific MLOps / security guides as needed.