Agentic AI & LLM context — curated reading list

Purpose: Reputable and high-signal references on context windows, long chats, memory, RAG, and IDE agent practices — so humans can reason about what the model actually sees and when to fork a new thread.

Companion (plain language): onboarding/guides/working-with-ai-context.md

Deep dive (technical): ai-engineering/context-engineering.md — synthesized knowledge article with core concepts, principles, and practical takeaways

Disclaimer: Vendor docs change URLs; arXiv versions may update. Prefer official docs for integration decisions; papers for why behavior happens.


A. Foundations — what “context” is

| # | Resource | Why it’s valuable |
|---|----------|-------------------|
| 1 | Anthropic — Context windows | Clear definition: context = working memory; notes linear growth, context rot, compaction |
| 2 | Anthropic — Compaction | How long-running threads get summarized when near limits (agentic workflows) |
| 3 | Anthropic — Prompt caching | Efficiency for repeated system/instruction prefixes (an engineering optimization, not memory) |
| 4 | Anthropic — Prompt engineering / Claude best practices | Structure, clarity, agentic prompts — improves use of whatever context is visible |
| 5 | OpenAI — Conversation state | Stateful threads, context-window billing, chaining responses |
| 6 | OpenAI — Compaction (API) | Server-side context reduction for long runs |
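
A minimal sketch of the compaction idea covered by the Anthropic and OpenAI entries above, assuming a plain list of turn strings, a crude 4-chars-per-token estimate, and a trivial stand-in summarizer (first line of each old turn). Real systems use the provider’s tokenizer and a model-generated summary:

```python
# Compaction sketch: when a thread nears the token budget, fold all but
# the most recent turns into a single summary message.

def est_tokens(text: str) -> int:
    # Crude ~4 chars/token estimate; use the provider's tokenizer in practice.
    return max(1, len(text) // 4)

def compact(turns: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """If the thread exceeds the budget, replace older turns with one summary."""
    if sum(est_tokens(t) for t in turns) <= budget:
        return turns  # still fits; nothing to do
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    # Stand-in summarizer: first line of each old turn, truncated.
    summary = "Summary of earlier turns: " + "; ".join(
        t.splitlines()[0][:60] for t in old
    )
    return [summary] + recent
```

The point of the sketch is the shape of the operation, not the summarizer: recent turns survive verbatim, older ones are compressed into one message.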

B. Research — limits and failure modes

| # | Resource | Why it’s valuable |
|---|----------|-------------------|
| 7 | Liu et al., Lost in the Middle (arXiv:2307.03172) | Seminal evidence of U-shaped attention: facts at the start and end of the context are retrieved more reliably than those in the middle |
| 8 | Lewis et al., Retrieval-Augmented Generation (arXiv:2005.11401) | Foundational RAG: don’t stuff everything into the window — retrieve what matters |
| 9 | A Survey on the Memory Mechanism of LLM-based Agents (arXiv:2404.13501) | Taxonomy: short-term context vs long-term stores; agent memory design |
| 10 | GWNET / OAJAIML — Maximum Effective Context Window (2024) | Empirical gap between advertised and usable context; degradation curves |
| 11 | Zylos — LLM context management & long-context strategies (2026) | Practitioner-oriented synthesis: lost-in-the-middle, caching, tiers (third party — triangulate) |
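
The “retrieve what matters” idea behind RAG (entry 8) can be sketched without any model at all. The term-overlap scorer below is a naive stand-in for the embedding similarity a real retriever would use, and the chunk/query strings are made up for illustration:

```python
# Retrieve-then-prompt sketch: instead of stuffing every document into
# the window, score chunks against the question and keep only the top-k.

def score(chunk: str, query: str) -> int:
    """Count query terms that appear in the chunk (case-insensitive)."""
    chunk_terms = set(chunk.lower().split())
    return sum(1 for term in query.lower().split() if term in chunk_terms)

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring chunks, best first."""
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    return ranked[:k]

def build_prompt(chunks: list[str], query: str, k: int = 2) -> str:
    """Assemble a prompt from only the retrieved context, not the whole corpus."""
    context = "\n---\n".join(retrieve(chunks, query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Even this toy version shows the design choice: the prompt’s size is bounded by `k`, not by the corpus.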

C. Practice — managing human ↔ agent collaboration

| # | Resource | Why it’s valuable |
|---|----------|-------------------|
| 12 | Cursor — Rules (official) | Models don’t retain memory between completions; rules act as persistent prompt-level context |
| 13 | Cursor — llms.txt / doc index | Entry point to current Cursor docs (rules, context, product behavior) |
| 14 | Field Guide to AI — Context management | Rolling windows, summarization, memory systems — vendor-agnostic framing |
| 15 | Simon Willison — Long context in LLM 0.24 (fragments) | Practitioner lens: long context is powerful, but you still engineer what gets fed in |
| 16 | Simon Willison — long-context tag | Ongoing notes on the model and tool ecosystem |

D. Extended reading — agents, memory, mitigations

| # | Resource | Why it’s valuable |
|---|----------|-------------------|
| 17 | Memory in the Age of AI Agents (arXiv:2512.13564) | Recent survey-style framing of agent memory |
| 18 | What Works for ‘Lost-in-the-Middle’? (arXiv:2511.13900) | Mitigations benchmark — shows that not all fixes work uniformly |
| 19 | Lost in the Middle follow-up (arXiv:2510.10276) | Emergent information-retrieval perspective on the effect |
| 20 | Industrial Logic — INVEST model (user stories) | Adjacent: small, testable units of work ↔ smaller context per task (process, not LLM theory) |
| 21 | Neo4j — GraphRAG introduction | Ecosystem: graph + retrieval as an alternative to giant flat prompts (vendor blog — concept is clear) |
| 22 | LangChain — Short-term memory | Agents: threads, history, trimming when the window fills (framework docs) |

E. Optional — product & narrative

| # | Resource | Why it’s valuable |
|---|----------|-------------------|
| 23 | Anthropic — Introducing prompt caching (news) | Why caching exists; the cost/latency story |
| 24 | Towards AI — Context Window Paradox | Trade-offs: bigger window ≠ uniformly better (editorial — verify claims) |
| 25 | Towards AI — Context engineering for AI coding agents | Coding-agent framing: selective context vs “dump the repo” |
| 26 | Medium — Claude’s 1M context… until it isn’t | Cautionary lens on huge windows (opinion piece) |

Quick heuristics (rules of thumb, not promises)

  • Recency bias: Recent turns are usually “hotter” than very old turns unless summarized or re-injected.
  • Effective vs. advertised: Research and benchmarks often show degradation well before the published token cap.
  • Middle of the dump: Key facts buried among lots of other text can be missed — design docs and prompts accordingly (“lost in the middle”).
  • Your repo beats your chat: Git + activeContext.md + rules are durable context; chat is an ephemeral working set.
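
The recency heuristic above can be sketched as a token-budgeted rolling window, a minimal version of the trimming that LangChain-style short-term memory performs. The 4-chars-per-token estimate is an assumption, not a tokenizer:

```python
# Recency-biased trimming: keep the system prompt plus as many of the
# most recent turns as fit the token budget, dropping the oldest first.

def est_tokens(text: str) -> int:
    # Crude ~4 chars/token estimate; real clients use the provider's tokenizer.
    return max(1, len(text) // 4)

def trim(system: str, turns: list[str], budget: int) -> list[str]:
    """Return [system] + the newest turns that fit within the budget."""
    kept: list[str] = []
    used = est_tokens(system)
    for turn in reversed(turns):   # walk newest-first
        cost = est_tokens(turn)
        if used + cost > budget:
            break                  # older turns fall out of the window
        kept.append(turn)
        used += cost
    return [system] + list(reversed(kept))
```

Note what the sketch does not do: it never touches the system prompt, which is why rules and repo files are durable while old chat turns are not.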

Last curated: 2026-03-25 — 26 entries (A–E); extend with org-specific MLOps / security guides as needed.