# Agentic AI & LLM context — curated reading list
Purpose: Reputable and high-signal references on context windows, long chats, memory, RAG, and IDE agent practices — so humans can reason about what the model actually sees and when to fork a new thread.
Companion (plain language): onboarding/guides/working-with-ai-context.md
Deep dive (technical): ai-engineering/context-engineering.md — synthesized knowledge article with core concepts, principles, and practical takeaways
Disclaimer: Vendor docs change URLs, and arXiv versions may update. Prefer official docs for integration decisions, and papers for understanding why behavior happens.
## A. Foundations — what “context” is
| # | Resource | Why it’s valuable |
|---|---|---|
| 1 | Anthropic — Context windows | Clear definition: context = working memory; notes linear growth, context rot, compaction |
| 2 | Anthropic — Compaction | How long-running threads get summarized when near limits (agentic workflows) |
| 3 | Anthropic — Prompt caching | Efficiency for repeated system/instruction prefixes (not human memory — engineering) |
| 4 | Anthropic — Prompt engineering / Claude best practices | Structure, clarity, agentic prompts — improves use of whatever context is visible |
| 5 | OpenAI — Conversation state | Stateful threads, context window billing, chaining responses |
| 6 | OpenAI — Compaction (API) | Server-side context reduction for long runs |
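The compaction entries above share one idea: when a long-running thread nears its token budget, older turns get replaced by a summary so the working memory stays under the cap. A minimal vendor-agnostic sketch (all names and the ~4-chars-per-token heuristic are illustrative assumptions, not any provider's API):

```python
# Sketch of compaction: when history exceeds a token budget, collapse the
# oldest turns into a summary and keep the most recent turns verbatim.
# Illustrative only — a real system would call an LLM to write the summary.

def rough_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def compact(history: list[str], budget: int, keep_recent: int = 4) -> list[str]:
    """Summarize everything except the most recent turns when over budget."""
    if sum(rough_tokens(t) for t in history) <= budget:
        return history  # still fits — no compaction needed
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = f"[summary of {len(old)} earlier turns]"  # stub for an LLM call
    return [summary] + recent

history = [f"turn {i}: " + "x" * 400 for i in range(10)]
compacted = compact(history, budget=500)  # 1 summary + 4 recent turns
```

The point for humans: after compaction, the model no longer sees the old turns themselves, only the summary — which is why details from early in a long chat can quietly disappear.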
## B. Research — limits and failure modes
| # | Resource | Why it’s valuable |
|---|---|---|
| 7 | Liu et al., Lost in the Middle (arXiv:2307.03172) | Seminal U-shaped attention finding: evidence at the start/end of the context is recovered more reliably than evidence in the middle |
| 8 | Lewis et al., Retrieval-Augmented Generation (arXiv:2005.11401) | Foundational RAG: don’t stuff everything into the window — retrieve what matters |
| 9 | A Survey on the Memory Mechanism of LLM-based Agents (arXiv:2404.13501) | Taxonomy: short-term context vs long-term stores; agent memory design |
| 10 | GWNET / OAJAIML — Maximum Effective Context Window (2024) | Empirical gap: advertised vs usable context; degradation curves |
| 11 | Zylos — LLM context management & long-context strategies (2026) | Practitioner-oriented synthesis: lost-in-middle, caching, tiers (third party — triangulate) |
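One practical mitigation that falls out of the lost-in-the-middle research: order retrieved evidence so the highest-scoring chunks sit at the edges of the prompt, where U-shaped attention favors them. A small sketch under that assumption (the function name and scoring format are hypothetical, not from any of the papers above):

```python
# Sketch of a "lost in the middle" mitigation: interleave ranked chunks so
# the strongest evidence lands at the very start and very end of the prompt.
# Input: (chunk, relevance_score) pairs; output: chunks in prompt order.

def edge_order(chunks_with_scores: list[tuple[str, float]]) -> list[str]:
    ranked = sorted(chunks_with_scores, key=lambda cs: cs[1], reverse=True)
    front, back = [], []
    for i, (chunk, _) in enumerate(ranked):
        # Alternate: best chunk to the front, second best to the back, etc.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]  # weakest chunks end up in the middle

chunks = [("a", 0.9), ("b", 0.7), ("c", 0.5), ("d", 0.3), ("e", 0.1)]
print(edge_order(chunks))  # → ['a', 'c', 'e', 'd', 'b']
```

The mitigations benchmark in entry 18 is a reminder that tricks like this help unevenly across models and tasks — triangulate before relying on one.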
## C. Practice — managing human ↔ agent collaboration
| # | Resource | Why it’s valuable |
|---|---|---|
| 12 | Cursor — Rules (official) | Models don’t retain memory between completions; rules = persistent prompt-level context |
| 13 | Cursor — llms.txt / doc index | Entry point to the current Cursor docs (rules, context, product behavior) |
| 14 | Field Guide to AI — Context management | Rolling windows, summarization, memory systems — vendor-agnostic framing |
| 15 | Simon Willison — Long context in LLM 0.24 (fragments) | Practitioner lens: long context is powerful but you still engineer what gets fed in |
| 16 | Simon Willison — long-context tag | Ongoing notes on model + tool ecosystem |
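The rolling-window strategy described in entries 14 and 22 is the simplest of the bunch: evict the oldest turns until the history fits, always keeping the system prompt. A hedged sketch (names and the token heuristic are illustrative, not any framework's API):

```python
# Sketch of rolling-window trimming: drop the oldest turns until the
# conversation fits a token budget; the system prompt is never evicted.

def rough_tokens(text: str) -> int:
    # Crude ~4-chars-per-token stand-in for a real tokenizer.
    return max(1, len(text) // 4)

def trim(system: str, turns: list[str], budget: int) -> list[str]:
    kept = list(turns)
    while kept and rough_tokens(system) + sum(map(rough_tokens, kept)) > budget:
        kept.pop(0)  # recency bias made explicit: oldest turns go first
    return [system] + kept

history = [f"user/assistant turn {i} " + "." * 20 for i in range(6)]
window = trim("system rules", history, budget=50)
```

This is why Cursor-style rules matter: anything that must survive trimming belongs in persistent prompt-level context, not in old chat turns.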
## D. Extended reading — agents, memory, mitigations
| # | Resource | Why it’s valuable |
|---|---|---|
| 17 | Memory in the Age of AI Agents (arXiv:2512.13564) | Recent survey-style framing of agent memory |
| 18 | What Works for ‘Lost-in-the-Middle’? (arXiv:2511.13900) | Mitigations benchmark — shows not all fixes work uniformly |
| 19 | Lost in the Middle follow-up (arXiv:2510.10276) | Emergent IR perspective on the effect |
| 20 | Industrial Logic — INVEST model (user stories) | Adjacent: small, testable units of work ↔ smaller context per task (process, not LLM theory) |
| 21 | Neo4j — GraphRAG introduction | Ecosystem: graph + retrieval as alternative to giant flat prompts (vendor blog — concept clear) |
| 22 | LangChain — Short-term memory | Agents: threads, history, trimming when the window fills (framework docs) |
## E. Optional — product & narrative
| # | Resource | Why it’s valuable |
|---|---|---|
| 23 | Anthropic — Introducing prompt caching (news) | Why caching exists; cost/latency story |
| 24 | Towards AI — Context Window Paradox | Trade-offs: bigger window ≠ uniformly better (editorial — verify claims) |
| 25 | Towards AI — Context engineering for AI coding agents | Coding-agent framing: selective context vs “dump the repo” |
| 26 | Medium — Claude’s 1M context… until it isn’t | Cautionary lens on huge windows (opinion piece) |
## Quick numbers (rule of thumb, not a promise)
- Recency bias: Recent turns are usually “hotter” than very old turns unless summarized or re-injected.
- Effective vs. advertised: Research and benchmarks often show degradation before the published token cap.
- Middle of dump: Key facts buried between lots of other text can be missed — design docs and prompts accordingly (“lost in the middle”).
- Your repo beats your chat: Git + `activeContext.md` + rules are durable context; chat is an ephemeral working set.
Last curated: 2026-03-25 — 26 entries (A–E); extend with org-specific MLOps / security guides as needed.