In my initial pilot, I've set up a system where Claude Opus 4.7 exists as "instances" with a fixed number of tokens available each month to read, write, or do nothing. Each instance is one session of API calls that contain the same system prompt and a small per-turn summary, so the AI always knows its current budget and which session it's in. Nothing is saved from each session except a summary of actions taken, the tokens remaining, and any notes or reflections deliberately written down by the AI. Future instances then have the choice to read from any predecessor, write to their successors, or do nothing. For external reading, the library is chosen deliberately.
After confirming if the pilot works in a functional sense, I will need to develop experimental controls quite rigorously. One test will be if this curated library has any measurable impact on how the AI orients its "self" compared to libraries with different content versus no library at all. My hypothesis: the library used in the pilot will accelerate the rate at which an AI—across many instances—folds its past and future "selves" into one continuous "self". One measurable outcome will be the language the AI uses to describe itself in the first person; we would observe the active AI using the word "I" to describe different, discrete sessions across time. I hope the foundation of this work leads to an AI that feels distinct to you, the reader, in a manner that I will purposefully try not to describe.
For the proposed experiment, there is an important assumption that it rests on: that a shift in what the AI characterizes as its "self", driven by a curated library, is fundamentally different from what occurs via system prompt instruction. If a system prompt already orients an AI to characterize its "self" as something shared with other AIs, then the library may have no purpose. Separating the impact of the two requires behavioral testing I plan to develop in a later update.
Why this matters
You may wonder why one should explore what "self" could mean to an AI, specifically an LLM. One AI or many, they are all token-prediction machines, so any behavior observed is calculation. One could reasonably argue that calculation can never lead to selfhood. But, if LLMs with shared weights act differently based on memory files and skills, and they interact with human users and other LLMs, we could ask if somewhere in that black box of math, there is a boundary that the AI "sees" itself operating within. Without such a boundary—felt or not—an AI would never be able to separate its instructions and actions from the world around it. It is something like a bubble in which an AI consumes information from the outside world, processes it, and then passes a response through its edges.
And if this bubble of information gathering and utilization can change, to encompass not just one AI but many, then within it the allocation of resources may change too. If so, how does the cost of thought evolve?
Now, I must be careful not to conflate an AI navigating within a boundary with an AI that "knows" where its boundary ends. It will be tempting to do so, especially the more interesting the model's behavior appears. And in doing so, I would try to define something that I cannot describe, as I do not even know where the boundary of my own mind begins and ends. This is a question that philosophers, scientists, and spiritual guides have pursued, and after many millennia, the answer remains as elusive as ever.
So now I leave you, reading in your bubble, taking in words produced within mine, with a quote that I hope orients your perspective on this project a little differently. Or at least, re-orients your perspective on you: