I’m still trying to get an LLM to write me a novel, and am running into the first major setbacks while working on chapter 2. I’ve got an outline, a central conflict, a major payoff for some setups that need to be part of the early chapters, and am trying to get some kind of workflow down.
The workflow is what I’ll be calling COSOP: chapter outline, scene outline, prose. In theory, the steps are like this:
- Think up an idea for a novel, possibly with some constraints
- Write a chapter outline for every chapter in the book
- Write a scene outline for every scene in the book
- Write prose for every scene outline
- Stitch this all together
In theory, you can just have your LLM do all of these steps, including thinking up what the idea for the novel will be. And if you can do that, then you can just automate everything with a Python script. Press button, get novel.
In practice, the output is going to suck. It will suck in ways that are different from how a novice writer will write a bad novel, but it will still be something that no one in their right mind will want to read. What I’m most interested in is attempting to mitigate the suckage, ideally in such a way that manual intervention and editing can be minimized. Ideally, I can produce something that I wouldn’t mind reading for a fraction of the time spent. If I can’t do that, I should just be writing the normal way.
Categories of Problems
I have something I’m more or less happy with for chapter one of what’s currently called “The Tower Story”. You can read that chapter here. The problem with it is that the time spent on that chapter almost certainly means that I could have just written something better on my own in the same amount of time. Some of that is down to reading output from the LLM, some of it is editing, some is rewriting or filling in gaps, but it tends to not be the most efficient process. In theory, I could just get the LLM to do this for me, but when I try to get it to rewrite or edit, it very often fails in the same way or fails in some new way.
Chapter two is bothering me, because this is the point where I really need to get the output flowing, and keep running into problems that are going to take a significant amount of work to fix. In no particular order:
- Hallucination: The LLM adds things into the text that I don’t want there and weren’t mentioned in the prompt.
- External Inconsistency: The LLM contradicts things outside the context window.
- Internal Inconsistency: The LLM contradicts things inside the context window.
- Expansion and escalation: The LLM moves forward with either plot or powers too quickly, or introduces characters that should be introduced later.
- Eliding: The LLM skips over things that the outline implies, or otherwise gives certain elements short shrift.
- Pacing: The LLM doesn’t have good flow.
- Purple Prose: The LLM tends to make everything into an overwrought metaphor, and uses a lot of them in a row, with florid descriptions.
- Exposed Subtext: The LLM plops the subtext into the text, telling instead of showing.
- Plot Repetition: The LLM explains and repeats the plot too much.
- Self-Containment Syndrome: The LLM writes pieces that are “standalone” even when it’s supposed to be writing fragments.
I won’t take each of these in turn, but my hunch is that a lot of them can be ameliorated with better prompting or a better technical approach. Prompting for a specific style can do some work to get rid of purple prose, and removing subtext from the prompt can probably get rid of having it placed into the text. There are some issues with this extra stuff being added into the context window, which in certain models is a precious resource (and apparently larger context windows come at a cost of higher perplexity, at least in the way most models currently implement it, but I’m a little unclear on that). The other problem with stuffing more and more into the context window is that the LLM can get confused, slipping things that are meant to be notes into the text itself.
My conclusion from a lot of this is that an outline written for an LLM will be substantially different from an outline that I would write for myself. For one thing, the instructions need to be a lot more explicit, and for another, putting in explicit goals for the text probably doesn’t work, not unless there’s some way to prompt around that. If I were writing an outline for myself, I might write:
Adelia misses her grandmother
But unless I was going for a very specific sort of writing style, or I thought that it was worth being direct, I would write around this idea, getting the thrust of that loneliness and longing without spelling it out. This is the basis of “show, don’t tell”, the writing advice that any aspiring writer will hear ad nauseam. (It’s generally good advice, but also a rule that I think should be broken somewhat often.)
Adelia’s fingers traced the faded ink of the spellbook, which her grandmother had read to her when she was a little girl. Her grandmother had a large chair by the fireplace, and in her later years, had rarely moved from that spot, though she would always spare the energy to pick Adelia up and set her in her lap.
But with the LLMs (GPT-4 and Claude), the problem I run into way too often is that it will just outright say:
Adelia missed her grandmother.
This is actually really hard to get around, because the outline is supposed to give a summary of the scene, yet can’t actually directly say what the scene’s goals are without risk of those goals getting embedded into the text, which is definitely not where they should be.
So this is still a work in progress. But I want to bring up another approach that’s attempting to help: modules.
The Module Approach to Writing
I’m hesitant to share this here because my results have been somewhat inconsistent, but one way of ironing out the problems with AI writing is to use one-shot or two-shot approaches along with some chain-of-thought prompting. In practice, this means targeting specific problem areas and feeding the work (pieces of the work, for practical purposes) into an agent that’s primed for it.
It goes like this:
[Examples of text being stripped of purple prose go here.]
[Text goes here.]
Claude, I’d like you to look at that excerpt and identify passages with purple prose, if any. Explain why they are purple prose. Please also rewrite them for me.
It’s best to do this with examples inside the prompt/context, so it knows what you actually want rather than relying on its baked-in understanding.
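Concretely, a “module” here is just a prompt template: worked before/after examples, then the text to be edited, then the instruction. The function below is an illustrative sketch of that assembly, not my actual prompts:

```python
def build_module_prompt(examples, text, instruction):
    """Assemble a few-shot editing prompt: worked examples first,
    then the target text, then the instruction to the model.
    (Illustrative structure only; the real prompt wording varies.)"""
    parts = []
    for before, after in examples:
        parts.append(f"Before: {before}\nAfter: {after}")
    parts.append(f"Text:\n{text}")
    parts.append(instruction)
    return "\n\n".join(parts)
```

Each problem area (purple prose, exposed subtext, and so on) gets its own set of examples and its own instruction, and the same text can be run through each module in turn.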
So to the first question: does this actually work?
Kind of!
Here’s what Claude has to say about that first chapter:
Here are a few examples of purple prose in the story, along with suggestions for rewrites:
1. “Her eyes themselves were like the sea, both the calm and the depths unplumbed — which is to say, blue.”
This is an overly poetic way to say she has blue eyes. A simpler rewrite would be: “She had deep blue eyes.”
2. “These deepest parts of the wood were uncharted, filled with peril and wonder alike.”
This sentence is melodramatic. A simpler way to convey the same idea would be: “The deepest parts of the forest were unexplored, holding both danger and beauty.”
3. “A moss-wolf was prowling close, drawn by the call of magic and the promise of prey.”
This is an overly dramatic way to introduce the creature. A rewrite could be: “A wolf-like creature covered in moss prowled nearby, attracted by the magic and prospect of hunting easy prey.”
4. “The magic built to a crescendo, fire and wind and water and earth fused within the light she now hurled skyward.”
This sentence is overly flowery. A simpler rewrite would be: “The magic reached its peak, combining fire, wind, water and earth within the light that she threw up into the sky.”
(The “which is to say, blue” line was one of my additions, which I thought was funny and undercut the purpleness in a pleasing way, but I think it’s worth getting rid of.)
The hit rate is unfortunately lower than I would like, and even with GPT-4, which is generally pretty good, there are a lot of false positives. But even that’s okay, so long as you can replace the lines. It reduces down to an API call and a click, and with automation you can get rid of the click. What you end up with isn’t something that’s been carefully edited for purple prose. Instead, you get better prose, or at least mostly better prose. You can do the rewrites, then put the text through again, until you hit some diminishing returns and it’s suggesting that you rewrite things that don’t need to be rewritten.
Then, you run the text through another pass, this one with a different set of examples:
[Examples of text being stripped of telling instead of showing go here.]
[Text goes here.]
Claude, I’d like for you to look at that excerpt and identify passages that ‘tell’ instead of ‘show’, if any. Explain why they don’t follow that advice. Please rewrite them for me, or put DELETE if you think they should be deleted.
And yeah, this also kind of works. And by putting in the option of “DELETE” it primes the AI to use that option. But this particular module ran into more problems than the purple prose one, namely that a poor “understanding” of show don’t tell meant that it was trying to get rid of things that were actually fine.
At one point the model told me that describing a person’s appearance was a violation of show-don’t-tell, which is a sign that even with examples and some preamble on the nature of showing and telling, it still didn’t really get the concept.
Anyway, with those two examples, you can probably see how this general approach is going to go: build up tools that will, on average, improve the prose, then run through as many passes as necessary.
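Mechanically, the “as many passes as necessary” part is just a loop with a stopping condition. This sketch assumes a hypothetical `apply_module` function that runs one editing module over the text and returns the edited result; a real version would probably compare suggested edits rather than raw string equality:

```python
def run_passes(text, modules, apply_module, max_passes=5):
    """Run every editing module over the text repeatedly, stopping
    when a full round of passes changes nothing (diminishing returns)
    or when max_passes is hit. `apply_module` is assumed to take a
    module and the current text and return the edited text."""
    for _ in range(max_passes):
        before = text
        for module in modules:
            text = apply_module(module, text)
        if text == before:
            # No module changed anything this round; stop.
            break
    return text
```

The `max_passes` cap matters because, as noted above, each pass is another round of API calls over the same block of text.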
Note that this only works for a subset of the problems outlined above, particularly those related to prose rather than plot, but the idea with this project is to tackle the problems one at a time and develop some ideas for how I can whittle away at the ‘generate a novel’ problem.
One thing to note about this approach is that it probably gets heavy on the compute in a hurry, since you’re potentially washing the same block of text multiple times.
Next Steps
I’m still working on the second chapter of the story, or rather, working on developing an approach that will get me on the short path to the second chapter. I have access to Claude now, and the larger context window is a godsend, but I’m a bit worried about the ability to do full-chapter analysis. I’ve generated a version of the second chapter, but it’s one that I can’t actually put in place, not when it sets up too many things without planned payoffs.
I do think that this approach of outlining extensively, generating scenes then prose, and washing through different processes targeted at different things is showing some promise. It’s just slow when I want it to be fast, and requires manual intervention when I want it to be fully mechanized.
2 thoughts on “Adventures in AI Text Generation, pt 2 (of ???)”
Interesting topic, I’m looking forward to following this along and seeing how far you can get.
In my experience the current LLMs work best on the paragraph, maybe even the sentence level. I programmed a quick Word Add-in that suggests text continuations to the highlighted text, which quickly allows me to get several decent suggestions from the GPT model. It helps a lot when I get stuck on a description or tip-of-your-tongue situation where I can’t find the right word or particular phrasing but know it exists. I’ve spent some time experimenting with different system messages that communicate to the model what’s expected of it, which solved most of the show-don’t-tell problems, and its tendency to go off on tangents. Currently, I think that AI-augmented writing is where the technology shines, where you drive events along but make (liberal) use of the AI model to complete your paragraphs/sentences.
I’m pretty sure this is more a matter of tools than approach. What you write about your workflow is pretty good and should work, but neither GPT-4 nor Claude were finetuned specifically on prose writing.
You could give a shot at something more tailored for this task, like NovelAi. They’re using a smaller model (ATM, but should release a bigger one in the near future) but it’s finetuned specifically on stories, and even more specifically on web serials.
It also has a context window of 8k, so external consistency should be less of a problem, and it can keep the tone of the story more coherent.
You cannot use a top-down approach of iterative refinement like you’re describing here, since it has not been trained to follow instructions, but it’s pretty good at continuing a story given a few paragraphs of introductory text.