Channel
Interviewed Person
Lars Grammel
[Music] Hey everyone, I'm presenting Storyteller, an app for generating short audio stories for preschool kids. Storyteller is implemented using TypeScript and ModelFusion, an AI orchestration library that I've been developing. It generates audio stories that are about two minutes long, and all it needs is a voice input. Here's an example of the kind of story it generates, to give you an idea:

"One day, while they were playing, Benny noticed something strange. The forest wasn't as vibrant as before. The leaves were turning brown, and the animals seemed less cheerful. Worried, Benny asked his friends what was wrong. 'Friends, why do the trees look so sad, and why are you all so quiet today?' 'Benny, the forest is in trouble. The trees are dying, and we don't know what to do.'"

How does this work? Let's dive into the details of the Storyteller application. Storyteller is a client-server application: the client is written using React, and the server is a custom Fastify implementation. The main challenges were responsiveness (getting results to the user as quickly as possible), quality, and consistency.

When you start Storyteller, it's just a small screen with a "record topic" button. Once you press it, it starts recording the audio. When you release, the audio gets sent to the server as a buffer, and there we transcribe it. For transcription I'm using OpenAI Whisper; it is really quick for a short topic, about 1.5 seconds. Once the transcription becomes available, an event goes back to the client, so the client-server communication works through an event stream with server-sent events.
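The event-stream plumbing described above can be sketched roughly like this. This is a minimal illustration, not the actual Storyteller code: the event types, the `EventStream` class, and the `formatServerSentEvent` helper are all assumed names. The server encodes each update in the server-sent-events wire format and pushes it to every connected client response stream; the client's `EventSource` handler would parse the JSON and update React state.

```typescript
// Hypothetical event shapes for the updates the server pushes to the client.
type StorytellerEvent =
  | { type: "transcription"; text: string }
  | { type: "title"; title: string }
  | { type: "image"; path: string };

// Encode one event in the SSE wire format: "data: <json>\n\n".
function formatServerSentEvent(event: StorytellerEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Sketch of the server side: long-running tasks (transcription, title,
// image) publish events; every open SSE response gets the encoded chunk.
class EventStream {
  private listeners: Array<(chunk: string) => void> = [];

  // Called when a client opens the event-stream endpoint; `write` would
  // forward chunks into the HTTP response.
  subscribe(write: (chunk: string) => void): void {
    this.listeners.push(write);
  }

  // Called when a task (e.g. Whisper transcription) finishes.
  publish(event: StorytellerEvent): void {
    const chunk = formatServerSentEvent(event);
    for (const write of this.listeners) write(chunk);
  }
}
```

On the client, a standard `EventSource` `onmessage` handler would `JSON.parse(event.data)` and dispatch on `type` to update the corresponding piece of React state, which is what makes the screen update as each result arrives.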
The event arrives on the client, the React state updates, and the screen re-renders, so the user knows something is going on. In parallel, I start generating the story outline. For this I use GPT-3.5 Turbo Instruct, which I found to be very fast: it can generate a story outline in about 4 seconds. Once we have that, we can start a bunch of other tasks in parallel: generating the title, generating the image, and generating and narrating the audio story all happen at the same time. I'll go through those one by one now.

First, the title is generated. OpenAI GPT-3.5 Turbo Instruct is used here again, giving a really quick result. Once the title is available, it's sent to the client as an event and rendered there.

In parallel, the image generation runs. First, there needs to be a prompt to actually generate the image, and here consistency is important, so we pass the whole story into a GPT-4 prompt that extracts representative keywords from the story for an image prompt. That image prompt is passed into Stability AI's Stable Diffusion XL, where an image is generated. The generated image is stored as a virtual file on the server, and then an event is sent to the client with a path to that file. The client can then retrieve the image through a regular URL request as part of an image tag, and it shows up in the UI.

Generating the full audio story is the most time-consuming piece of the puzzle. Here we have a complex prompt that takes in the story and creates a structure with