Collective A.I.: Recap Part 3: Generation from Collective Gameplay

In 2008, I implemented my first version of a system that plans character behavior and dialogue from recorded gameplay data. This system combined the components described in Recap 1 and Recap 2 with a replay system, which automated a character by playing back fragments of log files (or one log in its entirety).

The planner is a bit like the Game AI equivalent of the Random Paper Generator. It essentially randomly wanders between 5,000 log files, stitching together fragments of behavior and dialogue at run-time, while maintaining local coherence by critiquing log transitions for statistical likelihood. This random stitching process is complicated by the fact that each character (customer and waitress) needs to observe the behavior of the other character, and select a sensible response from a log file.


At a high level, here is what happens: each character selects an arbitrary log file, begins a replay, and continually critiques him/herself for the likelihood of executing the next action in the log (based on the n-gram model described in Recap 1). When the log indicates that the other character does something, the character waits to observe the expected action. If the next observation does not match the expectation, the character looks for a new log file that is a better match for recent observations. While replaying physical actions, the next action might be a dialogue line, at which point the character toggles to the chat system (described in Recap 2). The dialogue interaction may eventually terminate with a physical action, leading the characters to toggle back to the physical replay system, and so on.
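To make the replay-and-switch loop concrete, here is a minimal sketch in Python. It is not the code from the actual system: the log format (lists of (actor, action) pairs), the function names, and the matching rule (find another log that contains the last few events as a contiguous run) are all assumptions made for illustration. It also leaves out the n-gram critique and the toggle to the chat system, and it starts from the first log rather than an arbitrary one so that the result is repeatable.

```python
def find_resume_point(logs, recent, skip=None):
    """Return (log_index, position) for the first log containing `recent`
    as a contiguous run; position points just past the matched events."""
    n = len(recent)
    for i, log in enumerate(logs):
        if i == skip:
            continue
        for start in range(len(log) - n + 1):
            if log[start:start + n] == recent:
                return i, start + n
    return None


def replay(logs, me, observations, context=2):
    """Replay a log as character `me`, consuming the other character's
    observed actions and switching logs when an expectation fails."""
    log_idx, pos = 0, 0          # hypothetical: start at the first log
    history, performed = [], []
    incoming = iter(observations)
    while pos < len(logs[log_idx]):
        actor, action = logs[log_idx][pos]
        if actor == me:
            # Our own turn: execute the next action in the log.
            performed.append(action)
            history.append((actor, action))
            pos += 1
            continue
        # The other character's turn: wait for an observation.
        observed = next(incoming, None)
        if observed is None:
            break
        history.append((actor, observed))
        if observed == action:
            pos += 1
            continue
        # Expectation failed: look for a log that better matches recent events,
        # first with the full context window, then with the last event only.
        match = find_resume_point(logs, history[-context:], skip=log_idx)
        if match is None:
            match = find_resume_point(logs, history[-1:], skip=log_idx)
        if match is None:
            break
        log_idx, pos = match
    return performed


# Two toy logs that diverge at the customer's order.
toy_logs = [
    [("customer", "sit"), ("waitress", "give_menu"),
     ("customer", "order_steak"), ("waitress", "bring_steak")],
    [("customer", "sit"), ("waitress", "give_menu"),
     ("customer", "order_beer"), ("waitress", "bring_beer")],
]
waitress_actions = replay(toy_logs, "waitress", ["sit", "order_beer"])
```

In this toy run the waitress starts in the steak log, sees the customer order a beer instead, and jumps into the second log to pick a response from there.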

I presented a paper at AAMAS 2009 in Budapest covering the details of this system. I had the good fortune of having this paper summarized by Michael Mateas at the GDC 2010 Game Studies Download 5.0 panel. Michael describes the work more clearly than I can describe it myself – see for yourself.

Incidentally, while wandering around Budapest, I found Cate Archer painted on a wall outside a bar. Who knew the Hungarians were such big fans of NOLF?

Planner Demos

Below are some videos demonstrating the planner in action.

The videos are more interesting if you keep a couple things in mind:

All of the characters’ behavior and dialogue comes directly out of recorded human-human games (typos and all).
The first two videos show two AI characters interacting. However, there is no centralized control. Each character has a separate pool of log files to draw from, and they are responding to one another dynamically in real-time based on observed actions, state changes, and raw chat text. Their AI processes are running on an AI server, and each character could be running on an entirely different machine.

This is a video of one of the better runs:

AAMAS 2009 Clip 1 on Vimeo.

Here is an average run:

AAMAS 2009 Clip 2 on Vimeo.

Then, there were also many runs where the characters go to the bar and get stuck in an infinite loop of ordering beer. You don’t need to see those.

The characters can interact with other characters, or with humans. Here is a video of an AI waitress interacting with a human customer. It works fairly well as long as the human behaves very cooperatively. You can see that the system does not yet have any implementation of long-term memory. Towards the end of the video, there are some examples of the AI responding to… less ordinary interactions:

AAMAS 2009 Clip 3 on Vimeo.

The Good, the Bad, and the Ugly

As a game developer, having spent a decade programming behavior by hand, it was very exciting to see that these characters could do anything remotely human-like without any hand-programming at all. But there are obviously some serious drawbacks to this system – primarily the complete lack of designer control, and the related fact that characters do not always do the right thing. Below are some cool things about this system, and some not-so-cool things.

Cool Things:

High-level behavior (dialogue and decomposable actions) runs on an AI server networked with the game (via sockets). Low-level behavior (pathfinding, animations, locomotion) is implemented in a layer integrated with the game engine. For example, the AI server sends a command to the waitress to pick up a steak from the kitchen, and the low-level game-side layer navigates to the kitchen, resolves the reference to a specific steak in the game world, and selects the animation to pick it up.
Characters can be running on different servers. They only communicate within the game world by observing one another’s actions, state changes, and raw chat text. Characters observe and respond to humans through the same machinery used to respond to other AI characters.
No hand-authoring of high-level behaviors.
Interactions play out differently every time.
The system handles both physical interaction and natural language dialogue.
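
The split between the AI server and the game-side layer can be sketched roughly as follows. Everything here is hypothetical: the JSON message format, the field names, and the rule of choosing the nearest matching object are assumptions for illustration, not the wire protocol the system actually used. The encoded bytes are what would travel over the socket.

```python
import json
import math


def encode_command(character, verb, object_type, location):
    """AI-server side: serialize a high-level command as one JSON line."""
    msg = {"character": character, "verb": verb,
           "object": object_type, "location": location}
    return (json.dumps(msg) + "\n").encode("utf-8")


def resolve_reference(world, object_type, location, near):
    """Game side: turn 'a steak in the kitchen' into one concrete object,
    here simply the closest instance of that type at that location."""
    candidates = [o for o in world
                  if o["type"] == object_type and o["location"] == location]
    if not candidates:
        return None
    return min(candidates, key=lambda o: math.dist(o["pos"], near))


def handle_command(raw, world, character_pos):
    """Game side: decode the command and expand it into low-level steps."""
    cmd = json.loads(raw.decode("utf-8"))
    target = resolve_reference(world, cmd["object"], cmd["location"], character_pos)
    if target is None:
        return []
    return [("navigate_to", target["pos"]),
            ("play_animation", cmd["verb"]),
            ("attach", target["id"])]


# A toy game world with two steaks in the kitchen and a beer at the bar.
toy_world = [
    {"id": "steak_1", "type": "steak", "location": "kitchen", "pos": (5.0, 1.0)},
    {"id": "steak_2", "type": "steak", "location": "kitchen", "pos": (2.0, 1.0)},
    {"id": "beer_1", "type": "beer", "location": "bar", "pos": (0.0, 0.0)},
]
raw = encode_command("waitress", "pick_up", "steak", "kitchen")
steps = handle_command(raw, toy_world, character_pos=(1.0, 1.0))
```

The point of the split is that the AI server only ever names a kind of object and a place; which steak, which path, and which animation are decided on the game side.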

Not-So-Cool Things:

No designer input. No authorial control.
Sometimes characters do wrong, weird, or out-of-context things.
Behavior is guided by statistical regularities, but sometimes the most interesting behaviors are statistical outliers. These outliers get filtered out, for better or for worse.
Constraining behavior with a trigram model ensures local coherence, but not global coherence -- characters tend to do sensible things from moment-to-moment, but over longer periods of time they can get caught in cycles.
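
The beer-ordering loop is easy to reproduce with a toy trigram model. The sketch below is an illustration only: the action names and training logs are made up, and it always picks the most likely next action, whereas the real planner wanders between log files and uses the model as a critic. The greedy choice just makes the cycle deterministic.

```python
from collections import Counter, defaultdict


def train_trigrams(logs):
    """Count how often each action follows each pair of preceding actions."""
    counts = defaultdict(Counter)
    for log in logs:
        padded = ["<s>", "<s>"] + list(log)
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            counts[(a, b)][c] += 1
    return counts


def likelihood(counts, a, b, c):
    """Estimate P(c | a, b) from raw counts (no smoothing)."""
    following = counts.get((a, b), Counter())
    total = sum(following.values())
    return following[c] / total if total else 0.0


def generate(counts, steps):
    """Greedily emit the most likely next action given the last two."""
    context = ("<s>", "<s>")
    out = []
    for _ in range(steps):
        options = counts.get(context)
        if not options:
            break
        nxt = options.most_common(1)[0][0]
        out.append(nxt)
        context = (context[1], nxt)
    return out


# Made-up logs: customers usually order more than one beer before paying.
beer_logs = [
    ["sit", "order_beer", "drink", "order_beer", "drink", "pay"],
    ["sit", "order_beer", "drink", "order_beer", "drink",
     "order_beer", "drink", "pay"],
]
model = train_trigrams(beer_logs)
sequence = generate(model, 8)
```

Every step in the generated sequence is individually plausible, since each trigram is the most common continuation in the data, yet the character never reaches "pay" because nothing in a two-action window records how many beers have already been ordered.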

Smart Cookie

So, implementing this system was an informative foray into statistical modeling of language and behavior, but soon after publishing this work, I received some poignant words of wisdom from a fortune cookie: "Don’t let statistics do a number on you." Statistics are useful for mining recurring patterns in large datasets of gameplay data, but statistics alone cannot replace game designers -- ultimately there needs to be a human in the loop to identify behaviors of interest and to make sense of the higher-level structure of the scenario. Noah Wardrip-Fruin’s Expressive Processing has a chapter about statistical AI that highlights The Restaurant Game, and comes to this same conclusion. My current research aims to leverage statistics while keeping a human designer in the loop, forming a powerful human-machine collaborative authoring process.

That wraps up the recaps. The rest of 2009, after AAMAS, was kind of the dark period of my PhD, in terms of productivity. I spent a lot of time working on a dialogue act classifier (described in this paper) that I ultimately have not ended up using in my new system. And then over six months were devoted to intense reading for my General Exams. Future posts will begin describing components of my new system, which is still in development, and aims to address shortcomings of the first version of the planner, with the concession that it is no longer a fully automated authoring system (but I think that is for the best, really).

Friday, February 25, 2011
