Tuesday, December 20, 2011

Thesis Proposal Recap

I'm long overdue for a new blog post.  2011 has been a busy year.  I've been making a lot of research progress, which has kept me too busy to blog!  I've hit a number of milestones in recent months -- my thesis proposal was approved in August, a prototype of my new planner came together in October, and I launched a crowdsourced data annotation effort in November.  This post will briefly recap the proposal that I presented to my committee. 

I'm building a new planner that combines plan recognition with case-based reasoning to simulate reasoning from a collection of episodic memories.  My approach is production-oriented, in that it is mindful of the content authoring bottleneck (where content means AI behaviors and dialogue lines), and attempts to make it as easy as possible to get an enormous variety of content into the system.  The approach leverages the realities of the world we live in today, where it is possible to crowdsource repetitive tasks to non-experts, and opts for powering the system with lots of mundane data annotation rather than cleverly programmed AI.

For background information on this project, see the previous recaps: Part 1, Part 2, Part 3.



Follow research updates on Twitter: @jorkin

INTRODUCTION

AI for combat has come a long way in the past 15 years or so.  How can we make other parts of the experience -- the social interaction, and storytelling -- as dynamic as the combat?  While numerous games deliver sandbox-style combat that adapts to player choices, social interaction and storytelling are generally either entirely linear, or scripted with limited opportunities for interaction or influence.  Game developers have mastered pathfinding, animation, scripting, and reactive behavior, but have made less progress producing characters that can dynamically communicate, cooperate, and maintain coherent interactions over long periods of time.

What about The Sims and Facade?  These are exceptions that inspire my work, but they lie at opposite ends of a spectrum, and I am targeting something in between.  The Sims is entirely emergent; a doll house that does not attempt to tell any particular narrative.  Facade tells a specific, coherent narrative, but has been criticized for limiting player agency.


Graphics Envy

It is often noted that AI has not kept pace with graphics in games.  There was a time when every pixel on the screen was plotted by hand.  Today we have arrived at a representation that allows us to render 3D worlds at runtime from any camera position.  And this representation scales -- with more processing power, we can render more polygons (millions of polygons!), yielding incredible detail.
 
Graph generated by recording 5,000 pairs of players.
We are essentially still in the pixel-plotting days of AI, where we are crafting every decision by hand -- an approach that does not scale beyond the complexity of behaviors that we see in current games.  This graph (to the left) was generated by recording 5,000 pairs of humans playing as customers and waitresses in a virtual restaurant.  The graph shows all of the action sequences observed from the start of the game (at the top) until the end of the game (at the bottom).  Human behavior is complex and nuanced.  We will never be able to author such a dense possibility space by hand.  We need a representation that can be recombined to generate all of these possibilities at runtime, adapting to player choices.
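To give a sense of what building such a graph involves, here is a minimal sketch in Python.  The transcript format (one ordered list of action labels per game) and the function name are assumptions for illustration only -- not the actual data format or code behind the graph above.

```python
# Minimal sketch: build a directed graph of consecutive actions observed across
# many recorded games. Transcript format and names are illustrative assumptions.
from collections import defaultdict

def build_action_graph(transcripts):
    """Count directed edges between consecutive actions across all games."""
    edge_counts = defaultdict(int)  # (action, next_action) -> frequency
    for actions in transcripts:
        sequence = ["START"] + list(actions) + ["END"]
        for current, following in zip(sequence, sequence[1:]):
            edge_counts[(current, following)] += 1
    return edge_counts

# Two hypothetical customer/waitress games:
games = [
    ["customer_enters", "waitress_greets", "customer_sits", "customer_orders"],
    ["customer_enters", "customer_sits", "waitress_greets", "customer_orders"],
]
for (a, b), n in sorted(build_action_graph(games).items()):
    print(f"{a} -> {b}: {n}")
```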

COLLECTIVE ARTIFICIAL INTELLIGENCE

I refer to my approach as Collective Artificial Intelligence -- a combination of crowdsourcing, pattern-mining, and episodic planning.

Crowdsourcing

There is no easier way to author behavior than through a live demonstration (i.e. playing a character in a game).  Anyone can do it, without any technical know-how.  I have collected three data sets of behavior:  The Restaurant Game has recorded over 10,000 demonstrations, Improviso is currently recording actors on a virtual film set, and Mars Escape, a collaboration with the Personal Robots Group at the Media Lab, recorded hundreds of demonstrations of humans and robots working together.  All of these games were created from the same codebase.

The Restaurant Game, Improviso, and Mars Escape.
Recording thousands of people in the same scenario can be thought of as crowdsourcing our imagination, which is the first step in my process.  In the next step -- pattern-mining -- crowdsourcing is employed once again, to help interpret the data collected from the first step.


Pattern-Mining

Pattern mining is a human-machine collaborative process.  Humans are recruited to annotate data with browser-based tools, explaining the meaning of different action sequences to the AI system.  Including humans in the loop makes it possible to capture sparse examples of behavior and dialogue that would slip through the cracks of statistical machine learning algorithms, and gives a designer the opportunity to control which behaviors to capture and which to ignore.  One of the goals of this work is to demystify Game AI -- to take it from being a black art to being a matter of refactoring a difficult problem into many simple annotation tasks.  The intuition is that few people can program behavior, but anyone who speaks English can explain behavior given a transcript of a recorded game.

Each recorded transcript is annotated with four layers of meta-data: event sequences, event hierarchies, causal chains, and references. Embedding streamlined annotation tools in a browser makes it possible to take advantage of the numerous web sites that exist for hiring people online to perform unskilled labor (e.g. Amazon Mechanical Turk, CrowdFlower, eLance, oDesk).  Below is a video demonstrating the process of annotating event sequences.  More information about annotation is available in papers from INT3 and AIIDE.
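To make the four layers concrete, here is a minimal sketch of what one annotated transcript might look like, assuming a simple Python dictionary representation.  The field names and values are illustrative only; the actual annotation format is described in the papers linked above.

```python
# Illustrative sketch of one annotated transcript (field names are assumptions).
annotated_transcript = {
    "actions": [  # the raw recorded action/dialogue stream
        {"id": 0, "actor": "Customer", "act": "SIT", "target": "chair"},
        {"id": 1, "actor": "Waitress", "act": "SAY", "text": "What can I get you?"},
        {"id": 2, "actor": "Customer", "act": "SAY", "text": "The steak, please."},
    ],
    "event_sequences": [  # contiguous spans of actions labeled as events
        {"event": "GET_SEATED", "action_ids": [0]},
        {"event": "ORDER_FOOD", "action_ids": [1, 2]},
    ],
    "event_hierarchy": {  # lower-level events grouped under higher-level events
        "SERVE_CUSTOMER": ["GET_SEATED", "ORDER_FOOD"],
    },
    "causal_chains": [  # which earlier actions enable later ones
        {"cause": 1, "effect": 2},
    ],
    "references": [  # dialogue phrases tied to the objects/actions they refer to
        {"action_id": 2, "phrase": "The steak", "refers_to": "menu_item:steak"},
    ],
}
```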



Watch on Vimeo: Event Annotation.


Episodic Planning

Annotated data is the fuel that powers the new episodic planning system.  The diagram below illustrates the machinery inside the mind of a character.  In brief:  the agent receives observations through sensors, and records the observed interaction history on the Blackboard.  In order to understand and respond to observations, the agent can exploit Collective Memory -- a database of recorded transcripts, and associated meta-data.  The Plan Recognizer leverages the learned Event Dictionary to infer an event hierarchy from the observed interaction history.  The Action Selector then searches for recorded human games that match the inferred event hierarchy, and passes these as proposals to a set of Critic processes.  Critics draw on a variety of sources to scrutinize the validity and coherence of following the next step in each proposed plan.  If one of the proposals is approved by all critics, it is passed to the Actuator for execution.

Agent architecture for episodic planning.
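To summarize the flow of the diagram in code, here is a minimal structural sketch of the decision loop.  The class and method names (recognize, retrieve_matching_games, approves, and so on) are assumptions for illustration; the real planner is considerably more involved.

```python
# Structural sketch of the episodic planning loop (names are illustrative).
class EpisodicAgent:
    def __init__(self, collective_memory, event_dictionary, critics, actuator):
        self.memory = collective_memory     # recorded transcripts + meta-data
        self.dictionary = event_dictionary  # learned event definitions
        self.critics = critics              # validity and coherence checks
        self.actuator = actuator
        self.blackboard = []                # observed interaction history

    def step(self, observation):
        # 1. Record the new observation on the Blackboard.
        self.blackboard.append(observation)

        # 2. Plan recognition: infer an event hierarchy from the history.
        hierarchy = self.dictionary.recognize(self.blackboard)

        # 3. Action selection: retrieve recorded human games matching the hierarchy.
        proposals = self.memory.retrieve_matching_games(hierarchy)

        # 4. Critics scrutinize the next step of each proposed plan.
        for plan in proposals:
            next_action = plan.next_step(self.blackboard)
            if all(critic.approves(next_action, self.blackboard)
                   for critic in self.critics):
                # 5. The first proposal approved by all critics is executed.
                self.actuator.execute(next_action)
                return next_action

        return None  # no coherent continuation found; wait for more observations
```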

Below is a video demonstrating the system.  The bottom of the video illustrates the plan recognition process that is running in the Customer's mind as new observations arrive.  While it would be easy to script something like this, keep in mind that the video is unscripted -- the two characters are dynamically responding to one another based on observed actions and natural language dialogue text.  Each time the system runs, the scenario plays out differently.  I am working toward a demo where a human can play one of the characters, but this requires much more annotated data to cover the space of possible behavior.  The big difference between this system and my earlier statistical approach is that the new system always produces a coherent narrative -- and when it doesn't, the failure is a bug that can be fixed, rather than just a statistical anomaly.


Watch on Vimeo: Episodic Planner: first prototype.


EVALUATION

My committee approved my proposal, giving me an "unconditional pass"... with a condition.  The condition is that I write a concise, focused, one-page plan for how I will evaluate the system.  I am still thinking about this.  I hope to quantitatively show that the system produces human-like interactions when compared to thousands of human-human transcripts.  But more importantly, I want to show qualitatively that this system produces a new experience; one that players find more engaging due to an increased sense of agency.  I want to demonstrate that players feel that the AI character is cooperating, and helping the human player take the narrative in the direction the human chooses to go.
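As a purely illustrative example of the quantitative side (not the plan from the proposal -- I am still working that out), one could compare the distribution of short action sequences in system-generated transcripts against human-human transcripts, along these lines:

```python
# Illustrative only: compare bigram distributions of actions between AI-generated
# and human-human transcripts. Higher overlap = more "human-like" sequences.
from collections import Counter

def bigram_distribution(transcripts):
    counts = Counter()
    for actions in transcripts:
        counts.update(zip(actions, actions[1:]))
    total = sum(counts.values()) or 1
    return {bigram: n / total for bigram, n in counts.items()}

def overlap(dist_a, dist_b):
    """Shared probability mass between two distributions (1.0 = identical)."""
    return sum(min(dist_a.get(k, 0.0), dist_b.get(k, 0.0))
               for k in set(dist_a) | set(dist_b))

# overlap(bigram_distribution(ai_games), bigram_distribution(human_games))
```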

Now that my proposal has been approved, I am required to defend my thesis within 12 months of approval, so allegedly I will be finished by sometime in August 2012 at the latest! 
