Collective A.I.

The Adventure Continues

2013-08-13T10:31:00.001-04:00

I am continuing to work on Collective A.I. through a company I co-founded called Giant Otter Technologies, Inc. We will be blogging from http://giantotter.com/blog/, as well as posting updates on Twitter: @GiantOtterTech. See you there.

Defense Abridged

2013-07-03T23:52:00.000-04:00

I've been meaning to blog a reader's digest version of my PhD thesis defense since December. Now that six months have passed, it's about time to follow through with that plan. You can also watch my AI Summit talk from the GDC Vault (starts at 20:45), which is a condensed version of my defense, minus the study results. The complete thesis document is available here: Collective Artificial Intelligence: Simulated Role-Playing from Crowdsourced Data

PREFACE

I had a great committee -- my PhD advisor Deb Roy (@dkroy), along with Nick Montfort (@nickmofo), and Mark Riedl (@mark_riedl). In an attempt to give my thesis some industry relevance, I invited Gabe Newell to join the committee as well. To my surprise, he agreed and participated in the proposal phase. Gabe's feedback on my proposal strongly shaped the direction I took between the proposal and the defense. In part he said, "You will fail at creating a greater sense of player contingency." My grand plan backfired -- one of my game industry heroes was predicting failure at exactly what I was intending to achieve! This skepticism actually served me well, and pushed me to think through some big issues.

Prior to my proposal, I had focused on using data recorded from thousands of players to automate AI characters who could dynamically converse and interact with other AI characters. But my ultimate goal for this research was to support AI dynamically interacting and conversing with human players, and to show how data-driven interaction can support a vastly more open-ended, player-driven experience. So, catalyzed by Gabe's feedback, after the proposal I shifted 100% of my energy toward demonstrating how data from The Restaurant Game could support unscripted face-to-face social interaction and dialogue between a human player and an NPC.

HERE WE GO....

My thesis looks at games as a storytelling medium. Every medium allows telling stories in different ways. As Rockstar's Dan Houser said to the New York Times, "Books tell you something, movies show you something, games let you do something." The videogame industry has made a lot of progress in allowing the player to do things physically (players can run anywhere, drive anywhere, and shoot at whatever they want), but has made much less progress supporting open-ended social interaction and dialogue. In general, we're still stuck with the same pre-scripted, multiple-choice dialogue trees we've been seeing for 30 years, which limit the players' ability to express themselves and guide the storytelling experience.

Multiple-choice social interaction in Mass Effect 3

There are two obstacles preventing us from creating more open-ended experiences. The first is the content authoring bottleneck -- creating character behaviors is a technical, labor-intensive process, and authoring tools are relatively primitive. The second obstacle may be the bigger issue: human imagination is a limited resource. No matter how talented your designers and programmers are, each individual can only anticipate so many possibilities. In order to support more open-ended interaction, we need to rethink the way we author character behavior and dialogue. In particular, we need to move toward more data-driven approaches in order to scale up the interaction. To explore this, I launched The Restaurant Game in 2007, which anonymously paired 16,000 people online to play the roles of customers and waitresses. Players could say anything they wanted to each other (via typed text), and interact with the 3D environment via a point-and-click interface. We recorded everything, and could extract a discrete action sequence from each gameplay session. The question is, how can we exploit thousands of these action sequences to support open-ended interaction? And what will that experience be like for the player? Answering these questions could not only impact games, but also have implications for online education and training, and social robotics.
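To make "a discrete action sequence" a bit more concrete, here is a minimal sketch of turning raw log lines into action tuples. The log format, the `Action` fields, and the `extract_actions` name are assumptions for illustration only. They are loosely modeled on the game transcripts shown later on this blog and are not the actual Restaurant Game log format.

```python
import re
from typing import List, NamedTuple, Optional

class Action(NamedTuple):
    actor: str
    verb: str            # "said", "picked up", "dropped", ...
    obj: Optional[str]   # addressee for dialogue, object for physical actions
    text: Optional[str]  # utterance text (dialogue only)

# Hypothetical log formats, e.g. "CUSTOMER said to WAITRESS: ..." and
# "WAITRESS picked up the STEAK."
SAY = re.compile(r"^([A-Z][A-Z .\-]*?) said(?: to ([A-Z][A-Z .\-]*?))?: (.*)$")
ACT = re.compile(r"^([A-Z][A-Z .\-]*?) (picked up|dropped|ate|paid for) the ([A-Z][A-Z0-9 \-]*)\.?$")

def extract_actions(log_lines: List[str]) -> List[Action]:
    """Convert raw log lines into a discrete action sequence.
    Lines matching neither pattern (e.g. system messages) are skipped."""
    actions = []
    for line in log_lines:
        line = line.strip()
        if m := SAY.match(line):
            actions.append(Action(m.group(1), "said", m.group(2), m.group(3)))
        elif m := ACT.match(line):
            actions.append(Action(m.group(1), m.group(2), m.group(3), None))
    return actions

log = [
    "CUSTOMER said to WAITRESS: I'd like the steak, please.",
    "WAITRESS picked up the STEAK.",
    "The soundtrack has been changed to Sad.",
]
for action in extract_actions(log):
    print(action)
```

Typed dialogue and point-and-click actions end up in the same sequence. That shared representation is what makes thousands of sessions comparable to one another.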

Open-ended natural language input in Facade.

There are a few examples of games that have ventured beyond multiple choice dialogue -- notably Facade, which very much inspired my own work. Facade was released in 2005, as I was wrapping up work on FEAR. While I was focusing on simulating action-packed combat, Facade was delivering drama by simulating social interaction in the mundane setting of a yuppie couple's apartment, and I was blown away. As Grace and Tripp start bickering, the player can type anything they want, to try to defuse the situation, or stoke the fire. Facade can't understand everything the player types, but elegantly designs around the hard language understanding problems -- when input is not understood, Grace and Tripp just continue to argue, which brilliantly succeeds in making the player feel like an awkward third wheel. But this is not a general solution. How can we support interactions between two characters, rather than three, where the player is face-to-face with an NPC, and everything s/he says matters, and cannot be ignored? How can the player use language effectively to navigate the story space?

Tension between freedom and system comprehension, represented as a 2D space.

We can think of the interaction problem as a two-dimensional space, where the Y-axis indicates how much freedom the player has to say and do things, and the X-axis indicates how well the machine can understand what the player is saying or doing, and respond appropriately. Commercial games, like Mass Effect 3, sit close to the X-axis -- the machine can understand almost everything the player does, because the player's freedom is so constrained. Facade is somewhere in between, giving much more freedom, at the cost of system comprehension. The holy grail is the top-right corner, where the player has complete freedom, and the machine understands everything. My research is aiming for something closer to that holy grail.

SO, WHAT DID I BUILD?

To get closer to that holy grail in a practical way, I've been experimenting with a hybrid interface, where the user can say anything they want (typed, or speech-to-text), and when an exact match for the input does not exist, the system dynamically generates dialogue options intended to be semantically similar and contextually relevant.
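Here is a minimal sketch of that fallback behavior. It assumes a hypothetical corpus that maps each recorded line of dialogue to a context label. Plain word overlap (Jaccard similarity) stands in for semantic similarity, which is a deliberate simplification and not the matching the real system performs.

```python
import re
from typing import Dict, List, Set

def tokens(text: str) -> List[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def dialogue_options(user_input: str, corpus: Dict[str, str], context: str, k: int = 3) -> List[str]:
    """Return the recorded line itself if the input matches one exactly (ignoring case
    and punctuation); otherwise the k most word-similar lines recorded in `context`."""
    query = tokens(user_input)
    for line in corpus:
        if tokens(line) == query:
            return [line]
    in_context = [line for line, ctx in corpus.items() if ctx == context]
    return sorted(in_context, key=lambda line: jaccard(set(query), set(tokens(line))), reverse=True)[:k]

# Hypothetical corpus: recorded line -> context in which players said it.
corpus = {
    "I'd like the steak, please.": "ordering",
    "Can I see the menu?": "ordering",
    "Check, please.": "paying",
    "Can I have the bill?": "paying",
}
print(dialogue_options("i'd like the steak please", corpus, "ordering"))
print(dialogue_options("can i get the check", corpus, "paying"))
```

The context filter matters in this example. "Can I see the menu?" overlaps heavily with "can i get the check", but it is never offered while the customer is paying.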

Hybrid user-interface in The Restaurant Game.

The underlying system that drives the behavior and dialogue of the NPC, in response to human interaction, relies on an approach I refer to as Collective Artificial Intelligence, which consists of three steps:

Record thousands of people playing roles in some scenario.
Mine gameplay data for patterns of language and behavior.
Replay fragments of recorded behavior at appropriate times at runtime.

Below is a graph generated by plotting all action sequences observed in 5,000 gameplay sessions of The Restaurant Game. Each node represents a unique action, and all games progressed from node "Start" at the top to node "End" at the bottom. This image illustrates that human behavior is complex, and nuanced, and far beyond what we can encode by hand. I spent a couple of years looking at various ways to automatically mine patterns in this data (n-grams, SVMs, HMMs, affinity propagation, PLWAP), and made some encouraging progress. Ultimately, though, I concluded that these approaches tend to filter out the nuance of the interaction, because rare behaviors appear too sparsely in the data to survive statistical learning. But the motivation of recording thousands of people in the first place was to capture the nuance! So, in 2010 I changed direction, toward a human-machine collaborative approach, where humans are employed to interpret the meaning of patterns in the data.

Graph of action sequences observed in 5,000 two-player games.
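The sketch below builds a graph like this one and illustrates why frequency-based mining loses nuance. It counts every transition between consecutive actions, then keeps only transitions seen at least `min_count` times. The action names, the threshold, and the function names are hypothetical.

```python
from collections import Counter
from typing import List

def action_graph(sessions: List[List[str]]) -> Counter:
    """Count transitions between consecutive actions; every session is framed
    by the "Start" and "End" nodes, as in the plotted graph."""
    edges = Counter()
    for actions in sessions:
        path = ["Start", *actions, "End"]
        edges.update(zip(path, path[1:]))
    return edges

def frequent_edges(edges: Counter, min_count: int) -> set:
    """Keep only transitions observed at least min_count times."""
    return {edge for edge, count in edges.items() if count >= min_count}

sessions = [["sit", "order steak", "eat", "pay"]] * 3 + [["sit", "order steak", "complain", "eat", "pay"]]
graph = action_graph(sessions)
kept = frequent_edges(graph, min_count=2)
print(len(graph), len(kept))  # 7 5: the one-off complaint vanishes
```

With 5,000 sessions the long tail is enormous. Any threshold strict enough to make statistics reliable also discards most of the unusual behavior that made the data worth collecting.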

I created browser-based tools (Flex, ActionScript 3) and used oDesk to hire people from the Philippines, Pakistan, India, and the U.S. to annotate data, applying a narrative structure that represents a hierarchy of events, long-range dependencies indicating causal chains and references, and expressions of attitude. This structure also represents modulation of affinity and tension, but these aspects have not yet been implemented.


Narrative structure applied to gameplay transcripts.
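As a rough illustration, here is one way the annotated layers could be represented in memory: events over action spans, an event hierarchy, dependencies for causal chains and references, and attitude tags. All class and field names are assumptions for this sketch. This is not the data model of the actual annotation tools.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Event:
    label: str                    # e.g. "ORDER"
    span: Tuple[int, int]         # first and last action index covered (inclusive)
    parent: Optional[int] = None  # index of the enclosing event, if any

@dataclass
class Dependency:
    kind: str    # "causal" (ordering steak -> serving steak) or "reference"
    source: int  # action index that introduces something
    target: int  # later action index that depends on it

@dataclass
class AnnotatedTranscript:
    actions: List[str]
    events: List[Event] = field(default_factory=list)
    dependencies: List[Dependency] = field(default_factory=list)
    attitudes: Dict[int, str] = field(default_factory=dict)  # action index -> tag, e.g. "rude"

    def events_covering(self, i: int) -> List[str]:
        """Labels of all events that contain action i, innermost first."""
        covering = [e for e in self.events if e.span[0] <= i <= e.span[1]]
        return [e.label for e in sorted(covering, key=lambda e: e.span[1] - e.span[0])]

    def depends_on(self, i: int) -> List[int]:
        """Earlier actions that action i is linked to by a dependency."""
        return [d.source for d in self.dependencies if d.target == i]

t = AnnotatedTranscript(
    actions=["greet", "order steak", "serve steak", "eat steak"],
    events=[Event("DINE", (1, 3)), Event("GREET", (0, 0)),
            Event("ORDER", (1, 1), parent=0), Event("SERVE", (2, 2), parent=0)],
    dependencies=[Dependency("causal", 1, 2), Dependency("reference", 2, 3)],
    attitudes={0: "polite"},
)
print(t.events_covering(1))  # ['ORDER', 'DINE']
print(t.depends_on(2))       # [1]
```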

It took a team of seven outsourced annotators a total of 415 person-hours to tag 1,000 transcripts with four types of meta-data. That works out to about 1.5 weeks per annotator, if they were working 40 hours/week, and it cost about $3,000. As a final step, lines of dialogue are clustered by meaning, by hand. I did this step myself, and it took about two weeks to cluster 18,000 lines.

The annotated data serves as Collective Memory, driving the decisions of the runtime planning architecture (written in Java), which combines plan recognition with case-based planning. At a high level, the agent recognizes discrete sequences of observations representing events, infers a hierarchy of events, and retrieves gameplay transcripts (aka cases) containing event hierarchies that are similar at an abstract level. Retrieved cases are critiqued, using the meta-data to scrutinize each proposed next action for coherence. As a simple example, suppose someone ordered steak, and the AI waitress is considering serving pie as her next action. A critic will reject this proposal because it violates a long-range dependency tagged by a human, where ordering steak causes the waitress to serve a steak, rather than pie. All of the critics are domain-independent, with the exception of the Domain Critic, which accesses rules encoded in the Domain Knowledge Manager.

Runtime planning architecture for an agent.
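The propose-and-critique loop can be sketched in a few lines using the steak/pie example. The string-based actions, the `Critic` signature, and the no-repeat critic are hypothetical simplifications. In the real system the dependency comes from human-tagged meta-data rather than a hard-coded string rule.

```python
from typing import Callable, List, Optional

# A critic inspects the interaction history and one proposed next action.
Critic = Callable[[List[str], str], bool]

def dependency_critic(history: List[str], proposal: str) -> bool:
    """Reject serving a dish that nobody ordered (a hard-coded stand-in for a
    human-tagged dependency linking 'order X' to 'serve X')."""
    if proposal.startswith("serve "):
        dish = proposal[len("serve "):]
        return f"order {dish}" in history
    return True

def no_repeat_critic(history: List[str], proposal: str) -> bool:
    """Hypothetical critic: never repeat the immediately preceding action."""
    return not history or history[-1] != proposal

def select_action(history: List[str], proposals: List[str], critics: List[Critic]) -> Optional[str]:
    """Return the first proposal, in retrieval order, that every critic approves."""
    for proposal in proposals:
        if all(critic(history, proposal) for critic in critics):
            return proposal
    return None

critics = [dependency_critic, no_repeat_critic]
print(select_action(["sit down", "order steak"], ["serve pie", "serve steak"], critics))  # serve steak
```

If every proposal is rejected, the function returns `None`. A real agent would then need another strategy, such as retrieving more cases.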

DEMOS!

So, enough jibber jabber. Let's see what this system actually does. Below are three videos of a human customer interacting with an AI waitress. The first video highlights how the system auto-completes the same input in different ways depending on context, and how the waitress can exploit player data to respond to some of the more unusual things the player does.

I think of these data-driven characters as improvisational actors, who can take direction at a high level. This video demonstrates directing the waitress to be rude, which biases her to retrieve gameplay transcripts with actions tagged as rude when possible.
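"When possible" can be read as a soft preference. The minimal sketch below shows that reading: cases carrying the directed attitude tag rank first, and untagged cases remain available as fallbacks. The `Case` fields, the similarity scores, and the strict tag-first ordering are assumptions, not the system's actual retrieval scoring.

```python
from typing import List, NamedTuple, Optional, Set

class Case(NamedTuple):
    name: str
    similarity: float    # how well the case matches the current event hierarchy
    attitudes: Set[str]  # attitude tags annotators applied to the case's actions

def rank_cases(cases: List[Case], direction: Optional[str] = None) -> List[Case]:
    """Rank retrieved cases by similarity. Under a direction such as "rude",
    cases carrying that tag move ahead, but untagged cases remain as fallbacks."""
    def key(case: Case):
        directed = direction is not None and direction in case.attitudes
        return (directed, case.similarity)
    return sorted(cases, key=key, reverse=True)

cases = [Case("polite", 0.9, set()), Case("snarky", 0.7, {"rude"}), Case("neutral", 0.8, set())]
print([c.name for c in rank_cases(cases)])          # ['polite', 'neutral', 'snarky']
print([c.name for c in rank_cases(cases, "rude")])  # ['snarky', 'polite', 'neutral']
```

A weighted bonus would be a gentler alternative to the strict tag-first ordering, letting a much better-matching polite case still win.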

The last video demonstrates a waitress directed to upsell. This is accomplished through a combination of applying an upselling attitude tag, and adding a couple of domain-specific rules to the Domain Knowledge Manager, which tell the waitress to never bring an entree until an appetizer has been ordered, and never bring the bill until dessert has been ordered.
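The two upselling rules can be expressed as prerequisites checked by a domain critic. In this minimal sketch, the rule encoding (proposed action mapped to a required prior action) and the action strings are hypothetical.

```python
from typing import Dict, List

# Hypothetical encoding of the two upselling rules:
# proposed action -> action that must already appear in the history.
UPSELL_RULES: Dict[str, str] = {
    "serve entree": "order appetizer",
    "bring bill": "order dessert",
}

def domain_critic(history: List[str], proposal: str, rules: Dict[str, str] = UPSELL_RULES) -> bool:
    """Approve a proposal unless a domain rule's prerequisite has not happened yet."""
    required = rules.get(proposal)
    return required is None or required in history

print(domain_critic(["order entree"], "serve entree"))                     # False
print(domain_critic(["order appetizer", "order entree"], "serve entree"))  # True
print(domain_critic(["order appetizer", "order entree"], "bring bill"))    # False
```

Actions with no rule pass through untouched. This keeps the domain-specific knowledge small and separate from the domain-independent critics.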

WELL, DID IT SUCCEED?

I ran both a quantitative and a qualitative study to evaluate whether the implemented system succeeded at supporting a more open-ended, player-driven experience. Subjects in the quantitative study interacted with the system through speech-to-text, using Microsoft's speech recognizer. This study measured how often subjects were able to find a dialogue option with the same meaning as what they were trying to say, under two conditions. In one condition, the dialogue options were driven directly by the recognized speech. In the other, they were generated by the full Collective A.I. system, which can exploit context to generate relevant dialogue options even when speech is misunderstood. Results show that subjects were able to find a satisfactory dialogue option 29% more often with the full system.

Quantitative study results.

I also looked at the ranking of the selected dialogue option. The figure below shows how exploiting context increases the likelihood that the desired dialogue option will appear higher in the list.

More quantitative study results.
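The sketch below illustrates how context can push the desired option up the list. It scores each candidate line by blending word overlap with the recognized speech and a contextual prior. The linear blend, the weight `alpha`, and the prior values are all hypothetical. They are not the scoring used in the study.

```python
import re
from typing import Dict, List

def words(text: str) -> set:
    return set(re.findall(r"[a-z']+", text.lower()))

def rank_options(recognized: str, context_prior: Dict[str, float], alpha: float = 0.5) -> List[str]:
    """Score each candidate line by a blend of word overlap with the recognized
    speech (weight alpha) and how likely the line is in the current context."""
    heard = words(recognized)

    def score(line: str) -> float:
        w = words(line)
        overlap = len(heard & w) / len(heard | w) if heard | w else 0.0
        return alpha * overlap + (1 - alpha) * context_prior[line]

    return sorted(context_prior, key=score, reverse=True)

# Hypothetical prior after dessert: asking for the check is far more likely than ordering chicken.
prior = {"Can I get the chicken?": 0.05, "Can I get the check?": 0.6}
recognized_text = "can i get the chicken"  # "check" misrecognized as "chicken"
print(rank_options(recognized_text, prior, alpha=1.0)[0])  # Can I get the chicken?  (speech only)
print(rank_options(recognized_text, prior)[0])             # Can I get the check?    (speech + context)
```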

Subjects in the qualitative study played three games in groups, followed by a focus group discussion. Each subject played about 10 minutes of each game: Facade, The Restaurant Game, and Skyrim (interacting with NPCs in a tavern). Skyrim served as a control, and as a reminder of the current state of the art in industry. This study was risky, given that the other games are polished, released products, while The Restaurant Game is not really even a game -- more of a proof-of-concept tech demo. None of the subjects had played Facade before, and most were captivated by it. However, it was encouraging that the discussions revealed that players did find The Restaurant Game to be more player-directed and responsive to nuanced language, while dramatizing a restaurant narrative in cooperation with the player. Unprompted, subjects described The Restaurant Game as a sandbox. Some notable comments included:

“It felt like The Restaurant Game was trying to play along with the player. It just kind of rolls with it.”

“Façade led you, The Restaurant Game lets you lead it.”

“What I noticed about The Restaurant is that it was trying to do more than Façade in the sort of AI actual interpretation of colloquialisms.”

WHERE DO WE GO FROM HERE?

My thesis has only scratched the surface of what's possible with crowdsourced data-driven interaction, and I see it as a starting point rather than an end. As a former game programmer, I find it incredibly exciting that these characters can still surprise me by saying things I've never seen them say before, even after working with this data for years. It hints at the possibilities for truly next-generation characters, driven by massive collections of content.

Over the course of the PhD, I dabbled in a couple of related side projects, reusing The Restaurant Game platform -- I collaborated with the Personal Robots Group on Mars Escape, a game to capture data about human-robot interaction, and collaborated with the GAMBIT Game Lab on Improviso, which collects data about playing roles on the set of a low-budget sci-fi movie. But there is still much, much more to explore.

To that end, in case you haven't been following my recent Facebook and Twitter spam, I'm continuing to explore data-driven simulated role-playing through a new venture called Giant Otter Technologies. Follow our latest developments at @GiantOtterTech.

PhD? PhDone.

2013-02-11T15:15:00.000-05:00

I defended my PhD in December, and got the document all signed off in January. I will eventually write a "reader's digest" summary post (with figures and videos). But for now, for those interested, the document is online here:

Collective Artificial Intelligence: Simulated Role-Playing from Crowdsourced Data
http://web.media.mit.edu/~jorkin/papers/orkin_phd_thesis_2013.pdf

You can read the Abstract here:
http://web.media.mit.edu/~jorkin/research/jeff_orkin_phd_abstract.html

Game Development in the 21st Century: Meet my A.I. Team

2012-02-06T16:52:00.004-05:00

Over the past couple of months, I've been working with a team of seven people to create thousands of variations of restaurant behavior and dialogue, drawing from our database of recorded games. (See my previous post for some background on the project.) The interesting thing is that my team members don't know anything about A.I.; they're not programmers, and they don't have any previous game development experience. They are random people that I hired on the internet, with minimal vetting, and they're doing great work!


Follow research updates on Twitter: @jorkin

Let me introduce my team:

The team is responsible for annotating game logs with four types of meta-data (events, event hierarchies, causal chains, and references), which they accomplish via custom browser-based Flash applications. This meta-data becomes the fuel that powers my new planning system to control interactive character behavior and dialogue. A programmer is still required to implement critics (small pieces of code that constrain when fragments of behavior can execute), but annotating meta-data accounts for the lion's share of the authoring effort.

I hired my team by posting a Data Entry job opening on oDesk. I asked applicants to annotate one sample file, and hired the first group of people who did a good job. My team was staffed within hours of posting the job, and has now completed annotating 1,000 game logs. It took them a total of 415 hours, which cost just under $3,000. They were working part-time, spread over a couple of months. If one person were doing this full-time (8 hours/day), 415 hours would be about 52 days. So, divided among a team of seven, this work could have been completed in about seven and a half working days (a week and a half, assuming 40-hour work weeks). I still have a lot of work to do over the next few months to demonstrate that this approach results in more engaging, robust behavior. Still, the prospect of a practical, fast, affordable way to create characters capable of rich social interaction is exciting.
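For anyone who wants to redo the arithmetic with their own team size or hourly assumptions, here is a small helper. The function and its parameter names are my own; the default of 8 hours/day and 5 days/week matches the assumptions in the paragraph above. The $3,000 figure is used as an upper bound, since the actual cost was just under that.

```python
def annotation_budget(total_hours: float, team_size: int, transcripts: int, cost_usd: float,
                      hours_per_day: float = 8, days_per_week: int = 5) -> dict:
    """Convert total annotation effort into calendar-style estimates."""
    per_person = total_hours / team_size
    return {
        "solo_days": total_hours / hours_per_day,      # one person, full-time
        "team_days": per_person / hours_per_day,       # each team member, full-time
        "team_weeks": per_person / (hours_per_day * days_per_week),
        "minutes_per_transcript": total_hours * 60 / transcripts,
        "usd_per_transcript": cost_usd / transcripts,  # upper bound here
    }

budget = annotation_budget(total_hours=415, team_size=7, transcripts=1000, cost_usd=3000)
for name, value in budget.items():
    print(f"{name}: {value:.1f}")
```

With these inputs, the helper gives about 51.9 days for one person, about 7.4 days (1.5 weeks) per team member, roughly 25 minutes per transcript, and at most $3 per transcript.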

Watch on Vimeo: Example of Event Annotation.

Small Teams, Big A.I.

Gratuitous Angry Birds image.

The game industry is going through a renaissance period right now, where the trend toward bigger teams and bigger budgets is reversing, and tiny “Indie” teams are creating big hits on mobile and online platforms. We’re seeing a return to the days when a few people, or even one person, can create an entire game, but the A.I. in these games tends to lack depth. That is understandable, given that even the AAA development studios struggle to find enough A.I. programmers, and current approaches to A.I. are technical and time-consuming processes. Indie developers can (and often do) outsource creation of their art assets. In my research, I’m trying to refactor the development process so that A.I. behavior can be outsourced as easily as the art, in a reasonable amount of time for a reasonable cost. That would allow a tiny team to populate their game worlds with dynamic, socially intelligent, conversational characters. Ultimately, the point of this work is not really to cut costs and create dirt cheap A.I., but rather to bring more (and a wider variety of) people into the development process, to collectively produce something that could not be achieved another way.

Regarding A.I. in indie games, there are some notable exceptions on the horizon -- games from small teams with deep A.I. Industry veteran Paul Tozour’s City Conquest is a tower defense RTS developed using genetic algorithms to balance the playing experience. Prom Week, developed by a team of PhDs at UCSC (studying with Michael Mateas and Noah Wardrip-Fruin), might be considered the spiritual successor to Facade. Prom Week promises a highly replayable gameplay experience, based on dynamic social interaction, but the project (perhaps wisely) abandons Facade’s natural language interface. I think that natural language input still offers an opportunity to give players an increased sense of autonomy, and am hoping to show that leveraging data recorded from thousands of players can support robust language understanding while preserving the player’s sense of agency.

What about Turk?

When I describe my approach as crowdsourcing, people often ask why I’m not using Amazon’s Mechanical Turk. Crowdsourcing purists might say that what I’m doing on oDesk is really outsourcing more than crowdsourcing, because I’m working with a persistent team (although the earlier phase of my project where we recorded players online was certainly crowdsourcing). I did experiment briefly with Turk, and my impression was that there are lots of scammers on Turk, trying to make money by clicking things as fast as possible, and a large part of the effort would need to focus on validating work. My research focus is really on building the system that generates behavior and dialogue from the annotated data, and crowdsourcing is a means to an end. There is more personal interaction on oDesk, and the reputation system provides an incentive to maintain high quality work, making it easier to find good people and continue working with them. My experience on oDesk could be considered a proof of concept for a process that could be repeated on Turk in the future.

Thesis Proposal Recap

2011-12-20T23:31:00.017-05:00

I'm long overdue for a new blog post. 2011 has been a busy year. I've been making a lot of research progress, which has kept me too busy to blog! I've hit a number of milestones in recent months -- my thesis proposal was approved in August, a prototype of my new planner came together in October, and I launched a crowdsourced data annotation effort in November. This post will briefly recap the proposal that I presented to my committee.

I'm building a new planner that combines plan recognition with case-based reasoning, to simulate reasoning from a collection of episodic memories. My approach is production-oriented, in that it is mindful of the content authoring bottleneck (where content is AI behaviors and dialogue lines), and attempts to make it as easy as possible to get an enormous variety of content into the system. The approach leverages the realities of the world we live in today, where it is possible to crowdsource repetitive tasks to non-experts, and opts for powering the system with lots of mundane data annotation rather than cleverly programmed AI.

For background information on this project, see the previous recaps: Part 1, Part 2, Part 3.



INTRODUCTION

AI for combat has come a long way in the past 15 years or so. How can we make other parts of the experience -- the social interaction and storytelling -- as dynamic as the combat? While numerous games deliver sandbox-style combat that adapts to player choices, social interaction and storytelling are generally either entirely linear, or scripted with limited opportunities for interaction or influence. Game developers have mastered pathfinding, animation, scripting, and reactive behavior, but have made less progress producing characters that can dynamically communicate, cooperate, and maintain coherent interactions over long periods of time.

What about The Sims and Facade? These are exceptions that inspire my work, but lie at opposite ends of a spectrum, where I am targeting something in between. The Sims is entirely emergent; a doll house that does not attempt to tell any particular narrative. Facade tells a specific, coherent narrative, but has been criticized for limiting player agency.

Graphics Envy

It is often noted that AI has not kept pace with graphics in games. There was a time when every pixel on the screen was plotted by hand. Today we have arrived at a representation that allows us to render 3D worlds at runtime from any camera position. And this representation scales -- with more processing power, we can render more polygons (millions of polygons!), yielding incredible detail.

Graph generated by recording 5,000 pairs of players.

We are essentially still in the pixel-plotting days of AI, where we are crafting every decision by hand -- an approach that does not scale beyond the complexity of behaviors that we see in current games. This graph was generated by recording 5,000 pairs of humans playing as customers and waitresses in a virtual restaurant. The graph shows all of the action sequences observed from the start of the game (at the top), until the end of the game (at the bottom). Human behavior is complex, and nuanced. We will never be able to author such a dense possibility space by hand. We need a representation that can be recombined to generate all of these possibilities at runtime, adapting to player choices.

COLLECTIVE ARTIFICIAL INTELLIGENCE

I refer to my approach as Collective Artificial Intelligence -- a combination of crowdsourcing, pattern-mining, and episodic planning.

Crowdsourcing

There is no easier way to author behavior than through a live demonstration (i.e. playing a character in a game). Anyone can do it, without any technical know-how. I have collected three data sets of behavior: The Restaurant Game has recorded over 10,000 demonstrations, Improviso is currently recording actors on a virtual film set, and Mars Escape was a collaboration with the Personal Robots Group at the Media Lab, which recorded hundreds of demonstrations of a human-robot collaboration. All of these games were created from the same codebase.

The Restaurant Game, Improviso, and Mars Escape.

Recording thousands of people in the same scenario can be thought of as crowdsourcing our imagination, which is the first step in my process. In the next step -- pattern-mining -- crowdsourcing is employed once again, to help interpret the data collected from the first step.

Pattern-Mining

Pattern mining is a human-machine collaborative process. Humans are recruited to annotate data with browser-based tools, explaining the meaning of different action sequences to the AI system. Including humans in the loop enables capturing sparse examples of behavior and dialogue that would have slipped through the cracks of statistical machine learning algorithms, and provides an opportunity for a designer to control which behaviors to capture and which to ignore. One of the goals of this work is to demystify Game AI, which is often treated as a black art, by refactoring a difficult problem into many simple annotation tasks. The intuition is that few people can program behavior, but anyone who speaks English can explain behavior given a transcript of a recorded game.

Each recorded transcript is annotated with four layers of meta-data: event sequences, event hierarchies, causal chains, and references. Embedding streamlined annotation tools in a browser makes it possible to take advantage of the numerous web sites that exist for hiring people online to perform unskilled labor (e.g. Amazon Mechanical Turk, CrowdFlower, eLance, oDesk). Below is a video demonstrating the process of annotating event sequences. More information about annotation is available in papers from INT3 and AIIDE.

Watch on Vimeo: Event Annotation.

Episodic Planning

Annotated data is the fuel that powers the new episodic planning system. The diagram below illustrates the machinery inside the mind of a character. In brief: the agent receives observations through sensors, and records the observed interaction history on the Blackboard. In order to understand and respond to observations, the agent can exploit Collective Memory -- a database of recorded transcripts, and associated meta-data. The Plan Recognizer leverages the learned Event Dictionary to infer an event hierarchy from the observed interaction history. The Action Selector then searches for recorded human games that match the inferred event hierarchy, and passes these as proposals to a set of Critic processes. Critics draw on a variety of sources to scrutinize the validity and coherence of following the next step in each proposed plan. If one of the proposals is approved by all critics, it is passed to the Actuator for execution.

Agent architecture for episodic planning.
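Here is a minimal sketch of the Plan Recognizer's job. It segments the observed interaction history into events using an Event Dictionary that maps known action sequences to event labels. The greedy longest-match strategy and the dictionary contents are my own simplifying assumptions, not necessarily how the actual recognizer works. The same function applied to the event labels, with a dictionary of higher-level patterns, yields the next level of the event hierarchy.

```python
from typing import Dict, List, Tuple

# Hypothetical Event Dictionary: sequence of observations -> event label.
EventDictionary = Dict[Tuple[str, ...], str]

def recognize_events(history: List[str], dictionary: EventDictionary) -> List[str]:
    """Segment the history into events by greedy longest match against the
    dictionary; observations that start no known sequence become 'UNKNOWN'."""
    longest = max((len(key) for key in dictionary), default=0)
    events, i = [], 0
    while i < len(history):
        for n in range(min(longest, len(history) - i), 0, -1):
            label = dictionary.get(tuple(history[i:i + n]))
            if label is not None:
                events.append(label)
                i += n
                break
        else:  # no dictionary entry matched at position i
            events.append("UNKNOWN")
            i += 1
    return events

low_level = {("enter", "sit"): "ARRIVE", ("ask menu", "give menu"): "GET_MENU",
             ("ask menu",): "ASK", ("order steak",): "ORDER"}
history = ["enter", "sit", "ask menu", "give menu", "order steak", "juggle"]
events = recognize_events(history, low_level)
print(events)  # ['ARRIVE', 'GET_MENU', 'ORDER', 'UNKNOWN']

high_level = {("ARRIVE",): "ARRIVAL", ("GET_MENU", "ORDER"): "ORDERING"}
print(recognize_events(events[:3], high_level))  # ['ARRIVAL', 'ORDERING']
```

The inferred hierarchy is what the Action Selector would then use as a retrieval key when searching Collective Memory for similar recorded games.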

Below is a video demonstrating the system. The bottom of the video illustrates the plan recognition process that is running in the Customer's mind as new observations arrive. While it would be easy to script something like this, keep in mind that the video is unscripted -- the two characters are dynamically responding to one another based on observed actions and natural language dialogue text. Each time the system runs, the scenario plays out differently. I am working toward a demo where a human can play one of the characters, but this requires much more annotated data to cover the space of possible behavior. The big difference between this system and my earlier statistical approach is that the new system always produces a coherent narrative, or if it doesn't, it's a bug that can be fixed rather than just some statistical anomaly.

Watch on Vimeo: Episodic Planner: first prototype.

EVALUATION

My committee approved my proposal, giving me an "unconditional pass"... with a condition. The condition is that I write a concise, focused, one-page plan for how I will evaluate the system. I am still thinking about this. I hope to quantitatively show that the system produces human-like interactions, when compared to thousands of human-human transcripts. But more importantly, I want to show qualitatively that this system produces a new experience; one that players find more engaging due to an increased sense of agency. I want to demonstrate that players feel that the AI character is cooperating, and helping the human player take the narrative in the direction the human chooses to go.

Now that my proposal has been approved, I am required to defend my thesis within 12 months of approval, so allegedly I will be finished by sometime in August 2012 at the latest!

Improviso: Month One Highlights, and now on Mac OSX!

2011-04-21T20:44:00.006-04:00

Improviso has been online for about one month, so I thought I would share some highlights so far.

The big news is that we released the Mac OSX version today. If you are still holding out for the iOS version, DON'T! There is not going to be an iOS version!! So download Improviso for Windows or OSX and play today: http://gambit.mit.edu/improviso


We are grateful that the press has been supportive of the project, which has helped data collection get off to a good start. I would like to thank the folks who have posted these articles:

Improv game puts the role-playing back in RPG (MIT)
MIT GAMBIT Lab Releases Improviso Game With Goal Of Training AI (Gamasutra)
Freeware Game Pick: Improviso (IndieGames.com)
GAMBIT Game is Drama School for AI (Rock, Paper, Shotgun)
Look Out Watson MIT's New Video Game is Training Better AIs (Motherboard.tv)
Collective AI: A Conversation with MIT’s Jeff Orkin (H+ Magazine)

So what do people do in this game, you ask? I promise anyone who plays will have a unique experience, to say the least. Here are a couple of transcripts of real players online. Special thanks to the members of the Facepunch forum thread who have been sharing their screenplays.

Sample Game #1:

---------------------------------------------------------------------
Scene 1/3: "Interrogate Ted"
---------------------------------------------------------------------

TED said to AGENT SMITH: HELO
TED said to AGENT SMITH: help
AGENT SMITH said to TED: Shut up and listen to me!
TED said to AGENT SMITH: but I'm stuck!
AGENT SMITH said to TED: You were caught in this crash zone!
AGENT SMITH said to TED: What are you doing in here!
TED said to AGENT SMITH: I got lost, then I was attacked!
AGENT SMITH said to TED: Yes
AGENT SMITH said to TED: That was me!
AGENT SMITH said to TED: And i tied you up!
AGENT SMITH said to TED: NOW SPEAK UP!
AGENT SMITH said to TED: Or else...
TED said to AGENT SMITH: Did you prob me too?
AGENT SMITH said to TED: I tied you down for a reason!

The soundtrack has been changed to Scary.

TED said to AGENT SMITH: I was looking for something interesting to report on
AGENT SMITH said to TED: Now... give me a good reason why i should let you out...

AGENT SMITH dropped the HANDGUN.
AGENT SMITH picked up the PROBE.

AGENT SMITH said to TED: Or i will use this probe on you!
TED said to AGENT SMITH: I'm an ex marine and can help fight the aliens
AGENT SMITH said to TED: Okay; i trust you for now. Lets go.

AGENT SMITH set TED free from the rope
AGENT SMITH dropped the PROBE.
AGENT SMITH picked up the HANDGUN.

AGENT SMITH said to TED: Pick up that probe and follow up!

TED picked up the PROBE.

---------------------------------------------------------------------
Scene 2/3: "Surprise!"
---------------------------------------------------------------------

AGENT SMITH said to TED: Now that we got out...
AGENT SMITH said to TED: You can explain yourself. Better now.
TED said to AGENT SMITH: Yes, yes I can, I feel much safer here
DR. STEIN said to TED: Let me just... what is that bump on your forehead??

DR. STEIN picked up the X-RAY.

TED said to DR. STEIN: oh nothing, you don't need to check it
DR. STEIN said to TED: You are wrong... let me xray this thing...

The soundtrack has been changed to Exciting.
DR. STEIN used the X-RAY on TED.

DR. STEIN said to TED: This isnt a normal bump... OH MY GOD! SOMETHING IS MOVING IN THERE
TED said to AGENT SMITH: OH NO

DR. STEIN dropped the X-RAY.
DR. STEIN picked up the TRANQUILIZER GUN.
TED has put on the K-17 mask.

TED said to DR. STEIN: YOU FOUND OUT!
DR. STEIN said to AGENT SMITH: OH NO! Its an alien!
AGENT SMITH said to TED: Drop dead; you skunk!
TED said to DR. STEIN: MY BROTHEREN WILL BE HERE SOON

AGENT SMITH attacked TED with the HANDGUN.
TED died
The soundtrack has been changed to Sad.

AGENT SMITH said to TED: I think we did it... Did we kill it?

DR. STEIN dropped the TRANQUILIZER GUN.
DR. STEIN picked up the SURGICAL TOOLS.
DR. STEIN used the SURGICAL TOOLS on TED.

DR. STEIN said to TED: Its dead. The only proof of life outside there.
AGENT SMITH said to DR. STEIN: Im sorry.

---------------------------------------------------------------------
Scene 3/3: "R.I.P. Ted"
---------------------------------------------------------------------

AGENT SMITH said to TED: Ted. why are you here?
TED said to AGENT SMITH: I cam to report a crash
AGENT SMITH said to TED: I know about this crash; I knew about it all the time...
TED said to AGENT SMITH: so you can help me with my report?
AGENT SMITH said to TED: Sure. what do you need help with... i can provide some... insights.

AGENT SMITH dropped the HANDGUN.
AGENT SMITH picked up the GUN.

TED said: This... this egg? what is it?

The soundtrack has been changed to Tense.

AGENT SMITH said to TED: Its my son...

AGENT SMITH has put on the TED mask.
AGENT SMITH has put on the K-17 mask.

TED said to AGENT SMITH: what's happening?!
AGENT SMITH said to TED: The only thing that is left to do now is to seal your hopeless fate...
TED said to AGENT SMITH: NOOOOOOOO

The soundtrack has been changed to Scary.
The DIRECTOR started the Lightning special effect.

AGENT SMITH said to TED: Prepare... to die....

The DIRECTOR started the Blast-off special effect.
AGENT SMITH attacked TED with the GUN.

TED said to AGENT SMITH: you'll regret this!

AGENT SMITH attacked TED with the GUN.
TED died

Sample Game #2:

---------------------------------------------------------------------
Scene 1/3: "Rescue Ted"
---------------------------------------------------------------------

AGENT SMITH said to TED: Ted Cosby?
TED said to AGENT SMITH: Get me out of here!
AGENT SMITH said to TED: Are you hurt?
TED said to AGENT SMITH: No, but I cant move
AGENT SMITH said to TED: Oh my, some kind of extra terrestrial binding mechanism
AGENT SMITH said to TED: A rope perhaps
AGENT SMITH said to TED: I'll need something to cut this
AGENT SMITH said to TED: Any ideas?
TED said to AGENT SMITH: Maybe the catfood will work!
AGENT SMITH said: Ah ha, a sharp edged can
AGENT SMITH said: You're as smart as they say

AGENT SMITH dropped the HANDGUN.
AGENT SMITH picked up the CAT FOOD.

AGENT SMITH said to TED: Hmm, why would aliens have cat food

AGENT SMITH set TED free from the rope

TED said to AGENT SMITH: Where do we go from here?
AGENT SMITH said: Let's get out of here, back to base!

TED exited the scene.

---------------------------------------------------------------------
Scene 2/3: "Doppleganger"
---------------------------------------------------------------------

K-17 said to TED: Stop right there
K-17 said to TED: This man is an alien imposter!
AGENT SMITH said to K-17: What?!
TED said to K-17: Oh god which one of you is real?

TED picked up the CAT FOOD.

K-17 said to TED: Isn't this obvious?
K-17 said to AGENT SMITH: How would he be on a god damned crashed spacecraft if he wasn't an alien being!
AGENT SMITH said to K-17: Then why is it you showed up JUST as I rescue Mr. Cosby?
TED said to K-17: Let me ask you both a question, why is there cat food on the ship?
K-17 said to TED: Ted, you have to believe me
K-17 said to TED: It's probably to feed their cat or something
AGENT SMITH said to TED: How should I know?

TED dropped the CAT FOOD.
TED picked up the PROBE.

K-17 said to TED: hey put that down
TED said to K-17: Why?
K-17 said to TED: It's... It looks fragile
AGENT SMITH said to K-17: Ha! Only an imposter would say that!
K-17 said to TED: I could say the same!

TED dropped the PROBE.
TED picked up the ALIEN EGG.

K-17 said to TED: NO!

TED dropped the ALIEN EGG.
TED picked up the GUN.

K-17 said: Put her down!
TED said: It must be you!

K-17 has taken off the AGENT SMITH mask.
TED attacked the HANDGUN with the GUN.
TED attacked K-17 with the GUN.

K-17 said to TED: Argh!

K-17 dropped the HANDGUN.
K-17 died

AGENT SMITH said to K-17: Phew
AGENT SMITH said to K-17: Thanks Ted, we'd better get out of here
TED said to AGENT SMITH: Alright, that was a close one.

---------------------------------------------------------------------
Scene 3/3: "Alien Carnage"
---------------------------------------------------------------------

DR. STEIN said to TED: Sorry about this, we just have to run some routine tests
AGENT SMITH said to TED: We'll have you right back out in no time at all

The DIRECTOR started the Explosion special effect.

AGENT SMITH said to TED: Argh!
TED said to DR. STEIN: Why do I have to be strapped down?
TED said to DR. STEIN: What was that!?
K-17 said to AGENT SMITH: Humans!

AGENT SMITH attacked the ALIEN POD with the HANDGUN.
K-17 attacked AGENT SMITH with their hands.
AGENT SMITH dropped the HANDGUN.
AGENT SMITH died

DR. STEIN said to AGENT SMITH: Smith!
TED said to AGENT SMITH: Oh no!
DR. STEIN said: Quick, help!

DR. STEIN set TED free from the straps
K-17 picked up the HANDGUN.

K-17 said to TED: Stop right there human scum!

TED picked up the SYRINGE.

K-17 said to TED: I will vaporise you!

K-17 attacked DR. STEIN with the HANDGUN.

DR. STEIN said to AGENT SMITH: Argh!

DR. STEIN died

TED said to DR. STEIN: No!

TED dropped the SYRINGE.

K-17 said to DR. STEIN: Now human
K-17 said to DR. STEIN: Tell me everything

DR. STEIN revived
DR. STEIN died
TED picked up the TRANQUILIZER GUN.

K-17 said to TED: No! Stop!

TED used the TRANQUILIZER GUN on K-17.
There is nothing selected to attack.
K-17 attacked the BRIEFCASE with the HANDGUN.

K-17 said to TED: This isn't the end, human!

K-17 dropped the HANDGUN.
K-17 died
TED dropped the TRANQUILIZER GUN.
TED picked up the SECRET FILES.
TED used the SECRET FILES on the PEDESTAL.

TED said: I will have the scoop of the year!

The soundtrack has been changed to Sad.

Play Improviso!

2011-03-16T17:03:00.006-04:00

Screenshot from Improviso.

PAX East just wrapped up, where we debuted our new data collection game: Improviso! Players of Improviso are paired anonymously online as the Lead Actor and Director of a low-budget sci-fi movie. The Lead Actor plays Ted, a struggling reporter who wakes up tied up on an alien spaceship. The Director selects scenes, summons special effects, and plays the supporting cast members: the G-Man, Scientist, Alien, and Little Girl. Improviso is a collaboration between the Media Lab and the Singapore-MIT GAMBIT Game Lab. Download Improviso yourself and play today!


The best time to play Improviso, and find other players online, is between 7 and 10 pm in your own time zone.

We announced the game at the Dialogue as Gameplay panel (read about the panel on Gamasutra and Newsarama), and it was playable at the GAMBIT booth. Download the game, and get more info on the GAMBIT Improviso page. The official press release is here. You can find additional background materials and concept sketches on the GAMBIT blog. Watch the trailer video below to get a sense of the gameplay.

Watch the trailer on Vimeo.

Gameplay in Improviso is similar to The Restaurant Game: players are given a role to play through typed text and physical interaction in a 3D environment. However, rather than teaching characters how to play a role in society (e.g. a customer or waitress), we will use the data from Improviso to teach characters how to play roles in a story. The Restaurant Game had a nice tie-in to Schank’s idea of the restaurant script. Here we are hoping to show that the same methodology can automate characters from recorded gameplay in a less mundane situation, one that we wouldn’t assume has a shared cultural script, though we do expect to find some recurring behaviors drawn from our shared pop-cultural knowledge of science fiction.

Concept art for the alien.

It was a bit surprising how challenging it was to get players to do what we hoped they would do in Improviso, relative to The Restaurant Game. When you tell someone to play as a customer or waitress in a restaurant, they know exactly what to do without requiring any more information, but if you say “you’re a G-Man in Area 51,” most people are lost. Plus, a first-person game in a science-fiction setting, with guns and aliens, brings along certain expectations; expectations that are not aligned with the gameplay in Improviso.

So, the challenge with this game was figuring out how to get ordinary players, who might not have any experience with acting or writing, to engage in dramatic improv with strangers online. Below is the list of things we did to finally get the results we wanted:

Created a detailed tutorial.
Provided examples of expected play (the trailer video, and storyboards in the tutorial).
Began the game with the players in a situation that forces face-to-face interaction and dialogue (Ted is tied up, and can’t even move until Agent Smith unties him).
Provided very specific descriptions of what should happen in each scene.
Confined players to specific regions of the set for each scene (e.g. inside the spaceship, inside the Area 51 lab).
Started scene one with only two characters, and gradually introduced additional characters in scenes two and three.
Implemented mechanisms to enforce relatively short scenes, such as running out of film, based on a combination of elapsed time, the number of lines exchanged, and some randomness.
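The last item above can be sketched as a small check that the game might run every few seconds during a scene. This is a minimal sketch, assuming a soft minimum, a hard cap, and an end probability that rises linearly in between; the function name and every threshold (`min_time_s`, `max_time_s`, `min_lines`) are hypothetical, not Improviso's actual values.

```python
import random


def out_of_film(elapsed_s: float, lines_exchanged: int, rng: random.Random,
                min_time_s: float = 120.0, max_time_s: float = 300.0,
                min_lines: int = 10) -> bool:
    """Hypothetical 'running out of film' check, called periodically during a scene."""
    # Hard cap: a scene never runs past max_time_s.
    if elapsed_s >= max_time_s:
        return True
    # Never cut a scene before both the time and dialogue minimums are met.
    if elapsed_s < min_time_s or lines_exchanged < min_lines:
        return False
    # In between, end with a probability that grows linearly with elapsed time,
    # so scene length varies from game to game.
    p = (elapsed_s - min_time_s) / (max_time_s - min_time_s)
    return rng.random() < p
```

Because the check is repeated, the chance that a scene has ended accumulates over time, which keeps scenes short without making their length predictable.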

Much of this seems obvious in retrospect. Hindsight is 20/20. But in any case, it worked! We’ve observed almost 100 people engage in online improv. Many are getting into character, and even seem to be having fun. We’re excited to see what happens in the wilds of the internet.

This project would not have been possible without the support of GAMBIT. Improviso began as an 8-week summer project with a team of nine undergrads (seven from Singapore, plus one each from MIT and RISD). A GAMBIT grant allowed us to continue working on it for about three more months, with a smaller team of two programmers and one artist.

Recap Part 3: Generation from Collective Gameplay

2011-02-25T23:24:00.024-05:00

In 2008, I implemented my first version of a system that plans character behavior and dialogue from recorded gameplay data. This system combined the components described in Recap 1 and Recap 2 with a replay system, which automated a character by playing back fragments of log files (or one log in its entirety).

The planner is a bit like the Game AI equivalent of the Random Paper Generator. It essentially wanders randomly among 5,000 log files, stitching together fragments of behavior and dialogue at run-time, while maintaining local coherence by critiquing log transitions for statistical likelihood. This random stitching process is complicated by the fact that each character (customer and waitress) needs to observe the behavior of the other character, and select a sensible response from a log file.
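Critiquing a transition for statistical likelihood can be sketched with a smoothed trigram model over action and utterance tokens. This is a minimal sketch under stated assumptions: the add-k smoothing, the `<s>` padding, the class and function names, and the acceptance threshold are all illustrative choices, not the planner's actual code.

```python
from collections import Counter
from typing import Sequence


class TrigramModel:
    """Add-k smoothed trigram model over action/utterance tokens (illustrative)."""

    def __init__(self, logs: Sequence[Sequence[str]], k: float = 0.1):
        self.k = k
        self.tri: Counter = Counter()  # counts of (a, b, c)
        self.ctx: Counter = Counter()  # counts of (a, b) followed by any token
        self.vocab: set = set()
        for log in logs:
            self.vocab.update(log)
            padded = ["<s>", "<s>"] + list(log)
            for a, b, c in zip(padded, padded[1:], padded[2:]):
                self.tri[(a, b, c)] += 1
                self.ctx[(a, b)] += 1

    def prob(self, a: str, b: str, c: str) -> float:
        """P(c | a, b) with add-k smoothing over the observed vocabulary."""
        v = len(self.vocab)
        return (self.tri[(a, b, c)] + self.k) / (self.ctx[(a, b)] + self.k * v)


def transition_ok(model: TrigramModel, history: Sequence[str], candidate: str,
                  threshold: float = 0.05) -> bool:
    """Accept a stitched-in next action only if it is likely enough given the last two tokens."""
    a, b = (["<s>", "<s>"] + list(history))[-2:]
    return model.prob(a, b, candidate) >= threshold
```

Only the two most recent tokens are consulted, which is exactly why this kind of critique gives local rather than global coherence.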


At a high level, here is what happens: each character selects an arbitrary log file, begins a replay, and continually critiques him/herself for the likelihood of executing the next action in the log (based on the n-gram model described in Recap 1). When the log indicates that the other character does something, the character waits to observe the expected action. If the next observation does not match the expectation, the character looks for a new log file that is a better match for recent observations. While replaying physical actions, the next action might be a dialogue line, at which point the character toggles to the chat system (described in Recap 2). The dialogue interaction may eventually terminate with a physical action, leading the characters to toggle back to the physical replay system, and so on.
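A minimal sketch of that replay loop, assuming each log is a list of (actor, action) events. The `Replayer` class, its method names, and the simple overlap score used to pick a new log are hypothetical; the likelihood critique and the toggle to the chat system are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (actor, action), e.g. ("customer", "sit")


@dataclass
class Replayer:
    me: str
    logs: List[List[Event]]
    log_idx: int = 0
    pos: int = 0  # index of the next event expected in the current log
    recent: List[Event] = field(default_factory=list)

    def _current(self) -> Optional[Event]:
        log = self.logs[self.log_idx]
        return log[self.pos] if self.pos < len(log) else None

    def next_own_action(self) -> Optional[str]:
        """Return my next action, or None if the log says to wait for the other character."""
        ev = self._current()
        if ev is None or ev[0] != self.me:
            return None
        self.pos += 1
        self.recent.append(ev)
        return ev[1]

    def observe(self, ev: Event) -> None:
        """Called when the other character acts."""
        self.recent.append(ev)
        if self._current() == ev:
            self.pos += 1  # expectation met: keep replaying this log
        else:
            self._switch_log()

    def _switch_log(self, window: int = 3) -> None:
        """Jump to the log position whose preceding events best match recent history."""
        tail = self.recent[-window:]
        best = (-1, self.log_idx, self.pos)
        for i, log in enumerate(self.logs):
            for j in range(1, len(log) + 1):
                prefix = log[max(0, j - len(tail)):j]
                # Count matches, aligning both sequences at their most recent event.
                score = sum(1 for a, b in zip(reversed(prefix), reversed(tail)) if a == b)
                if score > best[0]:
                    best = (score, i, j)
        _, self.log_idx, self.pos = best
```

For example, a customer replaying a log that expects a greeting, who instead observes the waitress handing over a menu, jumps to the point in whichever log continues from a sit-then-menu sequence.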

I presented a paper at AAMAS 2009 in Budapest covering the details of this system. I had the good fortune of having this paper summarized by Michael Mateas at the GDC 2010 Game Studies Download 5.0 panel. Michael describes the work more clearly than I can describe it myself – see for yourself.

Incidentally, while wandering around Budapest, I found Cate Archer painted on a wall outside a bar. Who knew the Hungarians were such big fans of NOLF?

Planner Demos

Below are some videos demonstrating the planner in action.

The videos are more interesting if you keep a couple of things in mind:

All of the characters’ behavior and dialogue comes directly out of recorded human-human games (typos and all).
The first two videos show two AI characters interacting. However, there is no centralized control. Each character has a separate pool of log files to draw from, and they are responding to one another dynamically in real-time based on observed actions, state changes, and raw chat text. Their AI processes are running on an AI server, and each character could be running on an entirely different machine.

This is a video of one of the better runs:

AAMAS 2009 Clip 1 on Vimeo.

Here is an average run:

AAMAS 2009 Clip 2 on Vimeo.

Then, there were also many runs where the characters go to the bar and get stuck in an infinite loop of ordering beer. You don’t need to see those.

The characters can interact with other characters, or with humans. Here is a video of an AI waitress interacting with a human customer. It works fairly well as long as the human behaves very cooperatively. You can see that the system does not yet have any implementation of long-term memory. Towards the end of the video, there are some examples of the AI responding to… less ordinary interactions:

AAMAS 2009 Clip 3 on Vimeo.

The Good, the Bad, and the Ugly

As a game developer, having spent a decade programming behavior by hand, it was very exciting to see that these characters could do anything remotely human-like without any hand-programming at all. But there are obviously some serious drawbacks to this system – primarily the complete lack of designer control, and the related fact that characters do not always do the right thing. Below are some cool things about this system, and some not-so-cool things.

Cool Things:

High-level behavior (dialogue and decomposable actions) runs on an AI server networked with the game (via sockets). Low-level behavior (pathfinding, animations, locomotion) is implemented in a layer integrated with the game engine. For example, the AI server sends a command to the waitress to pick up a steak from the kitchen, and the low-level game-side layer navigates to the kitchen, resolves the reference to a specific steak in the game world, and selects the animation to pick it up.
Characters can be running on different servers. They only communicate within the game world by observing one another’s actions, state changes, and raw chat text. Characters observe and respond to humans through the same machinery used to respond to other AI characters.
No hand-authoring of high-level behaviors.
Interactions play out differently every time.
The system handles both physical interaction and natural language dialogue.
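The division of labor in the first item can be illustrated by the game-side step that turns an abstract command into a concrete object. The JSON message format, the `WorldObject` type, and the nearest-instance policy are assumptions for illustration (socket I/O is omitted), not the actual protocol between the AI server and the game.

```python
import json
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WorldObject:
    obj_id: int
    obj_type: str  # e.g. "steak"
    region: str    # e.g. "kitchen"
    pos: Tuple[float, float]


def resolve_target(message: str, world: List[WorldObject],
                   agent_pos: Tuple[float, float]) -> Optional[WorldObject]:
    """Resolve an abstract command such as 'pick up a steak from the kitchen' to one object.

    Hypothetical message format:
    {"verb": "pickup", "object": "steak", "region": "kitchen"}
    Pathfinding and animation selection would follow once a target is chosen.
    """
    cmd = json.loads(message)
    region = cmd.get("region")  # None means any region
    candidates = [o for o in world
                  if o.obj_type == cmd["object"] and (region is None or o.region == region)]
    if not candidates:
        return None
    # Assumed policy: choose the closest matching instance.
    return min(candidates, key=lambda o: math.dist(agent_pos, o.pos))
```

Keeping object references abstract on the server side means the logged behavior ("pick up steak") never has to know which of several identical steaks exists in the current game.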

Not-So-Cool Things:

No designer input. No authorial control.
Sometimes characters do wrong, weird, or out-of-context things.
Behavior is guided by statistical regularities, but sometimes the most interesting behaviors are statistical outliers. These outliers get filtered out, for better or for worse.
Constraining behavior with a trigram model ensures local coherence, but not global coherence -- characters tend to do sensible things from moment-to-moment, but over longer periods of time they can get caught in cycles.

Smart Cookie

So, implementing this system was an informative foray into statistical modeling of language and behavior, but soon after publishing this work, I received some poignant words of wisdom from a fortune cookie: "Don’t let statistics do a number on you." Statistics are useful for mining recurring patterns in large datasets of gameplay data, but statistics alone cannot replace game designers. Ultimately there needs to be a human in the loop to identify behaviors of interest and to make sense of the higher-level structure of the scenario. Noah Wardrip-Fruin’s Expressive Processing has a chapter about statistical AI that highlights The Restaurant Game, and comes to this same conclusion. My current research aims to leverage statistics while keeping a human designer in the loop, forming a powerful human-machine collaborative authoring process.

That wraps up the recaps. The rest of 2009, after AAMAS, was kind of the dark period of my PhD, in terms of productivity. I spent a lot of time working on a dialogue act classifier (described in this paper) that I ultimately have not ended up using in my new system. And then over six months was devoted to intense reading for my General Exams. Future posts will begin describing components of my new system, which is still in development, and aims to address shortcomings of the first version of the planner, with the concession that it is no longer a fully automated authoring system (but I think that is for the best, really).

Recap Part 2: Dialogues with Data

2011-02-02T19:10:00.004-05:00

Welcome back readers. Another snow-day in Boston; good time to update the blog! In this interactive post I will continue my recap of early experiments in automating characters with recorded gameplay data. You will find a demo to play with at the end.


Human waitress chats with an AI customer, trained with 5,000 games.

After completing the Masters, I was anxious to get something truly interactive up and running as quickly as possible. I thought I would start by looking at the chat text data, and I was inspired by reading a paper about the How May I Help You system at AT&T, an unsupervised system that learned correlations between salient phrase fragments and call routing decisions. I wondered if we could take a similar approach to learning patterns within chat text exchanges based on salient word sequences for different contexts within the restaurant scenario – e.g. getting seated, ordering, paying the bill, etc – and use them to automate a chat bot.


The implementation of the chat bot was brain-dead simple, yet results were often surprisingly effective. I started by segmenting the log files into conversations, where a conversation is one or more uninterrupted lines of chat text sandwiched between physical actions (e.g. picking up steak from the table, sitting on a chair, or using the cash register). For each context, I extracted a list of all word sequences that occur in at least five games. These sequences were then used to encode lines of chat text in the log files into signatures that can be quickly compared with chat text input at runtime.
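The phrase extraction and signature encoding can be sketched as follows, assuming one context's chat lines have already been grouped per game. Capping sequences at three words, the regex tokenizer, and the function names are assumptions; only the five-game threshold comes from the text above.

```python
import re
from collections import Counter
from typing import FrozenSet, Iterable, List, Set, Tuple

Seq = Tuple[str, ...]


def word_sequences(text: str, max_n: int = 3) -> Set[Seq]:
    """All contiguous word sequences of length 1..max_n in a line of chat."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}


def salient_sequences(games: Iterable[List[str]], min_games: int = 5,
                      max_n: int = 3) -> Set[Seq]:
    """Sequences occurring in at least `min_games` distinct games for one context.

    Each element of `games` is the list of chat lines one game contributed to this context.
    """
    counts: Counter = Counter()
    for lines in games:
        seen: Set[Seq] = set()
        for line in lines:
            seen |= word_sequences(line, max_n)
        counts.update(seen)  # count each sequence at most once per game
    return {seq for seq, c in counts.items() if c >= min_games}


def signature(line: str, salient: Set[Seq], max_n: int = 3) -> FrozenSet[Seq]:
    """Encode a line as the set of salient sequences it contains."""
    return frozenset(word_sequences(line, max_n) & salient)
```

Counting per game rather than per line keeps one chatty player from making an idiosyncratic phrase look salient.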

The chat bot encodes the user’s text input as a signature, searches for the best matching signatures in 5,000 log files, and then simply outputs the next line from the conversation with the best matching signature. Ties are broken arbitrarily, and there is a history bias to favor conversations that contain previous lines that match the history of the interaction currently in progress.
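The retrieval step can be sketched as below. To keep it self-contained, a plain word set stands in for the salient-sequence signatures, and Jaccard overlap, the `history_weight` of 0.5, and first-found tie-breaking are assumptions rather than the bot's actual scoring.

```python
import re
from typing import FrozenSet, List, Optional, Sequence


def sig(text: str) -> FrozenSet[str]:
    """Stand-in signature: the set of lowercase words in a line."""
    return frozenset(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def respond(user_text: str, history: Sequence[str], conversations: List[List[str]],
            history_weight: float = 0.5) -> Optional[str]:
    """Return the line that followed the best-matching line in the logged conversations.

    `history` holds the earlier lines of the interaction in progress, oldest first.
    """
    query = sig(user_text)
    best_score, best_reply = float("-inf"), None
    for conv in conversations:
        for i in range(len(conv) - 1):  # the matched line needs a successor to output
            score = jaccard(query, sig(conv[i]))
            # History bias: reward conversations whose preceding lines match earlier turns.
            for back, past in enumerate(reversed(history), start=1):
                if i - back < 0:
                    break
                score += history_weight * jaccard(sig(past), sig(conv[i - back]))
            if score > best_score:  # strict '>' keeps the first of any tied candidates
                best_score, best_reply = score, conv[i + 1]
    return best_reply
```

With the same input "table for one", a conversation that opened with "welcome" wins if the live interaction also opened with "welcome", while one that opened with "hello" wins otherwise.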

I want to stress that this chat bot was just a quick experiment, and is not the system I am using in my more recent work. However, this simple implementation turned out to be quite a compelling (and entertaining) way to explore the data, and to illustrate how far you can get without any semantics, relying only on surface text similarities, if you have lots of data. The results are not perfect, and it’s not going to pass the Turing test, but compared to other chat bots it’s not doing so badly, especially considering it requires zero hand-authoring: just turn the crank and get a chat bot. I never formally evaluated the chat system because it was just an exploration, but some implementation details are covered in this AAAI symposium paper.

I ended up demo-ing this system for years at presentations and lab events. Running it is a bit like Russian roulette. It often works well and impresses users, but there is always the chance of an incorrect, or even obscene, response. The data is full of witty responses and handles a number of cases that I would not have covered if authoring the dialogue by hand. Below are some highlights.

Highlights

Here is one example of the bot’s wit. In all examples, the bot's response is in bold:

WAITRESS: Welcome to the restaurant

CUSTOMER: Hi

WAITRESS: hi, how many in your party?

CUSTOMER: a table for one please.

WAITRESS: smoking or non?

CUSTOMER: WOW, you have a smoking section?

CUSTOMER: I don’t smoke, but I will start tonight

CUSTOMER: take me there!

And another:

WAITRESS: hi

CUSTOMER: hi

WAITRESS: do you have a reservation?

CUSTOMER: I have personal reservations

CUSTOMER: but I wave them when it comes to lust.

When interacting with a Japanese visitor to the lab, we were both surprised when the bot corrected his English:

WAITRESS: welcome

CUSTOMER: table for one please

WAITRESS: have a sit

CUSTOMER: *seat

Sometimes the response is not quite right, but still entertaining – better than no response at all. (Note that in this case, the bot is playing as the waitress):

CUSTOMER: you have nice eyes

WAITRESS: thanks! :) I work out

Edge Cases

The most interesting thing about this data-driven approach is seeing the edge cases that get picked up. When I noticed that one of the files of extracted phrases looked suspiciously small, and investigated which context it was associated with, I found that the system had learned to say “oops” when the waitress dropped something on the floor. So what, right? That’s obviously what you should say when you drop something. Well, what makes this interesting is that the user interface actually does not allow the players to put things down on the floor – they can only put things down on furniture and other objects. Dropping things on the floor occurs as the result of a ray-casting bug, when trying to place something on a table. In this case, the AI system has learned an appropriate response for something the designer never realized could even happen!

Another edge case is related to a fruit bowl in the back of the kitchen. In many games, decorative props like these would simply be ignored by the AI. In contrast, the chat system learns that a waitress should say “on the house” when putting the fruit down on a customer’s table. One of the downsides of learning from recurring patterns of text is that the system fails to pick up lines that might be gems, but are only observed once in thousands of games. For example, in one game the customer responds “Damn girl, that is serious fruitage!” when the waitress puts the fruit bowl on his table. More recent work is looking at ways to capture these gems by including a human in the loop of the data-mining process.

Colorful Interactions (to say the least)

As mentioned earlier, the chat bot does have the potential to offend people. At a presentation in Plano, Texas, the bot offered Paul Tozour a lap dance. When demo-ing at the lab for the VP of a major American corporation, she was surprised to see this:

WAITRESS: How may I help you?

CUSTOMER: Get me a table b****

I have to admit I got some satisfaction from a demo for a gray-haired reporter who told me my research didn’t make any sense to anyone over the age of 40. It went like this:

WAITRESS: welcome

CUSTOMER: shut up wh***

I don’t think he actually noticed the bot’s response; he couldn’t read it through his bifocals. Obviously we can censor obscene words, but where’s your sense of adventure? Players can still say some pretty bad things without swearing. Microsoft learned this the hard way when they released a potty-mouthed Santa Claus bot.

Interactive Demo!

I wanted to embed the chat bot applet into this blog post, but Java’s security regime defeated me, and I had to put it on its own page. You can find the demo applet here.

First post. w00t!

2011-01-15T20:59:00.000-05:00

People of Earth:

Welcome to my research blog! I’m in the home stretch for the PhD, aiming to graduate in about a year (plus or minus a few months), and will use this blog to share my progress, and recap some earlier developments. I’m hoping that blogging will help me collect my thoughts for the defense (and help me figure out where all the time went).

Screenshot from The Restaurant Game

I launched The Restaurant Game in early 2007, and proposed to generate character behavior and dialogue from log files of human-human interactions recorded online for my Masters thesis. Needless to say, I grossly underestimated how hard it would be to automate behavior and dialogue from unconstrained online interactions (at least, to do it well enough for use in a ‘real’ game), and I’m still working on this for my PhD.

It’s been a pretty crazy project, as you might expect when you invite the general public on the internet to participate in your research.


Recap Part 1: Data Collection

I really had no idea if anyone would play the game, and if they did, would they do anything reasonable? I hoped to collect 1,000 transcripts of completed two-player games, which I reached within two weeks, so I raised my goal to 10,000, which I finally reached in 2010. Obviously a commercial game could easily collect this much data in a week, or even a day. The biggest spikes in traffic came from mentions on Voodoo Extreme and Kotaku, and (strangely, spontaneously) years later on h+ Magazine and Slashdot. Here is the Google Analytics report of traffic to the project web page between 2007 and 2010.

A staircase composed of 53 cherry cheesecakes.

The anonymous players on the internet did not disappoint – they did all kinds of things in the game. Some stacked 53 cherry cheesecakes to form a staircase, and climbed onto the roof of the restaurant. Other players punk'd a reporter from NPR. Hackers in Sweden figured out how to run their own servers, and began modding the game. They introduced naked customers, allowed more than two players, and eventually built a two-story restaurant, and (of course) added guns.

Swedish hackers mod The Restaurant Game.

People misbehaving online was not surprising. What was surprising was that over half of our players did mostly normal restaurant things – eating food, serving drinks, reading menus, paying bills, etc. For my Master's thesis (summarized in this journal article), I used data from 5,000 games to learn a statistical model of language and behavior that could be used to estimate the typicality of any gameplay session (or fragment of a game). I had humans rate the typicality of 300 games based on what one would expect to observe in a real restaurant (in terms of both dialogue and physical interaction), and demonstrated that the estimates of the statistical model for the same games correlated strongly with human judgment. So for the thesis deadline in 2007, I ran out of time before I was able to close the loop and actually generate interactive behavior from the data, but even recognition has interesting implications for games. (Future posts will describe more recent work, where I do generate dialogue and behavior from the data).
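As a rough sketch of estimating typicality and checking it against human ratings: score each game by its average log-probability per event under a smoothed bigram model, then compute the Pearson correlation between those scores and the ratings. The bigram order, add-one smoothing, and function names are assumptions for illustration (the journal article describes the actual model), and `pearson` assumes both lists have non-zero variance.

```python
import math
from collections import Counter
from typing import Sequence, Set, Tuple

Model = Tuple[Counter, Counter, Set[str]]


def train_bigram(games: Sequence[Sequence[str]]) -> Model:
    ctx, bi, vocab = Counter(), Counter(), set()
    for game in games:
        vocab.update(game)
        toks = ["<s>"] + list(game)
        for a, b in zip(toks, toks[1:]):
            ctx[a] += 1  # times `a` was followed by some event
            bi[(a, b)] += 1
    return ctx, bi, vocab


def typicality(game: Sequence[str], model: Model) -> float:
    """Average log-probability per event under an add-one smoothed bigram model."""
    ctx, bi, vocab = model
    v = len(vocab) + 1  # one extra slot for never-seen events
    toks = ["<s>"] + list(game)
    logp = sum(math.log((bi[(a, b)] + 1) / (ctx[a] + v)) for a, b in zip(toks, toks[1:]))
    return logp / max(1, len(game))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-zero variance in both lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Averaging per event keeps long and short games comparable; otherwise longer sessions would always look less typical simply because they accumulate more negative log-probability.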

What good is behavior recognition? Characters could use statistical models learned from gameplay data to detect someone behaving atypically, or to guide their own behavior to blend in with the crowd -- kind of a social variant on stealth gameplay (e.g. Thief, NOLF). It’s interesting to see that there is already a new genre of multiplayer stealth games emerging, where humans try to blend in with AI in Spy Party and the multiplayer mode of Assassin’s Creed: Brotherhood. Statistical models would allow this type of gameplay in single-player games, where the AI is trying to detect the human. Obviously, the AI already knows who the human is, but this is no different than when we simulate vision and hearing to enable sensory honesty in stealth games. Think of it as a means of enabling social sensory honesty.
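Detecting someone behaving atypically could, for instance, mean watching a sliding window of a player's recent events and flagging the moment their average log-probability drops below a threshold. This is a hypothetical sketch: the window size, the threshold, and the `prob` callback (any conditional model of the next event, such as a smoothed bigram model) are illustrative assumptions.

```python
import math
from collections import deque
from typing import Callable, Deque, Iterable, List


def flag_atypical(events: Iterable[str], prob: Callable[[str, str], float],
                  window: int = 5, threshold: float = -2.0) -> List[int]:
    """Return indices at which the windowed mean log-probability falls below threshold."""
    recent: Deque[float] = deque(maxlen=window)
    flagged: List[int] = []
    prev = "<s>"
    for i, ev in enumerate(events):
        recent.append(math.log(prob(prev, ev)))  # prob must return a value > 0
        if len(recent) == window and sum(recent) / window < threshold:
            flagged.append(i)
        prev = ev
    return flagged
```

The window makes the detector tolerant of a single odd action but sensitive to a sustained run of them, such as a customer who repeatedly climbs onto the roof.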

That’s it for now. Future recap posts will get into simulating interaction from data. Stay tuned! Or even subscribe.