tag:blogger.com,1999:blog-82214474446993683662024-03-04T23:07:57.211-05:00Collective A.I.Jeff Orkinhttp://www.blogger.com/profile/12486391956727436411noreply@blogger.comBlogger10125tag:blogger.com,1999:blog-8221447444699368366.post-77988894991427312722013-08-13T10:31:00.001-04:002013-08-13T10:31:35.293-04:00The Adventure Continues<br />
I am continuing to work on Collective A.I. through a company I co-founded called Giant Otter Technologies, Inc. We will be blogging from <a href="http://giantotter.com/blog/">http://giantotter.com/blog/</a>, as well as posting updates on Twitter: <a href="https://twitter.com/GiantOtterTech">@GiantOtterTech</a>. See you there.<br />
<br />
<br />
<br />
Defense Abridged (posted 2013-07-03, updated 2013-08-04)<br />
I've been meaning to blog a Reader's Digest version of my PhD
thesis defense since December. Now that six
months have passed, it's about time to follow through with that plan. You can also watch my <a href="http://www.gdcvault.com/play/1018039/Off-the-Beaten-Path-Non">AI Summit talk</a> from the
GDC Vault (starts at 20:45), which is a condensed version of my defense, minus the study results. The complete thesis document is available here: <b><a href="http://web.media.mit.edu/~jorkin/papers/orkin_phd_thesis_2013.pdf">Collective Artificial Intelligence: Simulated Role-Playing from Crowdsourced Data</a></b><br />
<div class="MsoNormal">
<br />
<a name='more'></a><br /></div>
<h3 class="MsoNormal">
PREFACE</h3>
<div class="MsoNormal">
I had a great committee -- my PhD advisor Deb Roy (<a href="https://twitter.com/dkroy">@dkroy</a>), along
with Nick Montfort (<a href="https://twitter.com/nickmofo">@nickmofo</a>) and Mark Riedl (<a href="https://twitter.com/mark_riedl">@mark_riedl</a>).
In an attempt to give my thesis some industry relevance, I invited Gabe
Newell to join the committee as well. To
my surprise, he agreed and participated in the proposal phase. Gabe's feedback on the proposal strongly shaped my
direction between the proposal and the defense. In part he said, "<b>You will fail</b> at
creating a greater sense of player contingency." My grand plan backfired -- one of my game
industry heroes was predicting failure at exactly what I was intending to
achieve! This skepticism actually served
me well, and pushed me to think through some big issues.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Prior to my proposal, I had focused on using data recorded
from thousands of players to automate AI characters who could dynamically
converse and interact with other AI characters.
But my ultimate goal for this research was to support AI dynamically
interacting and conversing with <i>human</i> players, and to show how data-driven
interaction can support a vastly more open-ended, player-driven
experience. So, catalyzed by Gabe's
feedback, after the proposal I shifted 100% of my energy toward demonstrating
how data from <i>The Restaurant Game</i> could support unscripted face-to-face social
interaction and dialogue between a human player and an NPC.</div>
<div class="MsoNormal">
<br />
<br /></div>
<h3 class="MsoNormal">
HERE WE GO.... </h3>
My thesis looks at games as a storytelling medium. Each medium tells stories in a different way. As Rockstar's Dan Houser said to the New York Times, "Books tell you something, movies show you something, games let you do something." The videogame industry has made a lot of progress in letting the player do things physically -- players can run anywhere, drive anywhere, and shoot at whatever they want -- but it has made much less progress in supporting open-ended social interaction and dialogue. In general, we're still stuck with the same pre-scripted, multiple-choice dialogue trees we've been seeing for 30 years, which limit players' ability to express themselves and guide the storytelling experience.<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd8OLEg6ybIzUGslXttBZs2BOyO41WbsJbzF-RYk3C6U8s3kEHMW_IKGVvWOhrgGe2n7mjEE5n-o5a2GNKaWWtnCqluXi8t1FCeNUqe0IMUAH9BAqLgnzP4Vf_7l589nr7168TN3JN9UQ/s650/masseffect3.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd8OLEg6ybIzUGslXttBZs2BOyO41WbsJbzF-RYk3C6U8s3kEHMW_IKGVvWOhrgGe2n7mjEE5n-o5a2GNKaWWtnCqluXi8t1FCeNUqe0IMUAH9BAqLgnzP4Vf_7l589nr7168TN3JN9UQ/s400/masseffect3.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Multiple-choice social interaction in Mass Effect 3</td></tr>
</tbody></table>
<br />
There are two obstacles preventing us from creating more open-ended experiences. The first is the content authoring bottleneck -- creating character behaviors is a technical, labor-intensive process, and authoring tools are relatively primitive. The second obstacle may be the bigger issue: human imagination is a limited resource. No matter how talented your designers and programmers are, each individual can only anticipate so many possibilities. In order to support more open-ended interaction, we need to rethink the way we author character behavior and dialogue. In particular, we need to move toward more data-driven approaches in order to scale up the interaction. To explore this, I launched <i>The Restaurant Game</i> in 2007, which anonymously paired 16,000 people online to play the roles of customers and waitresses. Players could say anything they wanted to each other (via typed text), and interact with the 3D environment via a point-and-click interface. We recorded everything, and could extract a discrete action sequence from each gameplay session. The question is, how can we exploit thousands of these action sequences to support open-ended interaction? And what will that experience be like for the player? Answering these questions could not only impact games, but also have implications for online education and training, and social robotics. <br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjn1QqRGCYyHkuLJO45lGIGnO3N3f75EaKlwKleIR3FMAcguxVphNPaBXepFMAom6G6sposz4o4y_JqfIA7pw1f29h-xm8OeXSjnQ5MDV_s5ABMFWQW1bnd4ZnJ786r-TuftKEB_UgIRdo/s700/facade.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="275" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjn1QqRGCYyHkuLJO45lGIGnO3N3f75EaKlwKleIR3FMAcguxVphNPaBXepFMAom6G6sposz4o4y_JqfIA7pw1f29h-xm8OeXSjnQ5MDV_s5ABMFWQW1bnd4ZnJ786r-TuftKEB_UgIRdo/s400/facade.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Open-ended natural language input in Facade.</td></tr>
</tbody></table>
<br />
There are a few examples of games that have ventured beyond multiple-choice dialogue -- notably <i>Facade</i>, which very much inspired my own work. <i>Facade</i> was released in 2005, as I was wrapping up work on <i>FEAR</i>. While I was focusing on simulating action-packed combat, <i>Facade</i> was delivering drama by simulating social interaction in the mundane setting of a yuppie couple's apartment, and I was blown away. As Grace and Trip start bickering, the player can type anything they want to try to defuse the situation or stoke the fire. <i>Facade</i> can't understand everything the player types, but it elegantly designs around the hard language understanding problems -- when input is not understood, Grace and Trip just continue to argue, which brilliantly succeeds in making the player feel like an awkward third wheel. But this is not a general solution. How can we support interactions between two characters, rather than three, where the player is face-to-face with an NPC, and everything s/he says matters and cannot be ignored? How can the player use language effectively to navigate the story space?<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrxX1qZgLSO48fd0vCEZu8cha1pCLVzd8kDpXeIHH1U7fND_D4N_x_hvASveNpSsnhGHOeExgFZGyyQg4vVPCDQGYtqqathfinKdWHqW519KTayxXgtFEQJ2usM4ct05oALEVRiJ_v6QA/s826/space_2d.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrxX1qZgLSO48fd0vCEZu8cha1pCLVzd8kDpXeIHH1U7fND_D4N_x_hvASveNpSsnhGHOeExgFZGyyQg4vVPCDQGYtqqathfinKdWHqW519KTayxXgtFEQJ2usM4ct05oALEVRiJ_v6QA/s400/space_2d.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Tension between freedom and system comprehension, represented as a 2D space.</td></tr>
</tbody></table>
<br />
We can think of the interaction problem as a two-dimensional space, where the Y-axis indicates how much freedom the player has to say and do things, and the X-axis indicates how well the machine can understand what the player is saying or doing, and respond appropriately. Commercial games, like <i>Mass Effect 3</i>, sit close to the X-axis -- the machine can understand almost everything players do, because their freedom is so constrained. <i>Facade</i> is somewhere in between, giving much more freedom, at the cost of system comprehension. The holy grail is the top-right corner, where the player has complete freedom, and the machine understands everything. My research is aiming for something closer to that holy grail.<br />
<br />
<br />
<h3 class="MsoNormal">
SO, WHAT DID I BUILD?</h3>
<br />
<div class="MsoNormal">
To get closer to that holy grail in a practical way, I've been experimenting with a hybrid
interface, where the user can say anything they want (typed, or
speech-to-text), and when an exact match for the input does not exist, the
system dynamically generates dialogue options intended to be semantically
similar and contextually relevant.<br />
<br /></div>
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZ5sjH9Tzs5Pag919t2WJM4mt9TXcUfL4ToCkZHpGGDQq7qWsx1IhkiLkc_Bxq5imZ2SRN3S4PrhdG8pWIe-d2Vs69WvQE9JQnSqOVsiDzmWSclR-P9WDj1mYxUKo9vB2jns9pEdszmF0/s826/trg.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZ5sjH9Tzs5Pag919t2WJM4mt9TXcUfL4ToCkZHpGGDQq7qWsx1IhkiLkc_Bxq5imZ2SRN3S4PrhdG8pWIe-d2Vs69WvQE9JQnSqOVsiDzmWSclR-P9WDj1mYxUKo9vB2jns9pEdszmF0/s400/trg.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Hybrid user-interface in <i>The Restaurant Game</i>.</td></tr>
</tbody></table>
<br /></div>
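<div class="MsoNormal">
To make the fallback concrete, here is a minimal Python sketch of the idea, not the thesis implementation. Plain word overlap (Jaccard similarity) stands in for semantic similarity, and a context label on each recorded line stands in for contextual relevance. The <code>CORPUS</code> lines, the context labels, and the <code>suggest_options</code> name are all hypothetical.</div>

```python
import re

# Hypothetical recorded lines, each paired with the context it was observed in.
CORPUS = [
    ("can i get a menu please", "seating"),
    ("i would like the steak", "ordering"),
    ("could i have the steak please", "ordering"),
    ("i will have the salmon", "ordering"),
    ("can i get the bill please", "paying"),
]


def tokens(text):
    """Lowercase a line and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity: a crude stand-in for semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def suggest_options(player_input, context, corpus=CORPUS, k=3):
    """Return the input on an exact match; otherwise return up to k similar
    lines that were recorded in the current context."""
    normalized = " ".join(player_input.lower().split())
    if any(line == normalized for line, _ in corpus):
        return [normalized]
    candidates = [line for line, ctx in corpus if ctx == context]
    ranked = sorted(candidates, key=lambda line: jaccard(normalized, line), reverse=True)
    # Drop candidates that share no words at all with the input.
    return [line for line in ranked[:k] if jaccard(normalized, line) > 0]
```

<div class="MsoNormal">
With this toy corpus, typing "steak for me please" while ordering has no exact match, so the player would be offered the two recorded steak orders and not the salmon line.<br />
<br /></div>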
<div class="MsoNormal">
The underlying system that drives the behavior and dialogue
of the NPC, in response to human interaction, relies on an approach I refer to
as <b><i>Collective Artificial Intelligence</i></b>, which consists of three steps:</div>
<ol>
<li>Record thousands of people playing roles in some
scenario.</li>
<li>Mine gameplay data for patterns of language and behavior.</li>
<li>Replay fragments of recorded behavior at appropriate
times at runtime.</li>
</ol>
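<div class="MsoNormal">
The three steps above can be sketched in a few lines of Python. This is a deliberately naive illustration under my own assumptions: the "mining" step only counts which action followed which, whereas the real system replays annotated fragments of recorded behavior. The session data and the <code>mine</code> and <code>replay</code> names are hypothetical.</div>

```python
from collections import Counter, defaultdict

# Step 1: recorded sessions (hypothetical), each a sequence of discrete actions.
SESSIONS = [
    ["customer_sits", "waitress_brings_menu", "customer_orders", "waitress_serves_food"],
    ["customer_sits", "waitress_brings_menu", "customer_orders", "waitress_serves_food"],
    ["customer_sits", "waitress_greets", "customer_orders", "waitress_serves_food"],
]


def mine(sessions):
    """Step 2: count which action followed each action across all sessions."""
    followers = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            followers[current][nxt] += 1
    return followers


def replay(followers, last_action):
    """Step 3: replay the most frequently observed follow-up, if any exists."""
    if last_action not in followers:
        return None
    return followers[last_action].most_common(1)[0][0]
```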
<br />
<div class="MsoNormal">
Below is a graph generated by plotting all action sequences
observed in 5,000 gameplay sessions of <i>The Restaurant Game</i>. Each node represents a unique action, and all
games progressed from node "Start" at the top to node "End"
at the bottom. This image illustrates
that human behavior is complex and nuanced, and far beyond what we can encode
by hand. I spent a couple of years looking
at various ways to automatically mine patterns in this data (n-grams, SVMs,
HMMs, affinity propagation, PLWAP), and made some encouraging progress, but
ultimately concluded that, because the data is sparse, these approaches end up filtering out the
nuance of the interaction.
But the motivation for recording thousands of people in the first place
was to capture the nuance! So, in 2010 I
changed direction, toward a human-machine collaborative approach, where humans
are employed to interpret the meaning of patterns in the data.</div>
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMPTrjnD-aWGB9qbJB6vUq8hjeYjO0SJhqKoUvDuRGWIO98rRQqoZ4zonATjcgnbArGoMtwNMyoGguebXJoFSkhp_34dnCJ0JGvjm8VrlPfKjhMgJxtoH5P_UdzoTzQQnNQezR1m58O4Q/s826/graphjpg.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMPTrjnD-aWGB9qbJB6vUq8hjeYjO0SJhqKoUvDuRGWIO98rRQqoZ4zonATjcgnbArGoMtwNMyoGguebXJoFSkhp_34dnCJ0JGvjm8VrlPfKjhMgJxtoH5P_UdzoTzQQnNQezR1m58O4Q/s400/graphjpg.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Graph of action sequences observed in 5,000 two-player games.</td></tr>
</tbody></table>
<br /></div>
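<div class="MsoNormal">
A small Python sketch shows how a graph like the one above can be built, and why frequency-based mining tends to lose the nuance. It counts observed transitions between actions and then prunes the rare ones, which is a simplified stand-in for the statistical approaches listed above rather than any one of them. The sessions and action names are hypothetical.</div>

```python
from collections import Counter

# Hypothetical action sequences; the real logs contain thousands of sessions.
SESSIONS = [
    ["Start", "sit", "order_steak", "eat", "pay", "End"],
    ["Start", "sit", "order_steak", "eat", "pay", "End"],
    ["Start", "sit", "order_pie", "eat", "pay", "End"],
    ["Start", "sit", "steal_cash_register", "End"],
]


def edge_counts(sessions):
    """Count how often each (action, next_action) transition was observed."""
    counts = Counter()
    for session in sessions:
        counts.update(zip(session, session[1:]))
    return counts


def prune(counts, min_count):
    """Keep only transitions observed at least min_count times."""
    return {edge: n for edge, n in counts.items() if n >= min_count}
```

<div class="MsoNormal">
In this toy data, pruning transitions seen fewer than twice keeps the common steak dinner but drops both the pie order and the cash register theft, which are exactly the kinds of less common behavior the recordings were meant to capture.<br />
<br /></div>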
<div class="MsoNormal">
I created browser-based tools (Flex, ActionScript 3) and
used oDesk to hire people from the Philippines, Pakistan, India, and the U.S. to annotate data, applying a narrative structure that represents
a hierarchy of events, long-range dependencies indicating causal chains and
references, and expressions of attitude.<span style="mso-spacerun: yes;">
</span>This structure also represents modulation of affinity and tension, but
these aspects have not yet been implemented.<span style="mso-spacerun: yes;"> </span><br />
<br /></div>
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOvku8Qd39qi9xF_VoxAO_v3n7SADDtDn0HvdyX7IbesfYucHPTNZxXK_PXITYNlL4K8ZHLrIu1e71_LaFyJAlvxR_IhH1fC-dcg451cVar0KtZvF-BB9TrFb-xyNHDEyy9zP5U3vnTDY/s826/structure.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOvku8Qd39qi9xF_VoxAO_v3n7SADDtDn0HvdyX7IbesfYucHPTNZxXK_PXITYNlL4K8ZHLrIu1e71_LaFyJAlvxR_IhH1fC-dcg451cVar0KtZvF-BB9TrFb-xyNHDEyy9zP5U3vnTDY/s400/structure.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Narrative structure applied to gameplay transcripts.</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
It took a team of seven outsourced annotators a total of 415
person-hours to tag 1,000 transcripts with four types of meta-data, which works
out to about 1.5 weeks each at 40 hours/week, and it cost about
$3,000. As a final step, lines of dialogue
are manually clustered by meaning. I
did this step myself, and it took about two weeks to cluster 18,000 lines.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/8y69ZX3c-kI?feature=player_embedded' frameborder='0'></iframe></div>
<br /></div>
<div class="MsoNormal">
The annotated data serves as <i>Collective Memory</i>, driving the
decisions of the runtime planning architecture (written in Java), which
combines plan recognition with case-based planning.<span style="mso-spacerun: yes;"> </span>At a high level, the agent recognizes
discrete sequences of observations representing events, infers a hierarchy of
events, and retrieves gameplay transcripts (aka cases) containing event
hierarchies that are similar at an abstract level. <span style="mso-spacerun: yes;"> </span>Retrieved cases are critiqued, leveraging
meta-data to scrutinize a proposed next-action for coherence.<span style="mso-spacerun: yes;"> </span>For a simple example, if someone ordered
steak, and the AI waitress is considering a next action for serving pie, a
critic will reject this proposal due to violating a long-range dependency
tagged by a human, where ordering steak causes the waitress to serve a steak,
rather than pie.<span style="mso-spacerun: yes;"> </span>All of the critics are
domain-independent, with the exception of the <i>Domain Critic</i>, which accesses
rules encoded in the <i>Domain Knowledge Manager</i>.</div>
<div class="MsoNormal">
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQyt16376TxewCKZgbJ4BdSS9z9BTT55C_Ra33BNgKQt1JHC4AFD9FcLTaWZM0Invo7Pi5wxeNtXBs5Clddzuxri-In7Ma6ggf2_dJksfaIbuo7s5vWL08Hg9t83HII8wbBmBqd-mVKaA/s826/arhitecture.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQyt16376TxewCKZgbJ4BdSS9z9BTT55C_Ra33BNgKQt1JHC4AFD9FcLTaWZM0Invo7Pi5wxeNtXBs5Clddzuxri-In7Ma6ggf2_dJksfaIbuo7s5vWL08Hg9t83HII8wbBmBqd-mVKaA/s400/arhitecture.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Runtime planning architecture for an agent.</td></tr>
</tbody></table>
<br /></div>
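<div class="MsoNormal">
The steak-and-pie example can be sketched as a critic in Python. The actual architecture is written in Java and is considerably richer; this is only an illustration under my own assumptions, in which action names share a verb prefix (such as <code>serve_</code>) and a dependency is a simple cause/effect pair. The <code>Dependency</code>, <code>dependency_critic</code>, and <code>choose_next</code> names are hypothetical.</div>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """A human-tagged causal link: once `cause` occurs, `effect` is expected."""
    cause: str
    effect: str


# Hypothetical annotation: ordering steak causes the waitress to serve steak.
DEPENDENCIES = [Dependency(cause="order_steak", effect="serve_steak")]


def verb(action):
    """Return the verb part of an action name such as 'serve_pie'."""
    return action.split("_", 1)[0]


def dependency_critic(history, proposed, dependencies=DEPENDENCIES):
    """Reject a proposal that competes with an expected but unfulfilled effect."""
    for dep in dependencies:
        pending = dep.cause in history and dep.effect not in history
        competes = verb(proposed) == verb(dep.effect) and proposed != dep.effect
        if pending and competes:
            return False
    return True


def choose_next(history, proposals, critics=(dependency_critic,)):
    """Return the first proposed action that every critic accepts, else None."""
    for proposed in proposals:
        if all(critic(history, proposed) for critic in critics):
            return proposed
    return None
```

<div class="MsoNormal">
Here the proposals play the role of next actions drawn from retrieved cases. After a steak order, a proposal to serve pie is rejected and the next proposal is tried.<br />
<br /></div>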
<h3 class="MsoNormal">
DEMOS!</h3>
<br />
<div class="MsoNormal">
So, enough jibber jabber.<span style="mso-spacerun: yes;">
</span>Let's see what this system actually does.<span style="mso-spacerun: yes;"> </span>Below are three videos of a human customer
interacting with an AI waitress.<span style="mso-spacerun: yes;"> </span>The
first video highlights how the system auto-completes the same input in
different ways depending on context, and how the waitress can exploit player
data to respond to some of the more unusual things the player does.<br />
<br /></div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/5VvZYwt2NmQ?feature=player_embedded' frameborder='0'></iframe></div>
<br /></div>
I think of these data-driven characters as
improvisational actors, who can take direction at a high level. This video demonstrates directing the
waitress to be rude, which has the effect of biasing her to retrieve gameplay
transcripts with actions tagged as rude when possible.<br />
<br />
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://i1.ytimg.com/vi/rN_6q_4g6bA/0.jpg" height="266" width="320"><param name="movie" value="http://www.youtube.com/v/rN_6q_4g6bA?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" /><param name="bgcolor" value="#FFFFFF" /><param name="allowFullScreen" value="true" /><embed width="320" height="266" src="http://www.youtube.com/v/rN_6q_4g6bA?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" type="application/x-shockwave-flash" allowfullscreen="true"></embed></object></div>
<br /></div>
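<div class="MsoNormal">
One simple way to picture this kind of biasing is as a score bonus during retrieval. The Python sketch below is an assumption about how such a bias could work, not a description of the actual system: each candidate carries a similarity score and a set of attitude tags, and candidates whose tags match the directed attitude get a fixed bonus. The candidate lines, scores, and <code>bonus</code> value are hypothetical.</div>

```python
# Hypothetical retrieved candidates: (line, similarity score, attitude tags).
CANDIDATES = [
    ("Here is your menu.", 0.80, set()),
    ("Read it yourself.", 0.75, {"rude"}),
    ("Welcome! Let me get you a menu.", 0.70, {"polite"}),
]


def rank(candidates, directed_attitude=None, bonus=0.2):
    """Sort candidates by similarity, boosting those tagged with the directed attitude."""
    def score(candidate):
        _, similarity, tags = candidate
        return similarity + (bonus if directed_attitude in tags else 0.0)
    return sorted(candidates, key=score, reverse=True)
```

<div class="MsoNormal">
Because the bonus only reorders candidates, the waitress still falls back to an untagged response when nothing rude is available, which matches the "when possible" behavior described above.<br />
<br /></div>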
<div class="MsoNormal">
The last video demonstrates a waitress directed to
upsell.<span style="mso-spacerun: yes;"> </span>This is accomplished through a
combination of applying an upselling attitude tag, and adding a couple
domain-specific rules to the <i>Domain Knowledge Manager</i> which tell the waitress
to never bring an entree until an appetizer has been ordered, and never bring
the bill until dessert has been ordered.<br />
<br /></div>
<div class="MsoNormal">
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen='allowfullscreen' webkitallowfullscreen='webkitallowfullscreen' mozallowfullscreen='mozallowfullscreen' width='320' height='266' src='https://www.youtube.com/embed/54VZAEnL9Lo?feature=player_embedded' frameborder='0'></iframe></div>
<br /></div>
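<div class="MsoNormal">
The two upselling rules can be pictured as prerequisites on actions. This Python sketch is only an illustration of the idea, with a hypothetical rule format and action names. It is not the interface of the <i>Domain Knowledge Manager</i>.</div>

```python
# Hypothetical encoding of the two upselling rules:
# each gated action maps to the action that must already have happened.
UPSELL_RULES = {
    "bring_entree": "order_appetizer",
    "bring_bill": "order_dessert",
}


def domain_critic(history, proposed, rules=UPSELL_RULES):
    """Allow a proposed action only if its prerequisite (if any) has occurred."""
    prerequisite = rules.get(proposed)
    return prerequisite is None or prerequisite in history
```

<div class="MsoNormal">
With these rules, a waitress who has taken only an entree order cannot bring the entree yet, so the planner has to pick something else from the recorded data, such as suggesting an appetizer.<br />
<br /></div>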
<h3 class="MsoNormal">
WELL, DID IT SUCCEED?</h3>
<br />
<div class="MsoNormal">
I ran both a quantitative and a qualitative study to evaluate
whether the implemented system succeeded at supporting a more open-ended,
player-driven experience. Subjects in the
quantitative study interacted with the system through speech-to-text, based on
Microsoft's speech recognizer. This
study looked at how often subjects were able to find a dialogue option with
the same meaning as what they were trying to say, where the dialogue
options were driven directly by recognized speech in one condition, and by the
full <i>Collective A.I.</i> system in another condition (which can exploit context to
generate relevant dialogue options, even when speech is misunderstood). Results show that subjects were able to find
a satisfactory dialogue option 29% more often with the full system.<br />
<br /></div>
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOZricuj28rDfcj1KjMpJ7hu8auFGruzqGJfH6VXsxS5qWbwbP84iakyriQ8QDMRcGs2BtgeqAvDKDNZ6vt4BO7_pAlmDV42W13SXElbWG3bl97vLfWal0-kdIbqyh6NmAlV5B4c7gpuc/s826/results_quant.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOZricuj28rDfcj1KjMpJ7hu8auFGruzqGJfH6VXsxS5qWbwbP84iakyriQ8QDMRcGs2BtgeqAvDKDNZ6vt4BO7_pAlmDV42W13SXElbWG3bl97vLfWal0-kdIbqyh6NmAlV5B4c7gpuc/s400/results_quant.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Quantitative study results.</td></tr>
</tbody></table>
<br /></div>
<div class="MsoNormal">
I also looked at the ranking of the selected dialogue
option.<span style="mso-spacerun: yes;"> </span>The figure below shows how
exploiting context increases the likelihood that the desired dialogue option
will appear higher in the list.</div>
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2TiIlAqBQahKkLcG_twZfOIJPwlyr6q78Fx6xp3b_P_3UnkzELVbAsWS6h3H0h7zmUryqd1yUTyS91wIWzZGzntIaZ7cMmOLdSpOyFSeTGeGzH65voFSkEXA0pKRqarmJG-JeENn409s/s826/rank.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2TiIlAqBQahKkLcG_twZfOIJPwlyr6q78Fx6xp3b_P_3UnkzELVbAsWS6h3H0h7zmUryqd1yUTyS91wIWzZGzntIaZ7cMmOLdSpOyFSeTGeGzH65voFSkEXA0pKRqarmJG-JeENn409s/s400/rank.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">More quantitative study results.</td></tr>
</tbody></table>
<br /></div>
<div class="MsoNormal">
Subjects in the qualitative study played three games in
groups, followed by a focus group discussion.
Each subject played about 10 minutes of <i>Facade</i>, <i>The Restaurant Game</i>, and
<i>Skyrim</i>. The <i>Skyrim</i> session, in which subjects interacted with NPCs in a tavern, served as a control and as a reminder of the
current state of the art in industry.
This study was risky, given that the other games are polished, released
products, while <i>The Restaurant Game</i> is not really even a game -- more of a proof-of-concept
tech demo. None of the subjects had
played <i>Facade</i> before, and most were captivated by it. Still, it was encouraging that the
discussions revealed that players did find <i>The Restaurant Game</i> to be more
player-directed and more responsive to nuanced language, while dramatizing a restaurant narrative in cooperation with the player. Unprompted, subjects described <i>The Restaurant
Game</i> as a sandbox. Some notable comments
included:</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<i>“It felt like The Restaurant Game was trying to play
along with the player. It just kind of rolls with it.”</i></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<i>“Façade led you, The Restaurant Game lets you lead it.”</i></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<i>“What I noticed about The Restaurant is that it was
trying to do more than Façade in the sort of AI actual interpretation of
colloquialisms.”</i></div>
<div class="MsoNormal">
<br />
</div>
<h3 class="MsoNormal">
WHERE DO WE GO FROM HERE?</h3>
<br />
<div class="MsoNormal">
My thesis has only scratched the surface of what's possible
with crowdsourced data-driven interaction, and I see it as a starting point
rather than an end. As a former game
programmer, I find it incredibly exciting that these characters can still surprise me by saying
things I've never seen them say before, even after I've worked with this data for
years. It hints at the possibilities for truly next-generation
characters, driven by massive collections of content.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Over the course of the PhD, I dabbled in a couple related
side-projects, reusing <i>The Restaurant Game</i> platform -- I collaborated with the
Personal Robots Group on <i>Mars Escape</i>, a game to capture data about human-robot
interaction, and collaborated with the GAMBIT Game Lab on <i>Improviso</i>, which
collects data about playing roles on the set of a low-budget sci-fi movie.<span style="mso-spacerun: yes;"> </span>But there is still much, much more to
explore.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgg0k_xubBDLOjb5uSBDQWsF_GKmXMfoab_-eIDMxPYLmwEk9VIm4CNpJpS6qg5tHaaNX1LSsOIg8H8N1iPS30mUn6SsvrhN5m3hRKVKvSOduYrV2FSJav_eSZ4k5krwvmIGibh4ewMaSY/s798/others.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="210" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgg0k_xubBDLOjb5uSBDQWsF_GKmXMfoab_-eIDMxPYLmwEk9VIm4CNpJpS6qg5tHaaNX1LSsOIg8H8N1iPS30mUn6SsvrhN5m3hRKVKvSOduYrV2FSJav_eSZ4k5krwvmIGibh4ewMaSY/s400/others.jpg" width="400" /></a></div>
</div>
<div class="MsoNormal">
<br />
To that end, in case you haven't been following my recent
Facebook and Twitter spam, I'm continuing to explore data-driven simulated
role-playing through a new venture called Giant Otter Technologies.<span style="mso-spacerun: yes;"> </span>Follow our latest developments at <a href="https://twitter.com/GiantOtterTech">@GiantOtterTech</a>.</div>
<div class="MsoNormal">
<br /></div>
PhD? PhDone. (posted 2013-02-11, updated 2013-05-12)<br />
I defended my PhD in December, and got the document all signed off in January. I will eventually write a "Reader's Digest" summary post (with figures and videos). But for now, for those interested, the document is online here:<br />
<br />
Collective Artificial Intelligence: Simulated Role-Playing from Crowdsourced Data<br />
<a href="http://web.media.mit.edu/~jorkin/papers/orkin_phd_thesis_2013.pdf">http://web.media.mit.edu/~jorkin/papers/orkin_phd_thesis_2013.pdf</a><br />
<br />
You can read the Abstract here:<br />
<a href="http://web.media.mit.edu/~jorkin/research/jeff_orkin_phd_abstract.html">http://web.media.mit.edu/~jorkin/research/jeff_orkin_phd_abstract.html</a><br />
<br />
<br />
Game Development in the 21st Century: Meet my A.I. Team (posted 2012-02-06, updated 2013-03-20)<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLgLMUwcAyuR1DbBF9Ilhr8kYhiVPbNTS42kEaqzSe7JxjxkIHDnvFYYSOqKvcSGkhuyXdvLXGoFBN41AUxCe8m-9tL91Pl97p6Wlv4vkgrwdntSi8cSXjz-SIRrqXrRO0Wj3PEgbBgUk/s1600/team_thumbs.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLgLMUwcAyuR1DbBF9Ilhr8kYhiVPbNTS42kEaqzSe7JxjxkIHDnvFYYSOqKvcSGkhuyXdvLXGoFBN41AUxCe8m-9tL91Pl97p6Wlv4vkgrwdntSi8cSXjz-SIRrqXrRO0Wj3PEgbBgUk/s1600/team_thumbs.jpg" /></a></div>
<div style="font-family: inherit; text-align: justify;">
<span style="font-size: small;">Over the past couple of months, I've been working with a team of seven people to create thousands of variations of restaurant behavior and dialogue, drawing from our database of recorded games. (See my <a href="http://collectiveai.blogspot.com/2011/12/thesis-proposal-recap.html">previous post</a> for some background on the project.) The interesting thing is that my team members don't know anything about A.I., they're not programmers, and they don't have any previous game development experience. They are random people that I hired on the internet, with minimal vetting, and they're doing great work! </span></div>
<div style="font-family: inherit;">
<span style="font-size: small;"><br />
</span></div>
<div style="font-family: inherit;">
<span style="font-size: small;">More after the jump...</span></div>
<br />
<span style="font-family: "Times New Roman","serif"; font-size: 12pt;"></span><br />
<a name='more'></a><div style="font-family: inherit;">
<span style="font-size: large;"><b>Follow research updates on Twitter: <a href="https://twitter.com/#%21/jorkin">@jorkin</a></b></span></div>
<div style="font-family: inherit;">
<br /></div>
<div style="font-family: inherit;">
<span style="font-size: small;">Let me introduce my team:</span></div>
<div style="font-family: inherit;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFaiucYyuIadh16sXzRD3apbQIlgyy-XDNMSNYIBv0CJ4ySkJuQk6VAy9FnSuwXYXX_3qzw8r0GXzzsVG3N-CHTPtwvNyz5gX2rQFtLzgkLWYUcYcX5oIgcsNa-RaMmLYvCfXGpchfmfs/s1600/team.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="325" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFaiucYyuIadh16sXzRD3apbQIlgyy-XDNMSNYIBv0CJ4ySkJuQk6VAy9FnSuwXYXX_3qzw8r0GXzzsVG3N-CHTPtwvNyz5gX2rQFtLzgkLWYUcYcX5oIgcsNa-RaMmLYvCfXGpchfmfs/s400/team.jpg" width="400" /></a></div>
<div class="MsoNormal" style="text-align: justify;">
<br />
The team is responsible for annotating game logs with four types of meta-data (events, event hierarchies, causal chains, and references), which they accomplish via custom browser-based Flash applications. This meta-data becomes the fuel that powers my new planning system to control interactive character behavior and dialogue. A programmer is still required to implement critics (small pieces of code that constrain when fragments of behavior can execute), but annotating meta-data makes up the lion's share of the authoring effort.</div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
I hired my team by posting a Data Entry job opening on <a href="https://www.odesk.com/">oDesk</a>. I asked applicants to annotate one sample file, and hired the first group of people who did a good job. My team was staffed within hours of posting the job, and has now completed annotating 1,000 game logs. It took them a total of 415 hours, which cost just under $3,000. They were working part-time, spread over a couple of months, but if someone were doing this full-time (8 hours / day), 415 hours is about 52 days. So, divided among a team of seven, this work could have been completed in about a week (or a week and a half, assuming 40-hour work weeks). I still have a lot of work to do over the next few months to demonstrate that this approach results in more engaging, robust behavior, but the prospect of a practical, fast, affordable way to create characters capable of rich social interaction is exciting.</div>
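<div class="MsoNormal" style="text-align: justify;">
For anyone who wants to check the arithmetic above or plug in their own numbers, here is a rough sketch. The hours, team size, log count, and cost are the figures quoted in this post; the 8-hour day and 5-day week are assumptions, the function name is made up for illustration, and $3,000 is treated as an upper bound since the real cost was "just under" that.</div>

```python
TOTAL_HOURS = 415          # total annotation time reported above
TEAM_SIZE = 7              # number of annotators
LOGS_ANNOTATED = 1000      # game logs completed
COST_UPPER_BOUND = 3000.0  # "just under $3,000", so treat as an upper bound


def estimate_schedule(total_hours, team_size, hours_per_day=8, days_per_week=5):
    """Convert total annotation hours into a full-time schedule estimate.

    hours_per_day and days_per_week are assumptions (a 40-hour work week).
    """
    person_days = total_hours / hours_per_day
    team_days = person_days / team_size
    return {
        "person_days": person_days,
        "team_days": team_days,
        "team_work_weeks": team_days / days_per_week,
    }


schedule = estimate_schedule(TOTAL_HOURS, TEAM_SIZE)
minutes_per_log = TOTAL_HOURS * 60 / LOGS_ANNOTATED
max_cost_per_log = COST_UPPER_BOUND / LOGS_ANNOTATED

print(f"person-days of work:       {schedule['person_days']:.1f}")
print(f"working days for the team: {schedule['team_days']:.1f}")
print(f"40-hour weeks for the team: {schedule['team_work_weeks']:.1f}")
print(f"minutes per game log:      {minutes_per_log:.1f}")
print(f"cost per game log (max):   ${max_cost_per_log:.2f}")
```

<div class="MsoNormal" style="text-align: justify;">
With these inputs, each log works out to roughly 25 minutes of annotation at no more than $3 per log.</div>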
<div class="MsoNormal">
</div>
<br />
<iframe allowfullscreen="" frameborder="0" height="300" mozallowfullscreen="" src="http://player.vimeo.com/video/34004811" webkitallowfullscreen="" width="400"></iframe><br />
Watch on Vimeo: <a href="http://vimeo.com/34004811">Example of Event Annotation</a>.<br />
<br />
<br />
<div class="MsoNormal">
<b><span style="font-size: large;">Small Teams, Big A.I.</span></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEbaBUAiNda-tYSDISS8Xncg7dEeqU-0LjNUlj1IdozFsgPtJqX1BIMsuFQKr1XHXgzVHsCTPCXxNaN9O-GsvDQZ2TFIX3cXLgDMhnmd8rycT3tL3VdyBIrVhSHxABzJg7LC1bWtIu45Q/s1600/angry_birds.jpg" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEbaBUAiNda-tYSDISS8Xncg7dEeqU-0LjNUlj1IdozFsgPtJqX1BIMsuFQKr1XHXgzVHsCTPCXxNaN9O-GsvDQZ2TFIX3cXLgDMhnmd8rycT3tL3VdyBIrVhSHxABzJg7LC1bWtIu45Q/s200/angry_birds.jpg" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Gratuitous Angry Birds image.</td></tr>
</tbody></table>
The game industry is going through a renaissance period right now, where the trend toward bigger teams and bigger budgets is reversing, and tiny “Indie” teams are creating big hits on mobile and online platforms. We’re seeing a return to the days when a few people, or even one person, can create an entire game, but the A.I. in these games tends to lack depth. That is understandable, given that even the AAA development studios struggle to find enough A.I. programmers, and current approaches to A.I. are technical and time-consuming processes. Indie developers can (and often do) outsource creation of their art assets. In my research, I’m trying to refactor the development process such that A.I. behavior can be outsourced as easily as the art, in a reasonable amount of time for a reasonable cost, allowing a tiny team to populate their game worlds with dynamic, socially intelligent, conversational characters. Ultimately, the point of this work is not really to cut costs and create dirt cheap A.I., but rather to bring more (and a wider variety of) people into the development process, to collectively produce something that could not be achieved another way.</div>
<div style="text-align: justify;">
</div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
</div>
<div class="MsoNormal" style="text-align: justify;">
Regarding A.I. in indie games, there are some notable exceptions on the horizon -- games from small teams with deep A.I. Industry veteran Paul Tozour’s <a href="http://www.kickstarter.com/projects/paultozour/city-conquest">City Conquest</a> is a tower defense RTS developed using genetic algorithms to balance the playing experience. <a href="http://games.soe.ucsc.edu/project/prom-week">Prom Week</a>, developed by a team of PhDs at UCSC (studying with Michael Mateas and Noah Wardrip-Fruin), might be considered the spiritual successor to <a href="http://www.interactivestory.net/">Facade</a>. Prom Week promises a highly replayable gameplay experience, based on dynamic social interaction, but the project (perhaps wisely) abandons Facade’s natural language interface. I think that natural language input still offers an opportunity to give players an increased sense of autonomy, and am hoping to show that leveraging data recorded from thousands of players can support robust language understanding while preserving the player’s sense of agency.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b><span style="font-size: large;">What about Turk?</span></b></div>
<b><span style="font-size: large;"> </span></b><br />
<div class="MsoNormal" style="text-align: justify;">
When I describe my approach as crowdsourcing, people often ask why I’m not using <a href="https://www.mturk.com/mturk/welcome">Amazon’s Mechanical Turk</a>. Crowdsourcing purists might say that what I’m doing on oDesk is really outsourcing more than crowdsourcing, because I’m working with a persistent team (although the earlier phase of my project where we recorded players online was certainly crowdsourcing). I did experiment briefly with Turk, and my impression was that there are <i><b>lots</b></i> of scammers on Turk, trying to make money by clicking things as fast as possible, and a large part of the effort would need to focus on validating work. My research focus is really on building the system that generates behavior and dialogue from the annotated data, and crowdsourcing is a means to an end. There is more personal interaction on oDesk, and the reputation system provides an incentive to maintain high quality work, making it easier to find good people and continue working with them. My experience on oDesk could be considered a proof of concept for a process that could be repeated on Turk in the future.</div>
<div class="MsoNormal">
<br /></div>
Jeff Orkin<br />
<br />
Thesis Proposal Recap (posted 2011-12-20, updated 2013-06-03)<br />
<div class="MsoNormal" style="text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0G2iE65BJHfp_TiRplj9ObK3qCZLIiYm3N5nvsaG5SNdAvLqu6rePyaCx_eFNsx929hjq9u8wmyWKx-myrI7-JJ3XCNWSz2RI0YhBKaVCUMWdM_XY4VitHnaEo9rLSXCq4V5fpxrJ00o/s1600/architecture.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="160" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0G2iE65BJHfp_TiRplj9ObK3qCZLIiYm3N5nvsaG5SNdAvLqu6rePyaCx_eFNsx929hjq9u8wmyWKx-myrI7-JJ3XCNWSz2RI0YhBKaVCUMWdM_XY4VitHnaEo9rLSXCq4V5fpxrJ00o/s200/architecture.gif" width="200" /></a>I'm long overdue for a new blog post. 2011 has been a busy year. I've been making a lot of research progress, which has kept me too busy to blog! I've hit a number of milestones in recent months -- my thesis proposal was approved in August, a prototype of my new planner came together in October, and I launched a crowdsourced data annotation effort in November. This post will briefly recap the proposal that I presented to my committee. </div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
I'm building a new planner that combines plan recognition with case-based reasoning, to simulate reasoning from a collection of episodic memories. My approach is production-oriented, in that it is mindful of the content authoring bottleneck (where content is AI behaviors and dialogue lines), and attempts to make it as easy as possible to get an enormous variety of content into the system. The approach leverages the realities of the world we live in today, where it is possible to crowdsource repetitive tasks to non-experts, and opts for powering the system with lots of mundane data annotation rather than cleverly programmed AI.<br />
<br />
For background information on this project, see the previous recaps: <a href="http://collectiveai.blogspot.com/2011/01/first-post-w00t.html">Part 1</a>, <a href="http://collectiveai.blogspot.com/2011/02/recap-part-2-dialogues-with-data.html">Part 2</a>, <a href="http://collectiveai.blogspot.com/2011/02/recap-part-3-generation-from-collective.html">Part 3</a>. <br />
<br />
More after the jump...</div>
<div class="MsoNormal">
<br />
<a name='more'></a><br />
<span style="font-size: large;"><b>Follow research updates on Twitter: <a href="https://twitter.com/#%21/jorkin">@jorkin</a></b></span><br />
<br /></div>
<div class="MsoNormal">
<span style="font-size: large;"><b>INTRODUCTION</b></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
AI for combat has come a long way in the past 15 years or so. How can we make other parts of the experience -- the social interaction and storytelling -- as dynamic as the combat? While numerous games deliver sandbox-style combat that adapts to player choices, social interaction and storytelling are generally either entirely linear, or scripted with limited opportunities for interaction or influence. Game developers have mastered pathfinding, animation, scripting, and reactive behavior, but have made less progress producing characters that can dynamically communicate, cooperate, and maintain coherent interactions over long periods of time.</div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
What about <i>The Sims</i> and <i>Facade</i>? These are exceptions that inspire my work, but lie at opposite ends of a spectrum, where I am targeting something in between. <i>The Sims</i> is entirely emergent; a doll house that does not attempt to tell any particular narrative. <i>Facade</i> tells a specific, coherent narrative, but has been criticized for limiting player agency. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-size: large;"><b>Graphics Envy</b></span></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi59-qipnXXBH96E-c1nXq3B_lX4hYlvaA8wM7SUQFEeC_8DTVKBejf_f6TKRdGgH6w_Fl3IQoU6dB0e_V95jleGeQQ8mLGhb-7s5XwJZxgF_XxU-zZWK6eNzWFuKTQcZXETWXdas6YpIY/s1600/mario_small.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi59-qipnXXBH96E-c1nXq3B_lX4hYlvaA8wM7SUQFEeC_8DTVKBejf_f6TKRdGgH6w_Fl3IQoU6dB0e_V95jleGeQQ8mLGhb-7s5XwJZxgF_XxU-zZWK6eNzWFuKTQcZXETWXdas6YpIY/s1600/mario_small.gif" /></a></div>
<div class="MsoNormal" style="text-align: justify;">
It is often noted that AI has not kept pace with graphics in games. There was a time when every pixel on the screen was plotted by hand. Today we have arrived at a representation that allows us to render 3D worlds at runtime from any camera position. And this representation scales -- with more processing power, we can render more polygons (millions of polygons!), yielding incredible detail.</div>
<div class="MsoNormal" style="text-align: justify;">
</div>
<div style="text-align: justify;">
</div>
<div class="MsoNormal" style="text-align: justify;">
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWtQbfsVDpJjzNaZf3LK1aX5lw7yGwWNYgcglndTFWp-PDY7ZTTwGoE7DyClQp83st6F2Fh30qE5lFkxKHz_5hyphenhyphen5QPd7e4L-FrRUZ9ssWVN6bsZyyn8mjPbcwTf7R76_18jAY9W2vP_x0/s1600/complex_graph_big.gif" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWtQbfsVDpJjzNaZf3LK1aX5lw7yGwWNYgcglndTFWp-PDY7ZTTwGoE7DyClQp83st6F2Fh30qE5lFkxKHz_5hyphenhyphen5QPd7e4L-FrRUZ9ssWVN6bsZyyn8mjPbcwTf7R76_18jAY9W2vP_x0/s320/complex_graph_big.gif" width="244" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Graph generated by recording 5,000 pairs of players.</td></tr>
</tbody></table>
We are essentially still in the pixel-plotting days of AI, where we are crafting every decision by hand -- an approach that does not scale beyond the complexity of behaviors that we see in current games. This graph (to the left) was generated by recording 5,000 pairs of humans playing as customers and waitresses in a virtual restaurant. The graph shows all of the action sequences observed from the start of the game (at the top), until the end of the game (at the bottom). Human behavior is complex, and nuanced. We will never be able to author such a dense possibility space by hand. We need a representation that can be recombined to generate all of these possibilities at runtime, adapting to player choices.</div>
<div class="MsoNormal">
<br />
<span style="font-size: large;"><b>COLLECTIVE ARTIFICIAL INTELLIGENCE</b></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I refer to my approach as Collective Artificial Intelligence -- a combination of crowdsourcing, pattern-mining, and episodic planning.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-size: large;"><b>Crowdsourcing</b></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
<div class="separator" style="clear: both;">
There is no easier way to author behavior than through a live demonstration (i.e. playing a character in a game). Anyone can do it, without any technical know-how. I have collected three data sets of behavior: <a href="http://therestaurantgame.net/"><i>The Restaurant Game</i></a> has recorded over 10,000 demonstrations, <a href="http://gambit.mit.edu/loadgame/improviso.php"><i>Improviso</i></a> is currently recording actors on a virtual film set, and <a href="http://robotic.media.mit.edu/MarsEscapeGame/Mars_Escape.html"><i>Mars Escape</i></a> was a collaboration with the Personal Robots Group at the Media Lab, which recorded hundreds of demonstrations of a human-robot collaboration. All of these games were created from the same codebase.</div>
<div class="separator" style="clear: both;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFJSbfraRC_7sGHHCcAsragoGTyZUjo30B8tbWizRbuJHptzRIoJXHV4a7r2a2iN3E0q1dV8wxA3MDL_vFTSDk2xw0M1W6cuibmIDPp0bpLdfrytWYTRPvipzFiB1EDy_5OHW2uwzpGlI/s1600/montage.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="97" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgFJSbfraRC_7sGHHCcAsragoGTyZUjo30B8tbWizRbuJHptzRIoJXHV4a7r2a2iN3E0q1dV8wxA3MDL_vFTSDk2xw0M1W6cuibmIDPp0bpLdfrytWYTRPvipzFiB1EDy_5OHW2uwzpGlI/s400/montage.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The Restaurant Game, Improviso, and Mars Escape.</td></tr>
</tbody></table>
<div class="separator" style="clear: both;">
Recording thousands of people in the same scenario can be thought of as crowdsourcing our imagination, which is the first step in my process. In the next step -- pattern-mining -- crowdsourcing is employed once again, to help <i>interpret </i>the data collected from the first step.</div>
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-size: large;"><b>Pattern-Mining</b></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
Pattern-mining is a human-machine collaborative process. Humans are recruited to annotate data with browser-based tools, explaining the meaning of different action sequences to the AI system. Including humans in the loop enables capturing sparse examples of behavior and dialogue that would have slipped through the cracks of statistical machine learning algorithms, and provides an opportunity for a designer to control which behaviors to capture and which to ignore. One of the goals of this work is to demystify Game AI, so that it is no longer a black art, by refactoring a difficult problem into many simple annotation tasks. The intuition is that few people can program behavior, but anyone who speaks English can explain behavior given a transcript of a recorded game. </div>
<br />
<div class="MsoNormal" style="text-align: justify;">
Each recorded transcript is annotated with four layers of meta-data: event sequences, event hierarchies, causal chains, and references. Embedding streamlined annotation tools in a browser makes it possible to take advantage of the numerous web sites that exist for hiring people online to perform unskilled labor (e.g. <a href="https://www.mturk.com/mturk/welcome">Amazon Mechanical Turk</a>, <a href="http://crowdflower.com/">CrowdFlower</a>, <a href="https://www.elance.com/">eLance</a>, <a href="https://www.odesk.com/">oDesk</a>). Below is a video demonstrating the process of annotating event sequences. More information about annotation is available in papers from <a href="http://www.media.mit.edu/cogmac/publications/fdg_2010_camera_ready.pdf">INT3</a> and <a href="http://www.media.mit.edu/cogmac/publications/fdg_2010_camera_ready.pdf">AIIDE</a>.</div>
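<div class="MsoNormal" style="text-align: justify;">
To make the four layers a little more concrete, here is one way an annotated transcript might be represented in code. This is only an illustrative sketch, not the format used by the actual annotation tools: the class name, field names, event labels, and the restaurant log lines are all invented for this example.</div>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AnnotatedTranscript:
    # Raw log lines from one recorded game (invented here for illustration).
    actions: List[str]
    # Layer 1, event sequences: event label -> indices of the actions it covers.
    events: Dict[str, List[int]] = field(default_factory=dict)
    # Layer 2, event hierarchies: parent event label -> child event labels.
    hierarchy: Dict[str, List[str]] = field(default_factory=dict)
    # Layer 3, causal chains: (cause action index, effect action index).
    causal_links: List[Tuple[int, int]] = field(default_factory=list)
    # Layer 4, references: (action index, phrase, thing the phrase refers to).
    references: List[Tuple[int, str, str]] = field(default_factory=list)

    def unlabeled_actions(self) -> List[int]:
        """Return indices of actions not yet covered by any event."""
        covered = {i for indices in self.events.values() for i in indices}
        return [i for i in range(len(self.actions)) if i not in covered]


transcript = AnnotatedTranscript(
    actions=[
        "CUSTOMER said to WAITRESS: I'll have the salmon",
        "WAITRESS picked up the SALMON.",
        "WAITRESS gave the SALMON to CUSTOMER.",
        "CUSTOMER said to WAITRESS: thanks, this looks great",
    ],
    events={"ORDER_FOOD": [0], "SERVE_FOOD": [1, 2]},
    hierarchy={"MEAL": ["ORDER_FOOD", "SERVE_FOOD"]},
    causal_links=[(0, 1)],  # the order causes the waitress to fetch the dish
    references=[(3, "this", "SALMON")],
)

# Shows which actions an annotator still needs to label with an event.
print(transcript.unlabeled_actions())
```

<div class="MsoNormal" style="text-align: justify;">
A helper like <code>unlabeled_actions</code> hints at why the tasks stay simple: each layer is a small set of labels or links over the same list of actions, so progress on any one transcript is easy to check.</div>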
<br />
<div class="MsoNormal" style="text-align: justify;">
<br />
<iframe allowfullscreen="" frameborder="0" height="300" mozallowfullscreen="" src="http://player.vimeo.com/video/34004811" webkitallowfullscreen="" width="400"></iframe><br />
Watch on Vimeo: <a href="http://vimeo.com/34004811">Event Annotation</a>.<br />
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="font-size: large;"><b>Episodic Planning</b></span></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
Annotated data is the fuel that powers the new episodic planning system. The diagram below illustrates the machinery inside the mind of a character. In brief: the agent receives observations through sensors, and records the observed interaction history on the Blackboard. In order to understand and respond to observations, the agent can exploit Collective Memory -- a database of recorded transcripts, and associated meta-data. The Plan Recognizer leverages the learned Event Dictionary to infer an event hierarchy from the observed interaction history. The Action Selector then searches for recorded human games that match the inferred event hierarchy, and passes these as proposals to a set of Critic processes. Critics draw on a variety of sources to scrutinize the validity and coherence of following the next step in each proposed plan. If one of the proposals is approved by all critics, it is passed to the Actuator for execution.</div>
<div class="MsoNormal" style="text-align: justify;">
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0G2iE65BJHfp_TiRplj9ObK3qCZLIiYm3N5nvsaG5SNdAvLqu6rePyaCx_eFNsx929hjq9u8wmyWKx-myrI7-JJ3XCNWSz2RI0YhBKaVCUMWdM_XY4VitHnaEo9rLSXCq4V5fpxrJ00o/s1600/architecture.gif" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0G2iE65BJHfp_TiRplj9ObK3qCZLIiYm3N5nvsaG5SNdAvLqu6rePyaCx_eFNsx929hjq9u8wmyWKx-myrI7-JJ3XCNWSz2RI0YhBKaVCUMWdM_XY4VitHnaEo9rLSXCq4V5fpxrJ00o/s400/architecture.gif" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Agent architecture for episodic planning.</td></tr>
</tbody></table>
<br /></div>
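<div class="MsoNormal" style="text-align: justify;">
The control flow described above can be sketched in a few lines of Python. This is a heavily simplified toy, not the real planner: the event hierarchy is flattened to one event label per action, Collective Memory is a plain list of action sequences, and every name and data value below is made up for illustration.</div>

```python
from typing import Callable, Dict, List, Optional

Game = List[str]                             # one recorded game, as a flat list of actions
Critic = Callable[[List[str], str], bool]    # (interaction history, proposed action) -> approve?


def recognize_events(history: List[str], event_dictionary: Dict[str, str]) -> List[str]:
    """Plan Recognizer stand-in: map each observed action to an event label."""
    return [event_dictionary.get(action, "UNKNOWN") for action in history]


def propose_next_actions(history: List[str], collective_memory: List[Game],
                         event_dictionary: Dict[str, str]) -> List[str]:
    """Action Selector stand-in: find recorded games whose opening events
    match the observed events, and propose the action each one took next."""
    observed = recognize_events(history, event_dictionary)
    proposals = []
    for game in collective_memory:
        if len(game) <= len(history):
            continue  # this game has no next step to offer
        if recognize_events(game[:len(history)], event_dictionary) == observed:
            proposals.append(game[len(history)])
    return proposals


def select_action(history: List[str], collective_memory: List[Game],
                  event_dictionary: Dict[str, str],
                  critics: List[Critic]) -> Optional[str]:
    """Return the first proposal approved by all critics, or None."""
    for proposal in propose_next_actions(history, collective_memory, event_dictionary):
        if all(critic(history, proposal) for critic in critics):
            return proposal  # this is what would be handed to the Actuator
    return None


def no_immediate_repeat(history: List[str], proposal: str) -> bool:
    """Example critic: reject a proposal that repeats the last observed action."""
    return not history or history[-1] != proposal


event_dictionary = {
    "CUSTOMER says hello": "GREETING",
    "CUSTOMER says hi": "GREETING",
    "WAITRESS says welcome": "SEATING",
    "WAITRESS gives MENU": "SEATING",
}
collective_memory = [
    ["CUSTOMER says hello", "CUSTOMER says hi", "WAITRESS gives MENU"],
    ["CUSTOMER says hello", "WAITRESS says welcome", "WAITRESS gives MENU"],
]
blackboard = ["CUSTOMER says hi"]  # observed interaction history so far

next_action = select_action(blackboard, collective_memory, event_dictionary,
                            [no_immediate_repeat])
print(next_action)
```

<div class="MsoNormal" style="text-align: justify;">
Note that the observed "hi" matches recorded games that opened with "hello", because matching happens at the level of events rather than raw actions. The first proposal is vetoed by the critic, so the second recorded game supplies the response.</div>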
<div class="MsoNormal" style="text-align: justify;">
Below is a video demonstrating the system. The bottom of the video illustrates the plan recognition process that is running in the Customer's mind as new observations arrive. While it would be easy to script something like this, keep in mind that the video is unscripted -- the two characters are dynamically responding to one another based on observed actions and natural language dialogue text. Each time the system runs, the scenario plays out differently. I am working toward a demo where a human can play one of the characters, but this requires much more annotated data to cover the space of possible behavior. The big difference between this system and my earlier <a href="http://collectiveai.blogspot.com/2011/02/recap-part-3-generation-from-collective.html">statistical approach</a> is that the new system always produces a coherent narrative, or if it doesn't, it's a bug that can be fixed rather than just some statistical anomaly. </div>
<br />
<iframe allowfullscreen="" frameborder="0" height="450" mozallowfullscreen="" src="http://player.vimeo.com/video/34004738" webkitallowfullscreen="" width="400"></iframe><br />
Watch on Vimeo: <a href="http://vimeo.com/34004738">Episodic Planner: first prototype</a>.<br />
<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b><span style="font-size: large;">EVALUATION</span></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
My committee approved my proposal, giving me an "unconditional pass"... with a condition. The condition is that I write a concise, focused, one-page plan for how I will evaluate the system. I am still thinking about this. I hope to quantitatively show that the system produces human-like interactions, when compared to thousands of human-human transcripts. But more importantly, I want to show qualitatively that this system produces a new experience; one that players find more engaging due to an increased sense of agency. I want to demonstrate that players feel that the AI character is cooperating, and helping the human player take the narrative in the direction the human chooses to go.<br />
<br />
Now that my proposal has been approved, I am required to defend my thesis within 12 months of approval, so allegedly I will be finished by sometime in August 2012 at the latest! </div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
Jeff Orkin<br />
<br />
Improviso: Month One Highlights, and now on Mac OSX! (posted 2011-04-21, updated 2013-05-12)<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicfrAFb95q8O4GyzyS4fgvMUlaPv9dJv53rEfOcgR1RyTtt6K7qI1uPoW6aluH9RiRuYTMGi2ziXcXTnRGE18RfZGQOD75w_LPeIjzc2y5Iku6zd4X-5_eynX8j1XPf7YfUcVNpsbW4aU/s1600/improviso1.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicfrAFb95q8O4GyzyS4fgvMUlaPv9dJv53rEfOcgR1RyTtt6K7qI1uPoW6aluH9RiRuYTMGi2ziXcXTnRGE18RfZGQOD75w_LPeIjzc2y5Iku6zd4X-5_eynX8j1XPf7YfUcVNpsbW4aU/s320/improviso1.jpg" width="320" /></a></div>
Improviso has been online for about one month, so I thought I would share some highlights so far.<br />
<br />
The big news is that we released the Mac OSX version today. If you are still holding out for the iOS version, DON'T! There is not going to be an iOS version!! So download Improviso for Windows or OSX and play today: <a href="http://gambit.mit.edu/improviso">http://gambit.mit.edu/improviso</a><br />
<br />
<br />
More after the jump...<br />
<a name='more'></a><br />
<br />
We are grateful that the press has been supportive of the project, which has helped data collection get off to a good start. I would like to thank the folks who have posted these articles:<span style="font-size: small;"> </span><br />
<ul>
<li><span style="font-size: small;"><a href="http://web.mit.edu/newsoffice/2011/improv-rpg.html">Improv game puts the role-playing back in RPG</a> (MIT)</span></li>
<li><span style="font-size: small;"><span class="newsTitle" style="font-family: inherit;"><a href="http://gamasutra.com/view/news/33600/MIT_GAMBIT_Lab_Releases_Improviso_Game_With_Goal_Of_Training_AI.php">MIT GAMBIT Lab Releases Improviso Game With Goal Of Training AI</a> (Gamasutra)</span></span><span style="font-size: small;"> </span></li>
<li><span style="font-size: small;"><a href="http://www.indiegames.com/blog/2011/03/freeware_game_pick_improviso_d.html">Freeware Game Pick: Improviso</a> (IndieGames.com)</span></li>
<li><span style="font-size: small;"><a href="http://www.rockpapershotgun.com/2011/03/18/gambit-game-is-drama-school-for-ai/">GAMBIT Game is Drama School for AI</a> (Rock, Paper, Shotgun)</span></li>
<li><span style="font-size: small;"><a href="http://motherboard.tv/2011/3/21/look-out-watson-mit-s-new-videogame-is-training-better-a-i-s--2">Look Out Watson MIT's New Video Game is Training Better AIs</a> (Motherboard.tv)</span></li>
<li><a href="http://hplusmagazine.com/2011/04/05/collective-ai-a-conversation-with-mits-jeff-orkin/">Collective AI: A Conversation with MIT’s Jeff Orkin</a> (H+ Magazine)</li>
</ul>
<br />
So what do people do in this game, you ask? I promise anyone who plays will have a unique experience, to say the least. Here are a couple transcripts of real players online. Special thanks to the members of the <a href="http://www.facepunch.com/threads/1072865-Improviso">Facepunch forum thread</a> who have been sharing their screenplays. <br />
<br />
<b>Sample Game #1:</b><br />
<blockquote>
---------------------------------------------------------------------<br />
<b>Scene 1/3: "Interrogate Ted"</b><br />
---------------------------------------------------------------------<br />
<br />
TED said to AGENT SMITH: HELO<br />
TED said to AGENT SMITH: help<br />
AGENT SMITH said to TED: Shut up and listen to me!<br />
TED said to AGENT SMITH: but I'm stuck!<br />
AGENT SMITH said to TED: You were caught in this crash zone!<br />
AGENT SMITH said to TED: What are you doing in here!<br />
TED said to AGENT SMITH: I got lost, then I was attacked!<br />
AGENT SMITH said to TED: Yes<br />
AGENT SMITH said to TED: That was me!<br />
AGENT SMITH said to TED: And i tied you up!<br />
AGENT SMITH said to TED: NOW SPEAK UP!<br />
AGENT SMITH said to TED: Or else...<br />
TED said to AGENT SMITH: Did you prob me too?<br />
AGENT SMITH said to TED: I tied you down for a reason!<br />
<br />
The soundtrack has been changed to Scary.<br />
<br />
TED said to AGENT SMITH: I was looking for something interesting to report on<br />
AGENT SMITH said to TED: Now... give me a good reason why i should let you out...<br />
<br />
AGENT SMITH dropped the HANDGUN.<br />
AGENT SMITH picked up the PROBE.<br />
<br />
AGENT SMITH said to TED: Or i will use this probe on you!<br />
TED said to AGENT SMITH: I'm an ex marine and can help fight the aliens<br />
AGENT SMITH said to TED: Okay; i trust you for now. Lets go.<br />
<br />
AGENT SMITH set TED free from the rope<br />
AGENT SMITH dropped the PROBE.<br />
AGENT SMITH picked up the HANDGUN.<br />
<br />
AGENT SMITH said to TED: Pick up that probe and follow up!<br />
<br />
TED picked up the PROBE.<br />
<br />
<br />
---------------------------------------------------------------------<br />
<b>Scene 2/3: "Surprise!"</b><br />
---------------------------------------------------------------------<br />
<br />
AGENT SMITH said to TED: Now that we got out...<br />
AGENT SMITH said to TED: You can explain yourself. Better now.<br />
TED said to AGENT SMITH: Yes, yes I can, I feel much safer here<br />
DR. STEIN said to TED: Let me just... what is that bump on your forehead??<br />
<br />
DR. STEIN picked up the X-RAY.<br />
<br />
TED said to DR. STEIN: oh nothing, you don't need to check it<br />
DR. STEIN said to TED: You are wrong... let me xray this thing...<br />
<br />
The soundtrack has been changed to Exciting.<br />
DR. STEIN used the X-RAY on TED.<br />
<br />
DR. STEIN said to TED: This isnt a normal bump... OH MY GOD! SOMETHING IS MOVING IN THERE<br />
TED said to AGENT SMITH: OH NO<br />
<br />
DR. STEIN dropped the X-RAY.<br />
DR. STEIN picked up the TRANQUILIZER GUN.<br />
TED has put on the K-17 mask.<br />
<br />
TED said to DR. STEIN: YOU FOUND OUT!<br />
DR. STEIN said to AGENT SMITH: OH NO! Its an alien!<br />
AGENT SMITH said to TED: Drop dead; you skunk!<br />
TED said to DR. STEIN: MY BROTHEREN WILL BE HERE SOON<br />
<br />
AGENT SMITH attacked TED with the HANDGUN.<br />
TED died<br />
The soundtrack has been changed to Sad.<br />
<br />
AGENT SMITH said to TED: I think we did it... Did we kill it?<br />
<br />
DR. STEIN dropped the TRANQUILIZER GUN.<br />
DR. STEIN picked up the SURGICAL TOOLS.<br />
DR. STEIN used the SURGICAL TOOLS on TED.<br />
<br />
DR. STEIN said to TED: Its dead. The only proof of life outside there.<br />
AGENT SMITH said to DR. STEIN: Im sorry.<br />
<br />
<br />
---------------------------------------------------------------------<br />
<b>Scene 3/3: "R.I.P. Ted"</b><br />
---------------------------------------------------------------------<br />
<br />
AGENT SMITH said to TED: Ted. why are you here?<br />
TED said to AGENT SMITH: I cam to report a crash<br />
AGENT SMITH said to TED: I know about this crash; I knew about it all the time...<br />
TED said to AGENT SMITH: so you can help me with my report?<br />
AGENT SMITH said to TED: Sure. what do you need help with... i can provide some... insights.<br />
<br />
AGENT SMITH dropped the HANDGUN.<br />
AGENT SMITH picked up the GUN.<br />
<br />
TED said: This... this egg? what is it?<br />
<br />
The soundtrack has been changed to Tense.<br />
<br />
AGENT SMITH said to TED: Its my son...<br />
<br />
AGENT SMITH has put on the TED mask.<br />
AGENT SMITH has put on the K-17 mask.<br />
<br />
TED said to AGENT SMITH: what's happening?!<br />
AGENT SMITH said to TED: The only thing that is left to do now is to seal your hopeless fate...<br />
TED said to AGENT SMITH: NOOOOOOOO<br />
<br />
The soundtrack has been changed to Scary.<br />
The DIRECTOR started the Lightning special effect.<br />
<br />
AGENT SMITH said to TED: Prepare... to die....<br />
<br />
The DIRECTOR started the Blast-off special effect.<br />
AGENT SMITH attacked TED with the GUN.<br />
<br />
TED said to AGENT SMITH: you'll regret this!<br />
<br />
AGENT SMITH attacked TED with the GUN.<br />
TED died </blockquote>
<br />
<br />
<b>Sample Game #2:</b><br />
<blockquote>
---------------------------------------------------------------------<br />
<b>Scene 1/3: "Rescue Ted"</b><br />
---------------------------------------------------------------------<br />
<br />
AGENT SMITH said to TED: Ted Cosby?<br />
TED said to AGENT SMITH: Get me out of here!<br />
AGENT SMITH said to TED: Are you hurt?<br />
TED said to AGENT SMITH: No, but I cant move<br />
AGENT SMITH said to TED: Oh my, some kind of extra terrestrial binding mechanism<br />
AGENT SMITH said to TED: A rope perhaps<br />
AGENT SMITH said to TED: I'll need something to cut this<br />
AGENT SMITH said to TED: Any ideas?<br />
TED said to AGENT SMITH: Maybe the catfood will work!<br />
AGENT SMITH said: Ah ha, a sharp edged can<br />
AGENT SMITH said: You're as smart as they say<br />
<br />
AGENT SMITH dropped the HANDGUN.<br />
AGENT SMITH picked up the CAT FOOD.<br />
<br />
AGENT SMITH said to TED: Hmm, why would aliens have cat food<br />
<br />
AGENT SMITH set TED free from the rope<br />
<br />
TED said to AGENT SMITH: Where do we go from here?<br />
AGENT SMITH said: Let's get out of here, back to base!<br />
<br />
TED exited the scene.<br />
<br />
<br />
---------------------------------------------------------------------<br />
<b>Scene 2/3: "Doppleganger"</b><br />
---------------------------------------------------------------------<br />
<br />
K-17 said to TED: Stop right there<br />
K-17 said to TED: This man is an alien imposter!<br />
AGENT SMITH said to K-17: What?!<br />
TED said to K-17: Oh god which one of you is real?<br />
<br />
TED picked up the CAT FOOD.<br />
<br />
K-17 said to TED: Isn't this obvious?<br />
K-17 said to AGENT SMITH: How would he be on a god damned crashed spacecraft if he wasn't an alien being!<br />
AGENT SMITH said to K-17: Then why is it you showed up JUST as I rescue Mr. Cosby?<br />
TED said to K-17: Let me ask you both a question, why is there cat food on the ship?<br />
K-17 said to TED: Ted, you have to believe me<br />
K-17 said to TED: It's probably to feed their cat or something<br />
AGENT SMITH said to TED: How should I know?<br />
<br />
TED dropped the CAT FOOD.<br />
TED picked up the PROBE.<br />
<br />
K-17 said to TED: hey put that down<br />
TED said to K-17: Why?<br />
K-17 said to TED: It's... It looks fragile<br />
AGENT SMITH said to K-17: Ha! Only an imposter would say that!<br />
K-17 said to TED: I could say the same!<br />
<br />
TED dropped the PROBE.<br />
TED picked up the ALIEN EGG.<br />
<br />
K-17 said to TED: NO!<br />
<br />
TED dropped the ALIEN EGG.<br />
TED picked up the GUN.<br />
<br />
K-17 said: Put her down!<br />
TED said: It must be you!<br />
<br />
K-17 has taken off the AGENT SMITH mask.<br />
TED attacked the HANDGUN with the GUN.<br />
TED attacked K-17 with the GUN.<br />
<br />
K-17 said to TED: Argh!<br />
<br />
K-17 dropped the HANDGUN.<br />
K-17 died<br />
<br />
AGENT SMITH said to K-17: Phew<br />
AGENT SMITH said to K-17: Thanks Ted, we'd better get out of here<br />
TED said to AGENT SMITH: Alright, that was a close one.<br />
<br />
<br />
---------------------------------------------------------------------<br />
<b>Scene 3/3: "Alien Carnage"</b><br />
---------------------------------------------------------------------<br />
<br />
DR. STEIN said to TED: Sorry about this, we just have to run some routine tests<br />
AGENT SMITH said to TED: We'll have you right back out in no time at all<br />
<br />
The DIRECTOR started the Explosion special effect.<br />
<br />
AGENT SMITH said to TED: Argh!<br />
TED said to DR. STEIN: Why do I have to be strapped down?<br />
TED said to DR. STEIN: What was that!?<br />
K-17 said to AGENT SMITH: Humans!<br />
<br />
AGENT SMITH attacked the ALIEN POD with the HANDGUN.<br />
K-17 attacked AGENT SMITH with their hands.<br />
AGENT SMITH dropped the HANDGUN.<br />
AGENT SMITH died<br />
<br />
DR. STEIN said to AGENT SMITH: Smith!<br />
TED said to AGENT SMITH: Oh no!<br />
DR. STEIN said: Quick, help!<br />
<br />
DR. STEIN set TED free from the straps<br />
K-17 picked up the HANDGUN.<br />
<br />
K-17 said to TED: Stop right there human scum!<br />
<br />
TED picked up the SYRINGE.<br />
<br />
K-17 said to TED: I will vaporise you!<br />
<br />
K-17 attacked DR. STEIN with the HANDGUN.<br />
<br />
DR. STEIN said to AGENT SMITH: Argh!<br />
<br />
DR. STEIN died<br />
<br />
TED said to DR. STEIN: No!<br />
<br />
TED dropped the SYRINGE.<br />
<br />
K-17 said to DR. STEIN: Now human<br />
K-17 said to DR. STEIN: Tell me everything<br />
<br />
DR. STEIN revived<br />
DR. STEIN died<br />
TED picked up the TRANQUILIZER GUN.<br />
<br />
K-17 said to TED: No! Stop!<br />
<br />
TED used the TRANQUILIZER GUN on K-17.<br />
There is nothing selected to attack.<br />
K-17 attacked the BRIEFCASE with the HANDGUN.<br />
<br />
K-17 said to TED: This isn't the end, human!<br />
<br />
K-17 dropped the HANDGUN.<br />
K-17 died<br />
TED dropped the TRANQUILIZER GUN.<br />
TED picked up the SECRET FILES.<br />
TED used the SECRET FILES on the PEDESTAL.<br />
<br />
TED said: I will have the scoop of the year!<br />
<br />
The soundtrack has been changed to Sad.</blockquote>
Jeff Orkinhttp://www.blogger.com/profile/12486391956727436411noreply@blogger.com0tag:blogger.com,1999:blog-8221447444699368366.post-84204030019954278782011-03-16T17:03:00.006-04:002013-03-03T08:18:19.788-05:00Play Improviso!<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjeyZ8pmqIq4IbBXrpRpQOy3Upxdz34sFytoWFu8K7Tm_bKVbhTr_scSsjfSlfU8OtNyu9vvROklgMGYzGaK_-a_9JXszjbI_OlsLD3dX_6-9-UEyacFP4GWAwhSS5GMVejiBXM3zyTEI/s1600/screenshot_4.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjeyZ8pmqIq4IbBXrpRpQOy3Upxdz34sFytoWFu8K7Tm_bKVbhTr_scSsjfSlfU8OtNyu9vvROklgMGYzGaK_-a_9JXszjbI_OlsLD3dX_6-9-UEyacFP4GWAwhSS5GMVejiBXM3zyTEI/s320/screenshot_4.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Screenshot from Improviso.</td></tr>
</tbody></table>
PAX East just wrapped up, and we debuted our new data collection game there: <a href="http://gambit.mit.edu/improviso">Improviso</a>! Players of Improviso are paired anonymously online as the Lead Actor and Director of a low-budget sci-fi movie. The Lead Actor plays Ted, a struggling reporter who wakes up tied up on an alien spaceship. The Director selects scenes, summons special effects, and plays the supporting cast members: the G-Man, Scientist, Alien, and Little Girl. Improviso is a collaboration between the Media Lab and the Singapore-MIT GAMBIT Game Lab. <a href="http://gambit.mit.edu/improviso">Download Improviso</a> yourself and play today!<br />
<br />
More after the jump...<br />
<br />
<a name='more'></a><span style="font-size: large;"><b>The best time to play Improviso, and find other players online, is between 7 and 10pm in your own time zone.</b></span><br />
<br />
We announced the game at the Dialogue as Gameplay panel (read about the panel on <a href="http://gamasutra.com/view/news/33494/PAX_East_2011_Discussing_Interactive_Drama_And_Dialogue_As_Gameplay.php">Gamasutra</a> and <a href="http://www.newsarama.com/games/pax-east-2011-dialogue-as-gameplay-110311.html">Newsarama</a>), and it was playable at the GAMBIT booth. Download the game, and get more info on the <a href="http://gambit.mit.edu/improviso">GAMBIT Improviso page</a>. The official press release is <a href="http://web.mit.edu/newsoffice/2011/improv-rpg.html">here</a>. You can find additional background materials and concept sketches on the <a href="http://gambit.mit.edu/updates/gotw/">GAMBIT blog</a>. Watch the trailer video below to get a sense of the gameplay. <br />
<br />
<iframe frameborder="0" height="300" src="http://player.vimeo.com/video/17928996" width="400"></iframe><br />
<br />
<a href="http://vimeo.com/17928996">Watch the trailer on Vimeo.</a><br />
<br />
Gameplay in Improviso is similar to <a href="http://therestaurantgame.net/">The Restaurant Game</a>, where players are given a role to play through typed text and physical interaction in a 3D environment; however, rather than teaching characters how to play a role in society (e.g. a customer or waitress), we will use the data from Improviso to teach characters how to play roles in a story. The Restaurant Game had a nice tie-in to Schank’s idea of the restaurant script. Here we are hoping to show that the same methodology can automate characters from recorded gameplay in a less mundane situation, one that we wouldn’t assume has a shared cultural script, but where we still expect to find some recurring behaviors drawn from our shared pop-cultural knowledge of science fiction.<br />
<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCFXIFiFEutC_NE2sxT2HPcDZTvY6Llkq13kkitqdsO2HGiW04ssteUuCedi-OONNnuSxhQtdg2-SHAbFfZyggLydUOc-4SP8DSPYB5JKF0N9uA3sWp6UOWiESWgYU7DjflEU4mvaxgy0/s1600/alien_sketches.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="1" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCFXIFiFEutC_NE2sxT2HPcDZTvY6Llkq13kkitqdsO2HGiW04ssteUuCedi-OONNnuSxhQtdg2-SHAbFfZyggLydUOc-4SP8DSPYB5JKF0N9uA3sWp6UOWiESWgYU7DjflEU4mvaxgy0/s320/alien_sketches.jpg" width="202" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Concept art for the alien.</td></tr>
</tbody></table>
<br />
It was a bit surprising how challenging it was to get players to do what we hoped they would do in Improviso, relative to The Restaurant Game. When you tell someone to play as a customer or waitress in a restaurant, they know exactly what to do without requiring any more information, but if you say “you’re a G-Man in Area 51,” most people are lost. Plus, a first-person game in a science-fiction setting, with guns and aliens, brings along certain expectations; expectations that are not aligned with the gameplay in Improviso.<br />
<br />
So, the challenge with this game was figuring out how to get ordinary players, who might not have any experience with acting or writing, to engage in dramatic improv with strangers online. Below is the list of things we did to finally get the results we wanted.<br />
<br />
<ul>
<li>created a detailed tutorial.</li>
<li>provided examples of expected play (trailer video, and story boards in the tutorial).</li>
<li>began the game with the players in a situation that forces face-to-face interaction and dialogue (Ted is tied up, and can’t even move until Agent Smith unties him).</li>
<li>provided very specific descriptions of what should happen in each scene.</li>
<li>confined players to specific regions of the set for each scene (e.g. inside the spaceship, inside the Area 51 lab).</li>
<li>started scene one with only two characters, and gradually introduced additional characters in scenes two and three.</li>
<li>implemented mechanisms to enforce relatively short scenes, such as running out of film, based on a combination of time, the number of lines exchanged, and some randomness.</li>
</ul>
Much of this seems obvious in retrospect. Hindsight is 20/20. But in any case, it worked! We’ve observed almost 100 people engage in online improv. Many are getting into character, and even seem to be having fun. We’re excited to see what happens in the wilds of the internet.<br />
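<br />
The "running out of film" mechanism in the list above can be sketched roughly as follows. This is a minimal illustration, not the code that shipped in Improviso: the names <code>FilmBudget</code>, <code>roll_film_budget</code>, and <code>film_has_run_out</code>, along with the base limits and the jitter range, are all assumptions made up for the example. The idea is to roll a randomized budget once per scene, then end the scene as soon as either the time or the line limit is reached.<br />
<br />

```python
import random
from dataclasses import dataclass


@dataclass
class FilmBudget:
    """Hypothetical per-scene limits; not taken from the actual game."""
    max_seconds: float
    max_lines: int


def roll_film_budget(rng, base_seconds=180.0, base_lines=30, jitter=0.25):
    """Pick this scene's limits once, scaled by a random factor.

    The base values and the +/-25% jitter are illustrative assumptions.
    """
    scale = 1.0 + rng.uniform(-jitter, jitter)
    return FilmBudget(base_seconds * scale, max(1, round(base_lines * scale)))


def film_has_run_out(budget, elapsed_seconds, lines_exchanged):
    """The scene ends when either the time or the dialogue limit is hit."""
    return (elapsed_seconds >= budget.max_seconds
            or lines_exchanged >= budget.max_lines)


rng = random.Random(0)          # seeded only to make the sketch repeatable
budget = roll_film_budget(rng)
```

Rolling the budget once per scene, rather than on every check, keeps the cut-off stable within a scene while still making it hard for players to predict across scenes.<br />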
<br />
This project would not have been possible without the support of GAMBIT. Improviso began as an 8-week summer project with a team of nine undergrads (seven from Singapore, plus one each from MIT and RISD). A GAMBIT grant allowed us to continue working on it for about three more months, with a smaller team of two programmers and one artist.Jeff Orkinhttp://www.blogger.com/profile/12486391956727436411noreply@blogger.com1tag:blogger.com,1999:blog-8221447444699368366.post-57563961046249317152011-02-25T23:24:00.024-05:002013-05-12T23:01:28.858-04:00Recap Part 3: Generation from Collective Gameplay<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2t4gp_kV-nonXf9JP4VRy5bJvg85pfvGEdnoU_tCCRXS29HGe8kPCcgH2pbbwX4JRZNXn3ipYdgnNMQumHRjcR-rm9nNTkVuDpbfYVf5a8ptOnbFy7leTnjE86RhyphenhyphenQnM24qaq0DAOGh4/s1600/fortune.gif" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="127" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2t4gp_kV-nonXf9JP4VRy5bJvg85pfvGEdnoU_tCCRXS29HGe8kPCcgH2pbbwX4JRZNXn3ipYdgnNMQumHRjcR-rm9nNTkVuDpbfYVf5a8ptOnbFy7leTnjE86RhyphenhyphenQnM24qaq0DAOGh4/s320/fortune.gif" width="320" /></a></div>
<div class="MsoNormal">
In 2008, I implemented my first version of a system that plans character behavior and dialogue from recorded gameplay data. This system combined the components described in <a href="http://collectiveai.blogspot.com/2011/01/first-post-w00t.html">Recap 1</a> and <a href="http://collectiveai.blogspot.com/2011/02/recap-part-2-dialogues-with-data.html">Recap 2</a> with a replay system, which automated a character by playing back fragments of log files (or one log in its entirety).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The planner is a bit like the Game AI equivalent of the <a href="http://pdos.csail.mit.edu/scigen/">Random Paper Generator</a>. It essentially randomly wanders between 5,000 log files, stitching together fragments of behavior and dialogue at run-time, while maintaining local coherence by critiquing log transitions for statistical likelihood. This random stitching process is complicated by the fact that each character (customer and waitress) needs to observe the behavior of the other character, and select a sensible response from a log file.<br />
<br />
More after the jump... </div>
<div class="MsoNormal">
<br />
<a name='more'></a><br /></div>
<div class="MsoNormal">
At a high level, here is what happens: each character selects an arbitrary log file, begins a replay, and continually critiques him/herself for the likelihood of executing the next action in the log (based on the n-gram model described in <a href="http://collectiveai.blogspot.com/2011/01/first-post-w00t.html">Recap 1</a>). When the log indicates that the other character does something, the character waits to observe the expected action. If the next observation does not match the expectation, the character looks for a new log file that is a better match for recent observations. While replaying physical actions, the next action might be a dialogue line, at which point the character toggles to the chat system (described in <a href="http://collectiveai.blogspot.com/2011/02/recap-part-2-dialogues-with-data.html">Recap 2</a>). The dialogue interaction may eventually terminate with a physical action, leading the characters to toggle back to the physical replay system, and so on.</div>
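<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The replay-and-switch loop described above might look something like the sketch below. It is a toy reconstruction under stated assumptions, not the actual planner: a log is assumed to be a list of <code>(actor, action)</code> pairs, the class and function names are hypothetical, the n-gram critique and the chat system are left out to keep it short, and "a better match for recent observations" is reduced to an exact match on the last two events.</div>

```python
def find_matching_position(logs, history, window=2):
    """Find a log containing the last `window` events of the history.

    Returns (log_index, position_after_match), or None if nothing matches.
    Exact matching on a short window is a simplifying assumption.
    """
    recent = history[-window:]
    for log_index, log in enumerate(logs):
        for i in range(len(log) - len(recent) + 1):
            if log[i:i + len(recent)] == recent:
                return log_index, i + len(recent)
    return None


class ReplayCharacter:
    """Hypothetical character that replays fragments of recorded logs."""

    def __init__(self, role, logs, start_log=0):
        self.role = role
        self.logs = logs
        self.log_index = start_log
        self.position = 0
        self.history = []

    def _next_event(self):
        log = self.logs[self.log_index]
        return log[self.position] if self.position < len(log) else None

    def act(self):
        """Return this character's next action, or None while waiting."""
        event = self._next_event()
        if event is None or event[0] != self.role:
            return None  # the log expects the other character to act
        self.position += 1
        self.history.append(event)
        return event

    def observe(self, event):
        """Record the other character's action; switch logs if unexpected."""
        self.history.append(event)
        if self._next_event() == event:
            self.position += 1
            return
        match = find_matching_position(self.logs, self.history)
        if match is not None:
            self.log_index, self.position = match


logs = [
    [("waitress", "greet"), ("customer", "ask_table"), ("waitress", "seat_customer")],
    [("waitress", "greet"), ("customer", "order_beer"), ("waitress", "bring_beer")],
]
waitress = ReplayCharacter("waitress", logs)
```

<div class="MsoNormal">
With these two made-up logs, the waitress greets from the first log, waits, and then jumps to the second log when the customer orders a beer instead of asking for a table.</div>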
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI0aLfh6Frs8ohrOJz_X16GjR2Wve8zTEq1LkuZzxgV24bZsgV5P1bl4xZmuFG8Y_zMCsXI6DQl6iXGX0xESyM_hliAUTLf87mFD6_p5jaoMkS5JPixJrM2J4RcJWaAEwbfSMor77DfyQ/s1600/nolf.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI0aLfh6Frs8ohrOJz_X16GjR2Wve8zTEq1LkuZzxgV24bZsgV5P1bl4xZmuFG8Y_zMCsXI6DQl6iXGX0xESyM_hliAUTLf87mFD6_p5jaoMkS5JPixJrM2J4RcJWaAEwbfSMor77DfyQ/s200/nolf.jpg" width="154" /></a>I presented <a href="http://www.media.mit.edu/cogmac/publications/orkin_aamas2009.pdf">a paper</a> at AAMAS 2009 in Budapest covering the details of this system. I had the good fortune of having this paper summarized by Michael Mateas at the GDC 2010 Game Studies Download 5.0 panel. Michael describes the work more clearly than I can describe it myself – <a href="http://web.media.mit.edu/%7Ejorkin/blog/Mateas_GameStudiesDownload5.0_GDC2010.mp3">see for yourself</a>.<br />
<br />
Incidentally, while wandering around Budapest, I found Cate Archer painted on a wall outside a bar. Who knew the Hungarians were such big fans of <a href="http://en.wikipedia.org/wiki/The_Operative:_No_One_Lives_Forever">NOLF</a>?</div>
<div class="MsoNormal">
<br />
<br /></div>
<div class="MsoNormal">
<b>Planner Demos</b></div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
Below are some videos demonstrating the planner in action. </div>
<div class="MsoNormal">
The videos are more interesting if you keep a couple things in mind:</div>
<ul>
<li>All of the characters’ behavior and dialogue comes directly out of recorded human-human games (typos and all).</li>
<li>The first two videos show two AI characters interacting. However, there is no centralized control. Each character has a separate pool of log files to draw from, and they are responding to one another dynamically in real-time based on observed actions, state changes, and raw chat text. Their AI processes are running on an AI server, and each character could be running on an entirely different machine.</li>
</ul>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This is a video of one of the better runs:</div>
<div class="MsoNormal">
<br />
<iframe frameborder="0" height="300" src="http://player.vimeo.com/video/20394758" width="400"></iframe><br />
<a href="http://vimeo.com/20394758">AAMAS 2009 Clip 1</a> on Vimeo.<br />
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Here is an average run:</div>
<div class="MsoNormal">
<br />
<iframe frameborder="0" height="300" src="http://player.vimeo.com/video/20394784" width="400"></iframe><br />
<a href="http://vimeo.com/20394784">AAMAS 2009 Clip 2</a> on Vimeo.<br />
<br />
<br /></div>
<div class="MsoNormal">
Then, there were also many runs where the characters go to the bar and get stuck in an infinite loop of ordering beer. You don’t need to see those.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The characters can interact with other characters, or with humans. Here is a video of an AI waitress interacting with a human customer. It works fairly well as long as the human behaves very cooperatively. You can see that the system does not yet have any implementation of long-term memory. Towards the end of the video, there are some examples of the AI responding to… less ordinary interactions:</div>
<div class="MsoNormal">
<br />
<iframe frameborder="0" height="300" src="http://player.vimeo.com/video/20394794" width="400"></iframe><br />
<a href="http://vimeo.com/20394794">AAMAS 2009 Clip 3</a> on Vimeo.<br />
<br />
<br />
<b>The Good, the Bad, and the Ugly</b><br />
<b><br />
</b></div>
<div class="MsoNormal">
As a game developer, having spent a decade programming behavior by hand, it was very exciting to see that these characters could do anything remotely human-like without any hand-programming at all. But there are obviously some serious drawbacks to this system – primarily the complete lack of designer control, and the related fact that characters do not always do the right thing. Below are some cool things about this system, and some not-so-cool things.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Cool Things:</div>
<ul>
<li>High-level behavior (dialogue and decomposable actions) runs on an AI server networked with the game (via sockets). Low-level behavior (pathfinding, animations, locomotion) is implemented in a layer integrated with the game engine. For example, the AI server sends a command to the waitress to pick up a steak from the kitchen, and the low-level game-side layer navigates to the kitchen, resolves the reference to a specific steak in the game world, and selects the animation to pick it up.</li>
<li>Characters can be running on different servers. They only communicate within the game world by observing one another’s actions, state changes, and raw chat text. Characters observe and respond to humans through the same machinery used to respond to other AI characters.</li>
<li>No hand-authoring of high-level behaviors.</li>
<li>Interactions play out differently every time.</li>
<li>The system handles both physical interaction and natural language dialogue.</li>
</ul>
<div class="MsoNormal">
<br />
Not-So-Cool Things:</div>
<ul>
<li>No designer input. No authorial control.</li>
<li>Sometimes characters do wrong, weird, or out-of-context things.</li>
<li>Behavior is guided by statistical regularities, but sometimes the most interesting behaviors are statistical outliers. These outliers get filtered out, for better or for worse. </li>
<li>Constraining behavior with a trigram model ensures local coherence, but not global coherence -- characters tend to do sensible things from moment-to-moment, but over longer periods of time they can get caught in cycles. </li>
</ul>
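<div class="MsoNormal">
To make the local-versus-global coherence point concrete, here is a small sketch of a trigram check. The action names and the single training log are invented for illustration, and requiring every trigram to have been seen at least once is an assumed stand-in for the real likelihood critique. Because the one log happens to contain a second beer order, a run that loops on beer forever passes every local check.</div>

```python
from collections import Counter


def trigram_counts(logs):
    """Count every run of three consecutive actions across all logs."""
    counts = Counter()
    for log in logs:
        for i in range(len(log) - 2):
            counts[tuple(log[i:i + 3])] += 1
    return counts


def locally_coherent(sequence, counts, min_count=1):
    """True if every trigram in the sequence was seen at least min_count times."""
    return all(counts[tuple(sequence[i:i + 3])] >= min_count
               for i in range(len(sequence) - 2))


# Hypothetical log in which a customer happens to order two beers.
logs = [["sit", "order_beer", "drink_beer",
         "order_beer", "drink_beer", "pay", "leave"]]
counts = trigram_counts(logs)

# Ten rounds of beer and no bill: every step is locally plausible.
beer_loop = ["sit"] + ["order_beer", "drink_beer"] * 10
loop_passes = locally_coherent(beer_loop, counts)

# Paying before ordering was never observed, so this one is rejected.
skip_passes = locally_coherent(["sit", "pay", "leave"], counts)
```

<div class="MsoNormal">
Here <code>loop_passes</code> comes out true and <code>skip_passes</code> false: the model rules out an implausible next step, but nothing in it tracks how long the scenario has been going on.</div>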
<div class="MsoNormal">
<br />
<br />
<b>Smart Cookie</b><br />
<br />
So, implementing this system was an informative foray into statistical modeling of language and behavior, but soon after publishing this work, I received some poignant words of wisdom from a fortune cookie: "Don’t let statistics do a number on you." Statistics are useful for mining recurring patterns in large datasets of gameplay data, but statistics alone cannot replace game designers -- ultimately there needs to be a human in the loop to identify behaviors of interest and to make sense of the higher-level structure of the scenario. Noah Wardrip-Fruin’s <a href="http://www.amazon.com/gp/product/0262013436?ie=UTF8&tag=jefork-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0262013436">Expressive Processing</a> has a chapter about statistical AI that highlights <a href="http://web.media.mit.edu/%7Ejorkin/restaurant/">The Restaurant Game</a>, and comes to this same conclusion. My current research aims to leverage statistics while keeping a human designer in the loop, forming a powerful human-machine collaborative authoring process.<br />
<br />
That wraps up the recaps. The rest of 2009, after AAMAS, was kind of the dark period of my PhD, in terms of productivity. I spent a lot of time working on a dialogue act classifier (described in <a href="http://www.media.mit.edu/cogmac/publications/Orkin_AGS_aamas2010.pdf">this paper</a>) that I ultimately have not ended up using in my new system. And then over six months were devoted to intense reading for my <a href="http://web.media.mit.edu/%7Ejorkin/generals/general_exams.html">General Exams</a>. Future posts will begin describing components of my new system, which is still in development, and aims to address shortcomings of the first version of the planner, with the concession that it is no longer a fully automated authoring system (but I think that is for the best, really).</div>
Jeff Orkinhttp://www.blogger.com/profile/12486391956727436411noreply@blogger.com0tag:blogger.com,1999:blog-8221447444699368366.post-30922417525273614652011-02-02T19:10:00.004-05:002013-06-28T13:10:23.574-04:00Recap Part 2: Dialogues with Data<div class="MsoNormal">
Welcome back, readers. Another snow-day in Boston; good time to update the blog! In this <b>interactive </b>post I will continue <a href="http://collectiveai.blogspot.com/2011/01/first-post-w00t.html">my recap</a> of early experiments in automating characters with recorded gameplay data. You will find a demo to play with at the end.</div>
<br />
<div class="MsoNormal">
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivWZerQrwtH2yIGJwFZ3JripgQR32fuWdj8G8DKgvYmOo0lKKR2X2n6fRNSwpfo6d-ARnky1dWRVHWtgnAjVMyn6yLAznh46gB3CYUUUrIbnNGAXZJKPetr6CRJLsUoGqPalwD8e3BKn0/s1600/chat.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="238" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivWZerQrwtH2yIGJwFZ3JripgQR32fuWdj8G8DKgvYmOo0lKKR2X2n6fRNSwpfo6d-ARnky1dWRVHWtgnAjVMyn6yLAznh46gB3CYUUUrIbnNGAXZJKPetr6CRJLsUoGqPalwD8e3BKn0/s320/chat.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Human waitress chats with an AI customer, trained with 5,000 games.</td><td class="tr-caption" style="text-align: center;"><br /></td></tr>
</tbody></table>
After completing the Masters, I was anxious to get something truly interactive up and running as quickly as possible. I thought I would start by looking at the chat text data, and I was inspired by reading a <a href="http://portal.acm.org/citation.cfm?id=276531">paper</a> about the <i>How May I Help You</i> system at AT&T, an unsupervised system that learned correlations between salient phrase fragments and call routing decisions. I wondered if we could take a similar approach to learning patterns within chat text exchanges based on salient word sequences for different contexts within the restaurant scenario – e.g. getting seated, ordering, paying the bill, etc – and use them to automate a chat bot.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
More after the jump….</div>
<div class="MsoNormal">
<br />
<a name='more'></a><br /></div>
<div class="MsoNormal">
The implementation of the chat bot was brain-dead simple, yet results were often surprisingly effective. I started by segmenting the log files into conversations, where a conversation is one or more uninterrupted lines of chat text sandwiched between physical actions (e.g. picking up steak from the table, sitting on a chair, or using the cash register). For each context, I extracted a list of all word sequences that occur in at least five games. These sequences were then used to encode lines of chat text in the log files into <i>signatures </i>that can be quickly compared with chat text input at runtime.</div>
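<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As a rough sketch of the segmentation and signature steps just described (not the original implementation): a log is assumed here to be a list of <code>(kind, speaker, text)</code> tuples, word sequences are assumed to be lowercase n-grams of up to three whitespace-separated words, and the per-context bookkeeping is omitted. All function names are hypothetical.</div>

```python
from collections import defaultdict


def segment_conversations(log):
    """Split a log into runs of chat lines separated by physical actions."""
    conversations, current = [], []
    for kind, speaker, text in log:
        if kind == "chat":
            current.append((speaker, text))
        elif current:
            conversations.append(current)
            current = []
    if current:
        conversations.append(current)
    return conversations


def word_ngrams(text, max_n=3):
    """All word sequences of length 1..max_n (an assumed definition)."""
    words = text.lower().split()
    return {tuple(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}


def salient_sequences(games, min_games=5):
    """Keep word sequences that occur in at least min_games different games."""
    games_seen = defaultdict(set)
    for game_id, log in enumerate(games):
        for conversation in segment_conversations(log):
            for _speaker, text in conversation:
                for gram in word_ngrams(text):
                    games_seen[gram].add(game_id)
    return {gram for gram, ids in games_seen.items() if len(ids) >= min_games}


def signature(text, salient):
    """Encode a chat line as the set of salient sequences it contains."""
    return frozenset(word_ngrams(text) & salient)
```

<div class="MsoNormal">
Representing a signature as a set makes the runtime comparison cheap, since matching two lines reduces to set intersection; whether the original system used sets is not stated in this post.</div>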
<div class="MsoNormal">
<br />
The chat bot encodes the user’s text input as a signature, searches for the best matching signatures in 5,000 log files, and then simply outputs the next line from the conversation with the best matching signature. Ties are broken arbitrarily, and there is a history bias to favor conversations that contain previous lines that match the history of the interaction currently in progress. </div>
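<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The lookup with its history bias could be sketched as below. The scoring is an assumption for illustration only: signatures are plain sets of words, similarity is Jaccard overlap, and each earlier line of the interaction that also appears earlier in a candidate conversation adds a fixed bonus. The names <code>sig</code>, <code>overlap</code>, <code>respond</code>, and <code>history_weight</code> are hypothetical. Ties go to the first candidate found, which is one arbitrary choice among many.</div>

```python
def sig(text):
    """Stand-in signature: the set of lowercase words in the line."""
    return frozenset(text.lower().split())


def overlap(sig_a, sig_b):
    """Jaccard similarity between two signatures."""
    if not sig_a and not sig_b:
        return 0.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)


def respond(user_sig, history_sigs, conversations, history_weight=0.5):
    """Return the line that follows the best-matching recorded line.

    conversations: list of conversations, each a list of (signature, text).
    history_sigs: signatures of earlier lines in the current interaction.
    """
    best_score, best_reply = -1.0, None
    for conversation in conversations:
        for i in range(len(conversation) - 1):
            score = overlap(user_sig, conversation[i][0])
            if score == 0.0:
                continue
            # History bias: reward conversations whose earlier lines
            # resemble what has already been said in this interaction.
            earlier = [s for s, _text in conversation[:i]]
            bonus = sum(1 for h in history_sigs
                        if any(overlap(h, e) > 0 for e in earlier))
            score += history_weight * bonus
            if score > best_score:
                best_score, best_reply = score, conversation[i + 1][1]
    return best_reply


conversations = [
    [(sig(t), t) for t in ["hello", "table for one", "follow me"]],
    [(sig(t), t) for t in ["one beer please", "coming right up"]],
]
```

<div class="MsoNormal">
With this toy data, the input "one please" on its own is closest to the beer conversation, but if the interaction already contains "hello", the bias tips the choice toward the conversation that also opened with a greeting.</div>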
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I want to stress that this chat bot was just a quick experiment, and is <b>not </b>the system I am using in my more recent work. <i>However</i>, this simple implementation turned out to be quite a compelling (and entertaining) way to explore the data, and to illustrate how far you can get without any semantics, just relying on surface text similarities, if you have lots of data. The results are not perfect – not going to pass the Turing test, but compared to other chat bots, it’s not doing so bad, especially considering it requires zero hand-authoring -- just turn the crank and get a chat bot. I never formally evaluated the chat system because it was just an exploration, but some implementation details are covered in this <a href="http://www.aaai.org/ocs/index.php/FSS/FSS10/paper/view/2212/2766">AAAI symposium paper</a>.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I ended up demo-ing this system for years at presentations and lab events. Running it is a bit like Russian roulette. It often works well and impresses users, but there is always the chance of an incorrect, or even obscene, response. The data is full of witty responses and handles a number of cases that I would not have covered if authoring the dialogue by hand. Below are some highlights.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Highlights</b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Here is one example of the bot’s wit. In all examples, the bot's response is in <b>bold</b>:</div>
<blockquote>
<div class="MsoNormal" style="color: purple;">
WAITRESS: Welcome to the restaurant</div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: Hi</b></div>
<div class="MsoNormal" style="color: purple;">
WAITRESS: hi, how many in your party?</div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: a table for one please.</b></div>
<div class="MsoNormal" style="color: purple;">
WAITRESS: smoking or non?</div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: WOW, you have a smoking section?</b></div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: I don’t smoke, but I will start tonight</b></div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: take me there!</b></div>
</blockquote>
<div class="MsoNormal">
And another:</div>
<blockquote>
<div class="MsoNormal" style="color: purple;">
WAITRESS: hi</div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: hi</b></div>
<div class="MsoNormal" style="color: purple;">
WAITRESS: do you have a reservation?</div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: I have personal reservations</b></div>
<div class="MsoNormal">
<b style="color: blue;">CUSTOMER</b><b><span style="color: blue;">: but I wave them when it comes to lust. </span></b></div>
</blockquote>
<div class="MsoNormal">
When interacting with a Japanese visitor to the lab, we were both surprised when the bot corrected his English:</div>
<blockquote>
<div class="MsoNormal" style="color: purple;">
WAITRESS: welcome</div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: table for one please</b></div>
<div class="MsoNormal" style="color: purple;">
WAITRESS: have a sit</div>
<div class="MsoNormal">
<b style="color: blue;">CUSTOMER</b><b><span style="color: blue;">: *seat </span></b></div>
</blockquote>
<div class="MsoNormal">
Sometimes the response is not quite right, but still entertaining – better than no response at all. (Note that in this case, the bot is playing as the waitress):</div>
<blockquote>
<div class="MsoNormal" style="color: blue;">
CUSTOMER: you have nice eyes</div>
<div class="MsoNormal" style="color: purple;">
<b>WAITRESS: thanks! :) I work out</b></div>
</blockquote>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Edge Cases</b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The most interesting thing about this data-driven approach is seeing the edge cases that get picked up. When I noticed that one of the files of extracted phrases looked suspiciously small, and investigated which context it was associated with, I found that the system had learned to say “oops” when the waitress dropped something on the floor. So what, right? That’s obviously what you should say when you drop something. Well, what makes this interesting is that the user interface actually does not allow the players to put things down on the floor – they can only put things down on furniture and other objects. Dropping things on the floor occurs as the result of a ray-casting bug, when trying to place something on a table. In this case, the AI system has learned an appropriate response for something the designer never realized could even happen!</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Another edge case is related to a fruit bowl in the back of the kitchen. In many games, decorative props like these would simply be ignored by the AI. In contrast, the chat system learns that a waitress should say “on the house” when putting the fruit down on a customer’s table. One of the downsides of learning from recurring patterns of text is that the system fails to pick up lines that might be gems, but are only observed once in thousands of games. For example, in one game the customer responds “Damn girl, that is serious fruitage!” when the waitress puts the fruit bowl on his table. More recent work is looking at ways to capture these gems by including a human in the loop of the data-mining process.</div>
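<div class="MsoNormal">
To make the trade-off concrete, here is a minimal sketch of frequency-based phrase mining. It is not the actual system: the context labels, the <code>mine_phrases</code> helper, and the <code>min_count</code> threshold are all hypothetical, chosen only to show why a line seen once gets filtered out while a recurring one survives.</div>

```python
from collections import Counter, defaultdict


def mine_phrases(observations, min_count=2):
    """Group utterances by context and keep those seen at least min_count times."""
    counts = defaultdict(Counter)
    for context, utterance in observations:
        # Normalize lightly so trivial variations count as the same phrase.
        counts[context][utterance.strip().lower()] += 1
    return {
        context: sorted(u for u, n in counter.items() if n >= min_count)
        for context, counter in counts.items()
    }


# Hypothetical log entries: (context label, what a player typed).
observations = [
    ("fruit_bowl_placed_on_table", "on the house"),
    ("fruit_bowl_placed_on_table", "On the house"),
    ("fruit_bowl_placed_on_table", "on the house "),
    ("fruit_bowl_placed_on_table", "Damn girl, that is serious fruitage!"),
    ("item_dropped_on_floor", "oops"),
    ("item_dropped_on_floor", "oops"),
]

learned = mine_phrases(observations)
for context, phrases in learned.items():
    print(context, phrases)
```

<div class="MsoNormal">
With the threshold at 2, the recurring “on the house” and “oops” are kept and the one-off “fruitage” line is dropped. Lowering <code>min_count</code> to 1 would keep the gem, but also every typo and bit of noise, which is roughly why a human in the loop is attractive.</div>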
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Colorful Interactions (to say the least)</b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As mentioned earlier, the chat bot does have the potential to offend people. At a presentation in Plano, Texas, the bot offered Paul Tozour a lap dance. When I was demoing at the lab for the VP of a major American corporation, she was surprised to see this:</div>
<blockquote>
<div class="MsoNormal" style="background-color: white; color: purple;">
WAITRESS: How may I help you?</div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: Get me a table b<span style="font-size: large;">****</span></b></div>
</blockquote>
<div class="MsoNormal">
I have to admit to getting some satisfaction from one demo. A gray-haired reporter had told me my research didn’t make any sense to anyone over the age of 40, and his session went like this: </div>
<blockquote>
<div class="MsoNormal" style="color: purple;">
WAITRESS: welcome</div>
<div class="MsoNormal" style="color: blue;">
<b>CUSTOMER: shut up wh<span style="font-size: large;">***</span></b></div>
</blockquote>
<div class="MsoNormal">
I don’t think he actually noticed the bot’s response – couldn’t read it through his bifocals. Obviously we can censor obscene words, but where’s your sense of adventure? Players can still say some pretty bad things without swearing. Microsoft learned this the hard way when they released a <a href="http://www.theregister.co.uk/2007/12/03/santa_filth_outrage/">potty-mouthed Santa Claus bot</a>.</div>
<div class="MsoNormal">
</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Interactive Demo!</b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I wanted to embed the chat bot applet into this blog post, but Java’s security regime defeated me, and I had to put it on its own page. <a href="http://jorkin.com/applet">You can find the demo applet here</a>.</div>
<div class="MsoNormal">
<br /></div>
Jeff Orkinhttp://www.blogger.com/profile/12486391956727436411noreply@blogger.com0tag:blogger.com,1999:blog-8221447444699368366.post-20901721610653107492011-01-15T20:59:00.000-05:002013-02-24T13:40:03.182-05:00First post. w00t!<div class="MsoNormal">
People of Earth:</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-align: left;">
Welcome to my research blog! I’m in the home stretch for the PhD, aiming to graduate in about a year (plus or minus a few months), and will use this blog to share my progress, and recap some earlier developments. I’m hoping that blogging will help me collect my thoughts for the defense (and help me figure out where all the time went).</div>
<div>
</div>
<div class="MsoNormal" style="text-align: left;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3VyBkNpFtyyHL0hpaxjYjI46nRi4lOAPtmVtiaTjw4zjn59k9nepqEFsbuazgSKqyv6BuLJixpngXTdUJLjFPCXYzNCFjOMxGY2B1iOnQKhEhKpS9t6JU2os2JnQYVCd-9fHoVzG2eps/s1600/RG.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3VyBkNpFtyyHL0hpaxjYjI46nRi4lOAPtmVtiaTjw4zjn59k9nepqEFsbuazgSKqyv6BuLJixpngXTdUJLjFPCXYzNCFjOMxGY2B1iOnQKhEhKpS9t6JU2os2JnQYVCd-9fHoVzG2eps/s320/RG.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Screenshot from The Restaurant Game</td></tr>
</tbody></table>
<div class="MsoNormal" style="text-align: left;">
I launched <a href="http://therestaurantgame.net/">The Restaurant Game</a> in early 2007, and for my Master's thesis I proposed to generate character behavior and dialogue from log files of human-human interactions recorded online. Needless to say, I grossly underestimated how hard it would be to automate behavior and dialogue from unconstrained online interactions (at least, to do it well enough for use in a ‘real’ game), and I’m still working on this for my PhD.</div>
<div>
</div>
<div class="MsoNormal" style="text-align: left;">
<br /></div>
<div class="MsoNormal" style="text-align: left;">
It’s been a pretty crazy project, as you might expect when you invite the general public on the internet to participate in your research. </div>
<div class="MsoNormal">
<br />
More after the jump...<br />
<br /></div>
<div class="MsoNormal">
<b><span style="font-size: small;"></span><span style="font-size: small;"></span></b><br />
<a name='more'></a><b><span style="font-size: small;">Recap Part 1: Data Collection</span></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
I really had no idea whether anyone would play the game or, if they did, whether they would do anything reasonable. I hoped to collect 1,000 transcripts of completed two-player games, which I reached within two weeks, so I raised my goal to 10,000, which I finally reached in 2010. Obviously a commercial game could easily collect this much data in a week, or even a day. The biggest spikes in traffic came from mentions on <a href="http://ve3d.ign.com/articles/news/day/2007-03-03">Voodoo Extreme</a> and <a href="http://kotaku.com/243674/new-game-learns-from-players-programs-new-game">Kotaku</a>, and (strangely spontaneously) years later on <a href="http://hplusmagazine.com/articles/ai/games-design-themselves">h+ Magazine</a> and <a href="http://games.slashdot.org/story/09/07/30/1610250/Games-That-Design-Themselves">Slashdot</a>. Here is the <a href="http://web.media.mit.edu/%7Ejorkin/blog/Analytics_DashboardReport.pdf">Google Analytics report</a> of traffic to the project web page between 2007 and 2010.</div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhD8U8D6gZoCfoji6lLGkjLFRTRTI6QAKeKmJhgoxYBUIN0IJuAKINAQygG9tPh4XGFKJpy03qVzbZKIcggmInW1ferU1olrCkaweMFKQM2dyTjzhkeSFi5SoZ8Y3g3K2iYWUiniVnOOXQ/s1600/pies.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhD8U8D6gZoCfoji6lLGkjLFRTRTI6QAKeKmJhgoxYBUIN0IJuAKINAQygG9tPh4XGFKJpy03qVzbZKIcggmInW1ferU1olrCkaweMFKQM2dyTjzhkeSFi5SoZ8Y3g3K2iYWUiniVnOOXQ/s1600/pies.jpg" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A staircase composed of 53 cherry cheesecakes.</td></tr>
</tbody></table>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The anonymous players on the internet did not disappoint – they did all kinds of things in the game. Some stacked 53 cherry cheesecakes to form a staircase, and climbed onto the roof of the restaurant. Other players punk'd a <a href="http://web.media.mit.edu/%7Ejorkin/blog/Ira_Flatow_science_friday_edit.mp3">reporter from NPR</a>. Hackers in Sweden figured out how to run their own servers, and began modding the game. They introduced naked customers, allowed more than two players, and eventually built a two-story restaurant, and (of course) added guns.</div>
<div class="MsoNormal">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihVNRKDfRpq3K4yMctwIBunFhCxToL_AmBhKmh94Io9bOgX9AnyapoEFzoDj-NoDKCXcDBDkOkKUTORz6DvsUDDQjh_eOjH2d-wvRKK_mzvG0pi_OPZ4mrTNUW7iTE6DNrxOnEWoaqpf4/s1600/aaahb.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihVNRKDfRpq3K4yMctwIBunFhCxToL_AmBhKmh94Io9bOgX9AnyapoEFzoDj-NoDKCXcDBDkOkKUTORz6DvsUDDQjh_eOjH2d-wvRKK_mzvG0pi_OPZ4mrTNUW7iTE6DNrxOnEWoaqpf4/s1600/aaahb.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Swedish hackers mod The Restaurant Game.</td></tr>
</tbody></table>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
People misbehaving online was not surprising. What <i>was</i> surprising was that over half of our players did mostly normal restaurant things – eating food, serving drinks, reading menus, paying bills, etc. For my <a href="http://www.media.mit.edu/cogmac/publications/orkin_mastersthesis_2007.pdf">Master's thesis</a> (summarized in this <a href="http://www.media.mit.edu/cogmac/publications/Orkin_JoGD07_inpress.pdf">journal article</a>), I used data from 5,000 games to learn a statistical model of language and behavior that could be used to estimate the typicality of any gameplay session (or fragment of a game). I had humans rate the typicality of 300 games based on what one would expect to observe in a <u>real</u> restaurant (in terms of both dialogue and physical interaction), and demonstrated that the estimates of the statistical model for the same games correlated strongly with human judgment. So for the thesis deadline in 2007, I ran out of time before I was able to close the loop and actually generate interactive behavior from the data, but even recognition has interesting implications for games. (Future posts will describe more recent work, where I do generate dialogue and behavior from the data).</div>
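<div class="MsoNormal">
For a rough feel of what “estimating typicality” can mean, here is a toy sketch. It is not the model from the thesis: it assumes each game is reduced to a list of made-up action labels, learns add-one-smoothed bigram probabilities over those actions, and scores a game by its average log-probability per transition. The function names and the example games are hypothetical.</div>

```python
import math
from collections import Counter

START, END = "<start>", "<end>"


def train_bigrams(games):
    """Count action bigrams and preceding-action totals over training games."""
    bigrams, totals, vocab = Counter(), Counter(), {END}
    for actions in games:
        seq = [START] + list(actions) + [END]
        vocab.update(actions)
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[(prev, nxt)] += 1
            totals[prev] += 1
    return bigrams, totals, vocab


def typicality(actions, model):
    """Average log-probability per transition, using add-one smoothing."""
    bigrams, totals, vocab = model
    seq = [START] + list(actions) + [END]
    v = len(vocab) + 1  # one extra slot reserves mass for unseen actions
    logp = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        p = (bigrams[(prev, nxt)] + 1) / (totals[prev] + v)
        logp += math.log(p)
    return logp / (len(seq) - 1)


# Hypothetical training games, each a sequence of action labels.
training = [
    ["sit", "read_menu", "order", "eat", "pay", "leave"],
    ["sit", "order", "eat", "pay", "leave"],
    ["sit", "read_menu", "order", "eat", "eat", "pay", "leave"],
]
model = train_bigrams(training)

normal_game = ["sit", "order", "eat", "pay", "leave"]
odd_game = ["stack_cheesecake", "stack_cheesecake", "climb_roof"]

print(typicality(normal_game, model))
print(typicality(odd_game, model))
```

<div class="MsoNormal">
Scores are negative, and closer to zero means more typical, so the ordinary dinner scores higher than the cheesecake staircase. Comparing scores like these against human ratings of the same games is the kind of check described above.</div>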
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
What good is behavior recognition? Characters could use statistical models learned from gameplay data to detect someone behaving atypically, or to guide their own behavior to blend in with the crowd -- kind of a social variant on stealth gameplay (e.g. <a href="http://en.wikipedia.org/wiki/Thief:_The_Dark_Project">Thief</a>, <a href="http://en.wikipedia.org/wiki/The_Operative:_No_One_Lives_Forever">NOLF</a>). It’s interesting to see that there is already a new genre of multiplayer stealth games emerging, where humans try to blend in with AI in <a href="http://www.gamasutra.com/view/feature/6242/the_reasons_behind_spyparty.php">Spy Party</a> and the multiplayer mode of <a href="http://www.gamespot.com/xbox360/action/assassinscreedbrotherhood/review.html">Assassin’s Creed: Brotherhood</a>. Statistical models would allow this type of gameplay in single-player games, where the AI is trying to detect the human. Obviously, the AI already <i>knows </i>who the human is, but this is no different than when we simulate vision and hearing to enable sensory honesty in stealth games. Think of it as a means of enabling social sensory honesty.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
That’s it for now. Future recap posts will get into simulating interaction from data. Stay tuned! Or even subscribe.</div>
Jeff Orkinhttp://www.blogger.com/profile/12486391956727436411noreply@blogger.com6