

Francis Li

A Parallel Implementation of Cue-based Retrieval for Episodic Memory


School of Electrical and Electronic Engineering, April 2014


What is episodic memory?

> Long-term store of specific events
  > "Where did I park my car?"
> Functionally:
  > Architectural
  > Automatic
  > Autonoetic
  > Temporally indexed

Derbinsky, N.: Efficiently Implementing Episodic Memory (2009)


Episodic Memory enables

> Deliberation based on a rich knowledge scheme
> Playback of the past
> Learning from mistakes
> Much more

Cognitive Architecture

> Episodic memory is part of a bigger picture
> Combines knowledge and architectural processes
> Allows the creation of AI agents
> Rich knowledge schema
> Want agents to persist over long time frames

Episodic Memory Block Diagram

> Knowledge is represented as directed, labelled, acyclic graphs
> Three distinct memory operations: encoding, retrieval, and reconstruction

Encoding

> The entire working memory structure is stored in the episodic store

Retrieval

> A cue structure is placed onto working memory
> The cue is some substructure of working memory which must be found in the database of graphs

Reconstruction

> Given an episode ID to retrieve, supplied by the cue matcher
> The episode is decoded from the store and placed on the retrieval structure in working memory

No Dynamics

> Episodic memory is untouched once it is in the store
> Episodes are stored without deliberation

The problem is

> A massive database of episodes
  > An episode must be stored after every decision
  > The store size only increases!
> Query-based retrieval is computationally complex
  > Graph matching runs in non-deterministic polynomial time
  > Cannot meet the 50 ms reactivity requirement!
> Reconstruction is cheap

Dealing with it

> Make two assumptions about the environment
> Temporal contiguity
  > Not many changes between episodes
  > The environment is fairly stable; the things that change are small relative to the total complexity of the environment
> Structural regularity
  > The number of unique structures is much less than the total number of structures experienced
  > We can reuse older structures that have disappeared and may reappear

Exploiting assumptions

> Temporal contiguity
  > Store changes between episodes, not entire episodes (see the sketch below)
> Structural regularity
  > Only store unique structures
  > Use indexing to reuse structures and symbols
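As an illustration of the delta idea, here is a minimal sketch of interval-based storage, assuming working memory can be treated as a set of hashable elements. The class and field names are illustrative, not the actual Soar implementation.

```python
# Minimal sketch: store only the changes between consecutive episodes.
# An element's presence is recorded as an interval of episode ids.
class EpisodicStore:
    def __init__(self):
        self.episode = 0
        self.open_intervals = {}    # element -> episode id where it appeared
        self.closed_intervals = []  # (element, start, end) tuples

    def encode(self, working_memory):
        """Record only the delta between this episode and the previous one."""
        self.episode += 1
        current = set(working_memory)
        for elem in current - self.open_intervals.keys():
            self.open_intervals[elem] = self.episode      # element appeared
        for elem in self.open_intervals.keys() - current:
            start = self.open_intervals.pop(elem)         # element disappeared
            self.closed_intervals.append((elem, start, self.episode))
```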

Computationally speaking

> Encoding is now efficient (Derbinsky, N.: Efficiently Implementing Episodic Memory (2009))
  > Although storage is unbounded, as the store only ever grows
> Reconstruction is straightforward
> Retrieval is the real problem!
  > It will still be unbounded; we need to deal with it

Let's talk about retrieval

Current Implementation in Soar

What is a cue?

> Contains a number of symbolic features
> A directed, acyclic graph
> Some substructure of working memory
> What are features?
  > Root-to-leaf-node paths (see the sketch below)

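A minimal sketch of decomposing a cue into its root-to-leaf-node features, assuming the cue graph is a dict mapping each node to a list of (edge label, child) pairs; the names and graph encoding are illustrative, not the Soar data structures.

```python
def root_to_leaf_paths(graph, node, prefix=("root",)):
    """Yield each root-to-leaf path as a tuple ending in the leaf value."""
    children = graph.get(node, [])
    if not children:                 # leaf node: one complete feature
        yield prefix + (node,)
        return
    for label, child in children:
        yield from root_to_leaf_paths(graph, child, prefix + (label,))

# Example cue with two features.
cue = {"root": [("map", "m")],
       "m": [("x", 4), ("y", 5)]}
print(list(root_to_leaf_paths(cue, "root")))
# [('root', 'map', 'x', 4), ('root', 'map', 'y', 5)]
```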

What episode do we return?

> Ideally, do a graph match
> But graph matching is computationally complex
> So filter out candidates that won't graph match
  > Reduces the search space
> Allow partial matches
  > Get the most similar episode
> Break ties based on recency

A two-stage process

> Determine candidate episodes
  > Look at the number of matching surface features
> If they are all present, do a graph match
> Else, return the episode with the greatest match score
  > This doesn't check for structure!

How does the surface match work?

> Decompose episodes into boolean statements representing paths to leaf nodes
> Process intervals
  > On each interval endpoint, check for satisfaction of these statements
> Update and track satisfaction in a structure called the DNFGraph

Intervals

[Diagram: episode features shown as intervals over time]

How does the surface match work?

> Decompose the cue into boolean statements representing paths to leaf nodes
> Check satisfaction of the boolean statements by processing intervals backwards in time
> If we find all statements are satisfied, do a graph match
> If the graph match succeeds, return; else keep processing intervals
> Return the episode with the highest match score (a simplified sketch follows)
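A heavily simplified sketch of this backwards scan; `endpoints`, `cue_statements`, and `graph_match` are illustrative stand-ins, not Soar's actual DNFGraph machinery.

```python
def surface_match(endpoints, cue_statements, graph_match):
    """endpoints: list of (episode_id, satisfied_statements), newest first."""
    best_score, best_episode = -1, None
    for episode, satisfied in endpoints:        # walk backwards in time
        score = len(satisfied & cue_statements)
        if score == len(cue_statements) and graph_match(episode):
            return episode                      # full surface + graph match
        if score > best_score:                  # '>' keeps the most recent
            best_score, best_episode = score, episode
    return best_episode                         # best partial match
```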

The problem

> Still computationally unbounded!
> Retrieval time depends on how many intervals are processed before the episode is found
> Non-deterministic retrieval time

Focus on the surface match

> The majority of applications will not need a full graph match
> We need a better solution that does not depend on linear interval processing
> We want deterministic retrievals
  > Even at the expense of slower typical retrieval times!

My solution

> Convert retrieval into a matrix multiplication problem
> Matrix multiplication is suited to parallel algorithms by nature (think Matlab)
> How do we represent intervals, complex graphs and the cue in a matrix?

The answer: We don't. Sort of.

> For retrieval we only care about the root-to-leaf-node paths of the cue
> So, store all distinct root-to-leaf-node paths as we come across them
> Build this incrementally, just like everything else

(root, 1, 3, 6, 4)
(root, 1, 3, 7, 5)
(root, 1, 4, 8, 5)
(root, 1, 4, 9, 5)
etc.

So back to the matrix

> Rows represent interval endpoints
> Columns represent distinct root-to-leaf-node paths
> A cell represents whether that path is present at that endpoint
> The cue is decomposed into root-to-leaf-node paths and stored as a vector
> The dot product gives a vector containing the match score for each interval endpoint (see the sketch below)
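A minimal NumPy sketch of this formulation, with made-up data; the tie-breaking line follows the deck's rule of preferring recency.

```python
import numpy as np

# M[i, j] == 1 iff distinct path j is present at interval endpoint i.
M = np.array([[1, 0, 1, 0],    # endpoint 0 (oldest)
              [1, 1, 1, 0],    # endpoint 1
              [1, 1, 0, 1]],   # endpoint 2 (newest)
             dtype=np.uint8)
c = np.array([1, 1, 0, 0], dtype=np.uint8)  # cue contains paths 0 and 1

scores = M @ c                 # match score per interval endpoint: [1 2 2]
best = len(scores) - 1 - scores[::-1].argmax()  # break ties by recency -> 2
```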

Extra data structures

> This replaces the DNFGraph
> We store an extra hash table which relates distinct root-to-leaf-node paths to matrix column numbers
  > Keyed on a tuple representing the path
    > (root, map, square, x, 4)
    > (root, map, square, y, 5)
> We also must incrementally update the matrix itself (a sketch follows)
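A sketch of the hash table and the incremental matrix growth, again with illustrative names; a real implementation would amortise the copying rather than reallocating on every episode.

```python
import numpy as np

path_to_col = {}  # e.g. ("root", "map", "square", "x", 4) -> column index
matrix = np.zeros((0, 0), dtype=np.uint8)

def add_endpoint(present_paths):
    """Append a row for a new interval endpoint, adding columns as needed."""
    global matrix
    for path in present_paths:
        if path not in path_to_col:              # first time seeing this path
            path_to_col[path] = len(path_to_col)
    rows, cols = matrix.shape
    grown = np.zeros((rows + 1, len(path_to_col)), dtype=np.uint8)
    grown[:rows, :cols] = matrix                 # copy the old contents over
    for path in present_paths:                   # mark the paths present now
        grown[rows, path_to_col[path]] = 1
    matrix = grown
```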

Some Optimisations

> The matrix will be sparse
> Sparse matrices are a heavily researched problem
  > Off-the-shelf solutions are available (see the sketch below)
> We can convert the matrix to a sparse representation when needed
> This adds overhead, but is worth it
> More testing is required
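One off-the-shelf possibility is SciPy's compressed sparse row format; the deck does not name a specific library, so this is only an assumption about how the conversion could look.

```python
import numpy as np
from scipy.sparse import csr_matrix

M = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=np.uint8)
c = np.array([1, 0, 1], dtype=np.uint8)

M_sparse = csr_matrix(M)  # the conversion is the added overhead
scores = M_sparse @ c     # same dot product, but zero cells are skipped
```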

Won't this take up a lot of bytes?

> Yes
> But:
  > The matrix doesn't have to hold 64-bit integers; the element type just has to count higher than the number of cue features
  > The number of distinct root-to-leaf-node paths should stabilise over time
> More testing is required
  > On the scale of millions of episodes (a back-of-envelope estimate follows)
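A back-of-envelope estimate with made-up numbers, not measured data, to show why the element width and sparsity matter.

```python
# One million interval endpoints by ten thousand distinct paths, one byte
# per cell (uint8 suffices if cues have fewer than 256 features).
endpoints, paths = 1_000_000, 10_000
print(endpoints * paths / 2**30, "GiB")  # ~9.3 GiB dense; sparsity helps
```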

So, what overheads have been added?

> During encoding:
  > Updating the hash table
  > Resizing the matrix
    > Adding a row for each new episode
    > Adding columns for new root-to-leaf-node paths

So, what overheads have been added?

> During retrieval:
  > Decomposition of the cue into paths
  > Conversion to a sparse matrix
  > The dot product calculation
  > (The DNFGraph is removed)

Testing: Word Sense Disambiguation

> Determine the meaning of an ambiguous word given its part of speech
  > The part of speech is whether it is a noun, pronoun, verb, etc.
> For example:
  > "I want to go fast" (speed? or without food?)
  > "Put the pizza on the plate"

Testing: Word Sense Disambiguation

> Expose the agent to a corpus of sentences
  > The SemCor corpus is already tagged with senses
> Let the agent try to disambiguate
> The agent receives feedback from the input link
> It can use episodic memory to access this feedback

Results: Episodes vs Retrieval Time

[Plot: retrieval time against number of episodes for the compared implementations]

> Not much difference at all

Results: Episodes vs Hash Table Size

[Plot: hash table size growing linearly with number of episodes]

> Has O(n) complexity
> For this application, a new episode most likely means a new distinct path
> This may be different for other applications

Thoughts

> The difference between sparse and dense matrix retrieval times is minimal
  > Likely dependent on the application
> Episodes vs hash table size has O(n) complexity (for this application)
  > This will probably become a problem for extremely long-lived agents
  > In this case, matrix rows correspond to episodes rather than intervals
> This application is a special case; there are more unique paths than usual

Thoughts

> Implemented using the Python scripting language
  > Using a language closer to the machine level may yield an order-of-magnitude improvement
> The computational complexity of matrix-based retrieval does not seem much better
  > But matrix operations are an important research problem
  > Solutions exist to accelerate the operations

What's next?

> Test on a wider variety of environments
> Test with cues of different complexity
> Millions of episodes?
> Memory dynamics? Forgetting?
> Acceleration using a GPU?

Pros

> Suitable for off-the-shelf hardware solutions
> Scalable to larger episode stores
> Deterministic retrievals; they don't depend on where the match is

Cons

> More computational overhead when encoding each episode
> More storage overhead when encoding each episode
> Storage may become a problem
> Typical cue retrieval time may be worse

Thank you!
Questions?
