

Francis Li

A Parallel Implementation of Cue-based Retrieval for Episodic Memory


School of Electrical and Electronic Engineering, April 2014


What is episodic memory?

> Long-term store of specific events
  > "Where did I park my car?"
> Functionally:
  > Architectural
  > Automatic
  > Autonoetic
  > Temporally indexed

Derbinsky, N.: Efficiently Implementing Episodic Memory (2009)


Episodic Memory enables

> Deliberation based on a rich knowledge scheme
> Playback of the past
> Learning from mistakes
> Much more

Cognitive Architecture

> Episodic memory is part of a bigger picture
> Combines knowledge and architectural processes
> Allows the creation of AI agents
> Rich knowledge schema
> Want agents to persist over long time frames

Episodic Memory Block Diagram

> Knowledge is represented as directed, labelled, acyclic graphs
> Three distinct memory operations: encoding, retrieval, and reconstruction

Encoding

> The entire working memory structure is stored in the episodic store

Retrieval

> A cue structure is placed onto working memory
> The cue is some substructure of working memory which must be found in the database of graphs

Reconstruction

> Given an episode ID to retrieve, supplied by the cue matcher
> The episode is decoded from the store and placed on the retrieval structure in working memory

No Dynamics

> Episodic memory is untouched once it is in the store
> Episodes are stored without deliberation

The problem is

> A massive database of episodes
  > An episode must be stored after every decision
  > The store size only increases!
> Query-based retrieval is computationally complex
  > Graph matching runs in non-deterministic polynomial time
  > Cannot meet the 50 ms reactivity requirement!
> Reconstruction is cheap

Dealing with it

> Make two assumptions about the environment
> Temporal contiguity
  > Not many changes between episodes
  > The environment is fairly stable; the things that change are small relative to the total complexity of the environment
> Structural regularity
  > The number of unique structures is much less than the total number of structures experienced
  > We can reuse older structures that have disappeared and may reappear

Exploiting assumptions

> Temporal contiguity
  > Store changes between episodes, not entire episodes (see the sketch below)
> Structural regularity
  > Only store unique structures
  > Use indexing to reuse structures and symbols
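As an illustration of the delta idea, here is a minimal sketch of interval-based storage, assuming working memory can be treated as a set of hashable elements. The class and field names are illustrative, not the actual Soar implementation.

```python
# Minimal sketch: store only the changes between consecutive episodes.
# An element's presence is recorded as an interval of episode ids.
class EpisodicStore:
    def __init__(self):
        self.episode = 0
        self.open_intervals = {}    # element -> episode id where it appeared
        self.closed_intervals = []  # (element, start, end) tuples

    def encode(self, working_memory):
        """Record only the delta between this episode and the previous one."""
        self.episode += 1
        current = set(working_memory)
        for elem in current - self.open_intervals.keys():
            self.open_intervals[elem] = self.episode      # element appeared
        for elem in self.open_intervals.keys() - current:
            start = self.open_intervals.pop(elem)         # element disappeared
            self.closed_intervals.append((elem, start, self.episode))
```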

Computationally speaking

> Encoding is now efficient (Derbinsky, N.: Efficiently Implementing Episodic Memory (2009))
  > Although storage is unbounded, as the store only ever grows
> Reconstruction is straightforward
> Retrieval is the real problem!
  > It will still be unbounded; we need to deal with it

Let's talk about retrieval

Current Implementation in Soar

What is a cue?

> Contains a number of symbolic features
> A directed, acyclic graph
> Some substructure of working memory
> What are features?
  > Root-to-leaf-node paths (see the sketch below)

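A minimal sketch of decomposing a cue into its root-to-leaf-node features, assuming the cue graph is a dict mapping each node to a list of (edge label, child) pairs; the names and graph encoding are illustrative, not the Soar data structures.

```python
def root_to_leaf_paths(graph, node, prefix=("root",)):
    """Yield each root-to-leaf path as a tuple ending in the leaf value."""
    children = graph.get(node, [])
    if not children:                 # leaf node: one complete feature
        yield prefix + (node,)
        return
    for label, child in children:
        yield from root_to_leaf_paths(graph, child, prefix + (label,))

# Example cue with two features.
cue = {"root": [("map", "m")],
       "m": [("x", 4), ("y", 5)]}
print(list(root_to_leaf_paths(cue, "root")))
# [('root', 'map', 'x', 4), ('root', 'map', 'y', 5)]
```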

What episode do we return?

> Ideally, do a graph match
> But graph matching is computationally complex
> So filter out candidates that won't graph match
  > Reduces the search space
> Allow partial matches
  > Get the most similar episode
> Break ties based on recency

A two-stage process

> Determine candidate episodes
  > Look at the number of matching surface features
> If they are all present, do a graph match
> Else, return the episode with the greatest match score
  > This doesn't check for structure!

How does the surface match work?

> Decompose episodes into boolean statements representing paths to leaf nodes
> Process intervals
  > On each interval endpoint, check for satisfaction of these statements
> Update and track satisfaction in a structure called the DNFGraph

Intervals

[Diagram: episode features shown as intervals over time]

How does the surface match work?

> Decompose the cue into boolean statements representing paths to leaf nodes
> Check satisfaction of the boolean statements by processing intervals backwards in time
> If we find all statements are satisfied, do a graph match
> If the graph match succeeds, return; else keep processing intervals
> Return the episode with the highest match score (a simplified sketch follows)
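A heavily simplified sketch of this backwards scan; `endpoints`, `cue_statements`, and `graph_match` are illustrative stand-ins, not Soar's actual DNFGraph machinery.

```python
def surface_match(endpoints, cue_statements, graph_match):
    """endpoints: list of (episode_id, satisfied_statements), newest first."""
    best_score, best_episode = -1, None
    for episode, satisfied in endpoints:        # walk backwards in time
        score = len(satisfied & cue_statements)
        if score == len(cue_statements) and graph_match(episode):
            return episode                      # full surface + graph match
        if score > best_score:                  # '>' keeps the most recent
            best_score, best_episode = score, episode
    return best_episode                         # best partial match
```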

The problem

> Still computationally unbounded!
> Retrieval time depends on how many intervals are processed before the episode is found
> Non-deterministic retrieval time

Focus on the surface match

> The majority of applications will not need a full graph match
> We need a better solution that does not depend on linear interval processing
> We want deterministic retrievals
  > Even at the expense of slower typical retrieval times!

My solution

> Convert retrieval into a matrix multiplication problem
> Matrix multiplication is suited to parallel algorithms by nature (think Matlab)
> How do we represent intervals, complex graphs and the cue in a matrix?

The answer: We don't. Sort of.

> For retrieval we only care about the root-to-leaf-node paths of the cue
> So, store all distinct root-to-leaf-node paths as we come across them
> Build this incrementally, just like everything else

(root, 1, 3, 6, 4)
(root, 1, 3, 7, 5)
(root, 1, 4, 8, 5)
(root, 1, 4, 9, 5)
etc.

So back to the matrix

> Rows represent interval endpoints
> Columns represent distinct root-to-leaf-node paths
> A cell represents whether that path is present at that endpoint
> The cue is decomposed into root-to-leaf-node paths and stored as a vector
> The dot product gives a vector containing the match score for each interval endpoint (see the sketch below)
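A minimal NumPy sketch of this formulation, with made-up data; the tie-breaking line follows the deck's rule of preferring recency.

```python
import numpy as np

# M[i, j] == 1 iff distinct path j is present at interval endpoint i.
M = np.array([[1, 0, 1, 0],    # endpoint 0 (oldest)
              [1, 1, 1, 0],    # endpoint 1
              [1, 1, 0, 1]],   # endpoint 2 (newest)
             dtype=np.uint8)
c = np.array([1, 1, 0, 0], dtype=np.uint8)  # cue contains paths 0 and 1

scores = M @ c                 # match score per interval endpoint: [1 2 2]
best = len(scores) - 1 - scores[::-1].argmax()  # break ties by recency -> 2
```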

Extra data structures

> This replaces the DNFGraph
> We store an extra hash table which relates distinct root-to-leaf-node paths to matrix column numbers
  > Keyed on a tuple representing the path
    > (root, map, square, x, 4)
    > (root, map, square, y, 5)
> We also must incrementally update the matrix itself (a sketch follows)
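A sketch of the hash table and the incremental matrix growth, again with illustrative names; a real implementation would amortise the copying rather than reallocating on every episode.

```python
import numpy as np

path_to_col = {}  # e.g. ("root", "map", "square", "x", 4) -> column index
matrix = np.zeros((0, 0), dtype=np.uint8)

def add_endpoint(present_paths):
    """Append a row for a new interval endpoint, adding columns as needed."""
    global matrix
    for path in present_paths:
        if path not in path_to_col:              # first time seeing this path
            path_to_col[path] = len(path_to_col)
    rows, cols = matrix.shape
    grown = np.zeros((rows + 1, len(path_to_col)), dtype=np.uint8)
    grown[:rows, :cols] = matrix                 # copy the old contents over
    for path in present_paths:                   # mark the paths present now
        grown[rows, path_to_col[path]] = 1
    matrix = grown
```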

Some Optimisations

> The matrix will be sparse
> Sparse matrices are a heavily researched problem
  > Off-the-shelf solutions are available (see the sketch below)
> We can convert the matrix to a sparse representation when needed
> This adds overhead, but is worth it
> More testing is required
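One off-the-shelf possibility is SciPy's compressed sparse row format; the deck does not name a specific library, so this is only an assumption about how the conversion could look.

```python
import numpy as np
from scipy.sparse import csr_matrix

M = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=np.uint8)
c = np.array([1, 0, 1], dtype=np.uint8)

M_sparse = csr_matrix(M)  # the conversion is the added overhead
scores = M_sparse @ c     # same dot product, but zero cells are skipped
```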

Won't this take up a lot of bytes?

> Yes
> But:
  > The matrix doesn't have to hold 64-bit integers; the element type just has to count higher than the number of cue features
  > The number of distinct root-to-leaf-node paths should stabilise over time
> More testing is required
  > On the scale of millions of episodes (a back-of-envelope estimate follows)
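A back-of-envelope estimate with made-up numbers, not measured data, to show why the element width and sparsity matter.

```python
# One million interval endpoints by ten thousand distinct paths, one byte
# per cell (uint8 suffices if cues have fewer than 256 features).
endpoints, paths = 1_000_000, 10_000
print(endpoints * paths / 2**30, "GiB")  # ~9.3 GiB dense; sparsity helps
```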

So, what overheads have been added?

> During encoding:
  > Updating the hash table
  > Resizing the matrix
    > Adding a row for each new episode
    > Adding columns for new root-to-leaf-node paths

So, what overheads have been added?

> During retrieval:
  > Decomposition of the cue into paths
  > Conversion to a sparse matrix
  > The dot product calculation
  > (The DNFGraph is removed)

Testing: Word Sense Disambiguation

> Determine the meaning of an ambiguous word given its part of speech
  > The part of speech is whether it is a noun, pronoun, verb, etc.
> For example:
  > "I want to go fast" (speed? or without food?)
  > "Put the pizza on the plate"

Testing: Word Sense Disambiguation

> Expose the agent to a corpus of sentences
  > The SemCor corpus is already tagged with senses
> Let the agent try to disambiguate
> The agent receives feedback from the input link
> It can use episodic memory to access this feedback

Results: Episodes vs Retrieval Time

[Plot: retrieval time against number of episodes for the compared implementations]

> Not much difference at all

Results: Episodes vs Hash Table Size

[Plot: hash table size growing linearly with number of episodes]

> Has O(n) complexity
> For this application, a new episode most likely means a new distinct path
> This may be different for other applications

Thoughts

> The difference between sparse and dense matrix retrieval times is minimal
  > Likely dependent on the application
> Episodes vs hash table size has O(n) complexity (for this application)
  > This will probably become a problem for extremely long-lived agents
  > In this case, matrix rows correspond to episodes rather than intervals
> This application is a special case; there are more unique paths than usual

Thoughts

> Implemented using the Python scripting language
  > Using a language closer to the machine level may yield an order-of-magnitude improvement
> The computational complexity of matrix-based retrieval does not seem much better
  > But matrix operations are an important research problem
  > Solutions exist to accelerate the operations

What's next?

> Test on a wider variety of environments
> Test with cues of different complexity
> Millions of episodes?
> Memory dynamics? Forgetting?
> Acceleration using a GPU?

Pros

> Suitable for off-the-shelf hardware solutions
> Scalable to larger episode stores
> Deterministic retrievals; they don't depend on where the match is

Cons

> More computational overhead when encoding each episode
> More storage overhead when encoding each episode
> Storage may become a problem
> Typical cue retrieval time may be worse

Thank you!
Questions?
