Andrea Stocco
University of Washington
Seattle, WA
PSYCH448D, Week 5: Models of Memory /1
Summary of RL
RL is a general framework that explains learning
The V-values and Q-values are a form of memory
There are multiple memory systems in the brain
Different types of memory systems
Seger, 2005 (adapted from Squire & Zola-Morgan, 1992)
Different types of memory systems: RL
Seger, 2005 (adapted from Squire & Zola-Morgan, 1992)
Memory systems in the brain
Some questions to start the day
What is memory?
What are the characteristics of human memory?
What are the limitations of human memory?
What is memory?
Characteristics of memory
● It starts with an event
○ The hand press
● The event leaves a trace
○ The handprint
● Time makes the trace fade away
○ The wind
● More events make the trace deeper
Rational Analysis (i.e., Bayesian framework)
● An influential branch of mathematical modeling
● Assumptions
○ Agents are adapted to their environment
○ Evolution has already done the job of optimizing the agent
● Consequence
○ If you know the environment and the agent’s goals, you can mathematically derive what the organism will do
● Advantage
○ Specification of the agent is minimal
A timeline of Bayesian models of memory
John R. Anderson (1990s)
Nick Chater (2000s)
Thomas Griffiths (2010s)
* Has since taken a different approach
A rational approach to memory
● Memory has evolved to retrieve information efficiently
○ Suppose that retrieving any piece of information took a constant time T
○ Then very frequent memories would take as long to retrieve as very infrequent ones
○ Computational resources would not be well allocated!
○ Constant-time retrieval is useful only when T ≈ 0!
● A cost-effective approach: for each memory trace M, consider its need probability, p(M)
● Then retrieval time should be T ∝ 1/p(M)
○ That is, the resources spent to make M available should be proportional to M’s usefulness.
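The proportionality T ∝ 1/p(M) can be illustrated with a short Python sketch. The scaling constant `k` and the three need probabilities are hypothetical values chosen only for illustration; the slide states a proportionality, not a specific constant.

```python
def retrieval_time(p_need: float, k: float = 1.0) -> float:
    """Retrieval time inversely proportional to need probability: T = k / p(M).

    k is a hypothetical scaling constant (the slide only states T ~ 1/p(M)).
    """
    if not 0.0 < p_need <= 1.0:
        raise ValueError("need probability must lie in (0, 1]")
    return k / p_need


# Hypothetical need probabilities for three memory traces (illustrative only)
need = {"frequent": 0.5, "occasional": 0.05, "rare": 0.005}

for name, p in need.items():
    # A memory needed 10x less often is allotted 10x more retrieval time
    print(f"{name:>10}: p(M) = {p:<6} T = {retrieval_time(p):.1f}")
```

The design choice is that effort tracks usefulness: the most frequently needed memories get the fastest access, instead of every memory paying the same constant cost.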
A rational theory of memory retrieval
It is useful to think of rational analysis as an economic theory
● Memories are useful for achieving goals
● Each goal has a value, V
○ Like the sum of rewards R_t in RL
● But memory retrieval has a cost, C
● Thus, the net value of retrieving memory M is p(M) ✕ V − C
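A small Python sketch of the net-value computation. The goal value V = 10, the cost C = 0.5, and the need probabilities are hypothetical numbers chosen only to make the arithmetic concrete.

```python
def net_value(p_need: float, value: float, cost: float) -> float:
    """Net value of retrieving memory M: p(M) * V - C."""
    return p_need * value - cost


# Illustrative (hypothetical) numbers: goal value V = 10, retrieval cost C = 0.5
V, C = 10.0, 0.5
for p in (0.5, 0.1, 0.01):
    gain = net_value(p, V, C)
    # Positive: retrieval pays off on average; negative: it costs more than it returns
    print(f"p(M) = {p:<5} net value = {gain:+.2f}")
```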
A rational theory of forgetting
● Forgetting is not necessarily bad
● When would it be good to forget something at all?
● Theory: retrieving memories has a cost, C
○ Example costs: attention, time to pause, etc.
● When p(M) ✕ V < C, it is rational to forget M!
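The forgetting rule can be restated as a threshold on need probability. The numbers in the last line (V = 10, C = 0.5) are hypothetical, chosen only to make the arithmetic concrete.

```latex
\begin{align*}
p(M)\,V - C < 0
  &\iff p(M)\,V < C \\
  &\iff p(M) < \frac{C}{V} \qquad (V > 0) \\[4pt]
\text{e.g. } V = 10,\; C = 0.5
  &\implies \text{forget } M \text{ whenever } p(M) < 0.05
\end{align*}
```

Raising the cost C or lowering the goal value V raises the threshold, so more memories become rational to forget.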
Interim summary
● An agent’s behavior can be analyzed as a function of its environment
● We assume the agent is optimal
● We assume the agent is rational (minimizes costs)
● Memory decay and forgetting can be seen as rational
Anderson and Milson (1989)
● Laid out the principles of rational analysis
● Considered memory as an information retrieval system
○ That is, a database system
● The system retrieves records in response to queries Q
● Applied Bayesian principles to derive optimal behavior
○ What an ideal system should do
● Bayesian approaches to cognition are now ubiquitous
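Bayes theorem
● Note that:
$$p(A \wedge B) = p(A) \times p(B \mid A)$$
$$p(A \wedge B) = p(B) \times p(A \mid B)$$
● We can therefore establish the equality
$$p(B) \times p(A \mid B) = p(A) \times p(B \mid A)$$
$$p(A \mid B) = \frac{p(A) \times p(B \mid A)}{p(B)}$$
[Figure: p(A), p(B), and their joint probability p(A ∧ B) = p(A) ✕ p(B | A) = p(B) ✕ p(A | B)]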
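A worked numeric check of Bayes theorem, using exact fractions so the two factorizations of the joint can be compared without rounding. The probabilities are hypothetical, and the marginal p(B) is computed with the law of total probability, a step the slide leaves implicit.

```python
from fractions import Fraction

# Hypothetical numbers for two binary events A and B (illustrative only)
p_A = Fraction(1, 4)              # prior p(A)
p_B_given_A = Fraction(3, 5)      # likelihood p(B | A)
p_B_given_not_A = Fraction(1, 5)  # p(B | not A)

# Marginal p(B) via the law of total probability
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_not_A

# Bayes theorem: posterior p(A | B) = p(A) * p(B | A) / p(B)
p_A_given_B = p_A * p_B_given_A / p_B

# The two factorizations of the joint p(A and B) agree
joint_via_A = p_A * p_B_given_A
joint_via_B = p_B * p_A_given_B

print("p(B)     =", p_B)          # 3/10
print("p(A | B) =", p_A_given_B)  # 1/2
print("p(A & B) =", joint_via_A, "=", joint_via_B)  # 3/20 = 3/20
```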
Bayes theorem: Nomenclature
$$p(A \mid B) = \frac{p(A) \times p(B \mid A)}{p(B)}$$
● p(A | B): posterior probability
● p(A): prior probability
● p(B | A): likelihood
● p(B): marginal likelihood (“model evidence”)
Bayes theorem: More nomenclature
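$$p(A \mid B) = \frac{p(A) \times p(B \mid A)}{p(B)}$$
● p(A | B): posterior probability of A
● p(A): prior probability of A
● p(B | A): likelihood of B
● p(B): marginal probability of B (“model evidence”)
Equivalently:
$$p(A \mid B) = p(A) \times \frac{p(B \mid A)}{p(B)}$$
● p(A): prior probability of A
● p(B | A) / p(B): support for A provided by B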
Bayes theorem: Meaning
$$p(A \mid B) = \frac{p(A) \times p(B \mid A)}{p(B)} = p(A) \times \frac{p(B \mid A)}{p(B)}$$
Posterior probability = Prior probability ✕ Contextual factors
Memory as information system
● Contextual factors = Q (for “query”, as in a database)
● The context Q is made of several cue elements q_i
● Each cue q_i contributes to the probability of needing A.
$$p(A \mid Q) = p(A) \times \frac{p(Q \mid A)}{p(Q)}$$
Posterior probability = Prior probability ✕ Contextual factors
From probabilities to odds
● For convenience, Anderson and Milson work in terms of odds, p / (1 − p), not probabilities.
● The math changes a little
● Values now range over [0, ∞) instead of [0, 1]
$$\frac{p(A \mid Q)}{p(\neg A \mid Q)} = \frac{p(A)}{p(\neg A)} \times \frac{p(Q \mid A)}{p(Q \mid \neg A)}$$
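A minimal sketch of the odds form of Bayes theorem, with exact fractions. The prior and likelihoods are hypothetical, and the helper names `to_odds` and `to_prob` are introduced here for illustration.

```python
from fractions import Fraction


def to_odds(p: Fraction) -> Fraction:
    """Convert a probability in [0, 1) to odds p / (1 - p), which lie in [0, inf)."""
    if not 0 <= p < 1:
        raise ValueError("probability must lie in [0, 1)")
    return p / (1 - p)


def to_prob(odds: Fraction) -> Fraction:
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)


# Hypothetical prior and likelihoods (illustrative only)
prior = Fraction(1, 4)       # p(A)
lik_A = Fraction(3, 5)       # p(Q | A)
lik_not_A = Fraction(1, 5)   # p(Q | not A)

# Odds form of Bayes: posterior odds = prior odds * likelihood ratio
posterior_odds = to_odds(prior) * (lik_A / lik_not_A)
print(posterior_odds, to_prob(posterior_odds))  # 1 1/2
```

Because p(Q) appears in both p(A | Q) and p(¬A | Q), it cancels in the ratio, so the odds form never needs the marginal likelihood.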
Prior probabilities
● Prior probabilities reflect the history of A, independent of any contextual factors.
● How do we characterize the history of A?
$$\frac{p(A \mid Q)}{p(\neg A \mid Q)} = \frac{p(A)}{p(\neg A)} \times \frac{p(Q \mid A)}{p(Q \mid \neg A)}$$
Calculating priors
Factors that affect priors:
● The importance of a memory
● The time since a memory was created
○ The more time passes, the more evidence we have that an event is unlikely
○ Example: waiting times at customer service
Calculating priors: The book rental problem
● Anderson uses Burrell’s (1980) model of library book rentals:
$$\frac{p(A)}{p(\neg A)} = \frac{n + v}{t + b}\, r(t)$$
● Where
○ n is the number of times A has been used since it was created
○ t is the time since A’s creation
○ r(t) is decay over time; think of λ^t with 0 < λ < 1 in RL
○ v and b are initial parameters (you can think of them as the mean number of uses and the mean age of all memories)
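A sketch of the prior-odds formula in Python. Following the slide’s analogy, r(t) is assumed here to be λ^t; the parameter values for v, b, and λ are hypothetical and not fitted to any data.

```python
def prior_odds(n: int, t: float, v: float, b: float, lam: float) -> float:
    """Burrell-style need odds: p(A) / p(not A) = (n + v) / (t + b) * r(t).

    Here r(t) is assumed to be lam ** t with 0 < lam < 1, following the
    slide's analogy to decay in RL; the true form of r(t) may differ.
    """
    if not 0 < lam < 1:
        raise ValueError("lam must lie strictly between 0 and 1")
    return (n + v) / (t + b) * lam ** t


# Hypothetical parameter values, not fitted to any data
V0, B0, LAM = 2.0, 10.0, 0.99

few_uses = prior_odds(n=1, t=20, v=V0, b=B0, lam=LAM)    # used once in 20 time units
many_uses = prior_odds(n=10, t=20, v=V0, b=B0, lam=LAM)  # used 10 times in 20 units
older = prior_odds(n=10, t=100, v=V0, b=B0, lam=LAM)     # same uses, over 100 units

print(f"few uses: {few_uses:.3f}  many uses: {many_uses:.3f}  older: {older:.3f}")
```

More uses (frequency, n) raise the odds, while more elapsed time (recency, t) lowers them, both through the (n + v)/(t + b) ratio and through the decay term.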
Contextual factors
$$\frac{p(A \mid Q)}{p(\neg A \mid Q)} = \frac{p(A)}{p(\neg A)} \times \frac{p(Q \mid A)}{p(Q \mid \neg A)}$$
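Breaking down contextual factors
● Contextual factors are broken down into individual cues q_i (assuming the cues are conditionally independent given A and given ¬A)
$$\frac{p(A \mid Q)}{p(\neg A \mid Q)} = \frac{p(A)}{p(\neg A)} \times \frac{p(Q \mid A)}{p(Q \mid \neg A)}$$
$$\frac{p(Q \mid A)}{p(Q \mid \neg A)} = \prod_i \frac{p(q_i \mid A)}{p(q_i \mid \neg A)}$$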
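Under the conditional-independence assumption, the context factor is a product of per-cue likelihood ratios. A sketch with hypothetical cues and probabilities (the cue names and numbers are invented for illustration):

```python
import math

# Hypothetical per-cue likelihoods for memory A (illustrative only):
# cue -> (p(q_i | A), p(q_i | not A))
cues = {
    "kitchen": (0.30, 0.05),
    "morning": (0.20, 0.10),
    "rain": (0.02, 0.04),
}


def context_factor(cue_probs: dict[str, tuple[float, float]]) -> float:
    """Product over cues of p(q_i | A) / p(q_i | not A).

    Assumes the cues are conditionally independent given A and given not A.
    """
    return math.prod(p_a / p_not_a for p_a, p_not_a in cue_probs.values())


prior_odds_A = 0.01  # hypothetical p(A) / p(not A)
posterior_odds_A = prior_odds_A * context_factor(cues)
print(f"context factor = {context_factor(cues):.2f}, posterior odds = {posterior_odds_A:.3f}")
```

A cue with a ratio below 1 (here "rain") lowers the odds of needing A, while cues that co-occur with A more than with other memories raise them.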
Simplifying contextual factors
● As the number of memories increases, p(q_i | ¬A) → p(q_i), because any single memory A then has a tiny prior, so p(q_i) ≈ p(q_i | ¬A)
$$\frac{p(Q \mid A)}{p(Q \mid \neg A)} = \prod_i \frac{p(q_i \mid A)}{p(q_i \mid \neg A)} \approx \prod_i \frac{p(q_i \mid A)}{p(q_i)}$$
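The approximation p(q_i | ¬A) ≈ p(q_i) follows from the law of total probability, p(q) = p(A) p(q | A) + p(¬A) p(q | ¬A): as p(A) shrinks, p(q) approaches p(q | ¬A). A quick numeric check, under the hypothetical assumption that A is one of N equally likely memories (so p(A) = 1/N) and with made-up cue probabilities:

```python
# Hypothetical cue probabilities (illustrative only)
P_Q_GIVEN_A = 0.6      # p(q | A)
P_Q_GIVEN_NOT_A = 0.1  # p(q | not A)


def marginal_cue_prob(n_memories: int) -> float:
    """p(q) by total probability, assuming (hypothetically) that A is one of
    n_memories equally likely memories, so p(A) = 1 / n_memories."""
    p_a = 1.0 / n_memories
    return p_a * P_Q_GIVEN_A + (1 - p_a) * P_Q_GIVEN_NOT_A


for n in (2, 10, 1_000, 1_000_000):
    p_q = marginal_cue_prob(n)
    # The gap equals (0.6 - 0.1) / n here, so it vanishes as n grows
    print(f"N = {n:>9}: p(q) = {p_q:.6f}, gap to p(q | not A) = {p_q - P_Q_GIVEN_NOT_A:.6f}")
```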
Interpreting contextual factors
$$\frac{p(Q \mid A)}{p(Q \mid \neg A)} \approx \prod_i \frac{p(q_i \mid A)}{p(q_i)}$$
● The interpretation is simple: the probability of finding q_i whenever A is present…
● … over the overall probability of finding q_i
● This ratio can easily be calculated from discrete corpora
○ E.g., language corpora, with q_i and A being words in the same sentence.
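A minimal sketch of estimating p(q_i | A) / p(q_i) from sentence co-occurrence counts, treating each sentence as one context. The toy corpus and the helper name `cue_ratio` are hypothetical.

```python
# Toy corpus (hypothetical): each sentence is treated as one "context"
corpus = [
    "the dog chased the ball",
    "the dog ate the bone",
    "a cat chased the mouse",
    "the ball rolled away",
]
contexts = [set(sentence.split()) for sentence in corpus]


def cue_ratio(cue: str, target: str) -> float:
    """Estimate p(cue | target) / p(cue) from sentence co-occurrence counts."""
    with_target = [c for c in contexts if target in c]
    if not with_target:
        raise ValueError(f"target {target!r} never occurs in the corpus")
    p_cue = sum(cue in c for c in contexts) / len(contexts)
    if p_cue == 0:
        raise ValueError(f"cue {cue!r} never occurs in the corpus")
    p_cue_given_target = sum(cue in c for c in with_target) / len(with_target)
    return p_cue_given_target / p_cue


for cue in ("bone", "the", "mouse"):
    print(cue, cue_ratio(cue, "dog"))
```

A ratio above 1 ("bone") boosts the need for "dog", a ratio of 1 ("the") is uninformative, and a ratio of 0 ("mouse") would veto it entirely; with real corpora one would typically smooth the counts to avoid such zeros.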
Godden & Baddeley, 1975
Summary: Rational analysis
● Retrieval probability = need function = posterior probability of needing memory A given context Q
● Used economic considerations (costs and values) and Bayesian inference to derive laws for priors and likelihood
● Priors reflect history of use, dominated by frequency (n) and recency (t).
● Likelihood reflects contextual cues q_i; each cue q_i contributes in proportion to its previous co-occurrence with A
Is human memory truly rational?
● In 1989 and 1990, Anderson published several
papers outlining his conclusions
● In 1991, he set out to find an empirical test.
● If human memory is rational, it should adapt to the human environment
○ Again, almost no assumptions about the agent
Ebbinghaus (1885)
● First experimental dataset on human memory
○ Used nonsense syllables
○ Measured how long it took to relearn the material (savings) at different delays
● Classic result in psychology
● Typically interpreted as the effect of memory decay
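The savings measure is commonly computed as the percentage of the original learning effort that is saved at relearning. A minimal sketch; the learning times below are made-up numbers, not Ebbinghaus’s data.

```python
def savings(original: float, relearning: float) -> float:
    """Percent savings: (original - relearning) / original * 100."""
    if original <= 0:
        raise ValueError("original learning time must be positive")
    return (original - relearning) / original * 100


# Hypothetical learning times (seconds) at increasing delays; not real data
original_time = 1000.0
relearn_times = {"20 min": 400.0, "1 day": 650.0, "31 days": 800.0}

for delay, t in relearn_times.items():
    # Longer delays -> more relearning effort -> lower savings
    print(f"{delay:>8}: savings = {savings(original_time, t):.0f}%")
```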
Exponential or power decay?
Regular form
Exponential law:
$$P = \alpha^{-\beta T}$$
Power law:
$$P = \alpha\, T^{-\beta}$$
Log transform
Exponential law:
$$\log(P) = -\beta T \log(\alpha)$$
Power law:
$$\log(P) = \log(\alpha) - \beta \log(T)$$
The power law is linear when both time T and probability P are in logs!
Estimating need probabilities
● Remember that we are estimating odds.
● When we transform them back into probabilities (the need function), the results show a power function relating p(A) and t:
$$p(A) = \alpha\, t^{-\beta}$$
● The log/log plot is linear
Exponential or power law?
Log transform
Exponential law 🚫
$$\log(P) = -\beta T \log(\alpha)$$
Power law ✅
$$\log(P) = \log(\alpha) - \beta \log(T)$$
We can estimate the parameters: α = 3.86, β = 0.13
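A sketch of why the log transform helps: fit a straight line in log-log space (power law) and in semi-log space (exponential law) and compare the residual error. The data are synthetic and noise-free, generated from a power law with hypothetical parameters α = 0.8, β = 0.5 (not the values reported on the slide); `fit_line` is an ordinary least-squares helper written for this example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


# Synthetic, noise-free data from P = alpha * T^(-beta), hypothetical parameters
ALPHA, BETA = 0.8, 0.5
times = [1, 2, 4, 8, 16, 32, 64]
log_p = [math.log(ALPHA * t ** -BETA) for t in times]

# Power law: log P is linear in log T (slope -beta, intercept log alpha)
slope_pow, icept_pow, sse_pow = fit_line([math.log(t) for t in times], log_p)
# Exponential law: log P is linear in T itself
slope_exp, icept_exp, sse_exp = fit_line(times, log_p)

beta_hat = -slope_pow
alpha_hat = math.exp(icept_pow)
print(f"power fit: alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, SSE = {sse_pow:.2e}")
print(f"exponential fit SSE = {sse_exp:.3f}")
```

The exponential fit here includes a free intercept, so it is slightly more flexible than the slide’s form log(P) = −βT log(α); even so, it cannot match power-law data as well as the log-log line does.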
Is human memory truly rational? (Reprise)
● In 1989 and 1990, Anderson published several
papers outlining his conclusions
● In 1991, he set out to find an empirical test.
● If human memory is rational, it should adapt to the human environment
○ Environment ⇔ Human memory
● Therefore, the same parameters that govern human memory should be found in the environment.
Anderson & Schooler, 1991
Tested three modern sources of information:
1. 100 days of New York Times headlines in 1990
2. 100 days of child-directed speech (CHILDES database)
3. 100 days of emails received by John Anderson in 1990
The same power function and the same parameters should hold across domains
● (Remember: the Ebbinghaus parameters (α = 3.86, β = 0.13) were in probabilities, not odds!)
Anderson & Schooler, 1991: Results
How about social media?
Anderson et al., in press.
Discussion question
What do you think of Bayesian approaches?
What are the limits of Bayesian approaches?