Part I.b: EASE - Embarrassingly Shallow Autoencoders

A Simple Yet Powerful Baseline for Collaborative Filtering

From Matrix Factorization to EASE

We just learned: Matrix Factorization approximates the rating matrix as \(R \approx U \times V^T\)

But what if we could skip the factorization?

EASE (Steck, 2019): Learn item-item similarities directly

  • No iterations
  • No neural networks
  • Closed-form solution!

The EASE Idea

Classic autoencoder: Learn to reconstruct input through bottleneck

\[\text{Input} \rightarrow \text{Encoder} \rightarrow \text{Bottleneck} \rightarrow \text{Decoder} \rightarrow \text{Output}\]

EASE: Skip the bottleneck! Learn reconstruction weights directly

\[X \approx X \cdot B\]

where \(B\) is an item-item similarity matrix

Constraint: \(\text{diag}(B) = 0\) (an item may not predict itself; without this, the trivial solution \(B = I\) would be optimal)
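
Concretely, EASE solves a ridge regression with a zero-diagonal constraint (the objective from Steck, 2019):

\[\min_B \; \|X - X B\|_F^2 + \lambda \|B\|_F^2 \quad \text{subject to} \quad \text{diag}(B) = 0\]

Via Lagrange multipliers, this has a closed-form solution: with \(P = (X^T X + \lambda I)^{-1}\),

\[B_{ij} = \begin{cases} 0 & \text{if } i = j \\ -P_{ij} / P_{jj} & \text{otherwise} \end{cases}\]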

Intuition: Item-Item Similarities

How does \(\hat{X} = X \cdot B\) make recommendations?

Users: Alice watched [Toy Story], Bob watched [Godfather, Die Hard]

Item-item similarity matrix B (learned from all users):

             Toy Story  Godfather  Die Hard  Inside Out   Heat
Toy Story      0.00       0.01       0.02       0.18      0.01
Godfather      0.01       0.00       0.08       0.02      0.15
Die Hard       0.02       0.08       0.00       0.01      0.12

A user's predicted score for an unseen item is the sum of the \(B\) entries for every item they watched:

Alice (watched Toy Story): Inside Out = 0.18 ✅ | Heat = 0.01 ❌

Bob (watched Godfather + Die Hard): Heat = 0.15 + 0.12 = 0.27 ✅ | Inside Out = 0.02 + 0.01 = 0.03 ❌
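
As a sanity check, here is a small NumPy sketch reproducing those scores. The table above only shows the rows of \(B\) for the three watched items, so the remaining rows are left as zeros; they never contribute here, since nobody watched Inside Out or Heat:

```python
import numpy as np

items = ["Toy Story", "Godfather", "Die Hard", "Inside Out", "Heat"]

# Rows of B for the watched items, copied from the table above.
B = np.zeros((5, 5))
B[0] = [0.00, 0.01, 0.02, 0.18, 0.01]  # Toy Story
B[1] = [0.01, 0.00, 0.08, 0.02, 0.15]  # Godfather
B[2] = [0.02, 0.08, 0.00, 0.01, 0.12]  # Die Hard

X = np.array([[1, 0, 0, 0, 0],   # Alice: Toy Story
              [0, 1, 1, 0, 0]])  # Bob:   Godfather, Die Hard

scores = X @ B
print(dict(zip(items, scores[0])))  # Alice: Inside Out 0.18, Heat 0.01
print(dict(zip(items, scores[1])))  # Bob:   Heat 0.27, Inside Out 0.03
```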

Let’s Build EASE!
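
Here is a minimal NumPy sketch of the closed-form fit, plus a top-k scorer. The names `fit_ease` and `recommend` and the default for `lam` are my own; `lam` is the L2 strength \(\lambda\) from the objective above and is tuned on held-out data:

```python
import numpy as np

def fit_ease(X: np.ndarray, lam: float = 100.0) -> np.ndarray:
    """Closed-form EASE fit (Steck, 2019).

    X   : (num_users, num_items) implicit-feedback matrix (0/1 entries).
    lam : L2 regularization strength (tune on a validation set).
    Returns the item-item weight matrix B with diag(B) = 0.
    """
    G = X.T @ X                       # item-item Gram matrix
    G += lam * np.eye(G.shape[0])     # ridge term on the diagonal
    P = np.linalg.inv(G)              # one matrix inversion -- no iterations
    B = P / (-np.diag(P))             # B_ij = -P_ij / P_jj (column-wise divide)
    np.fill_diagonal(B, 0.0)          # enforce the zero-diagonal constraint
    return B

def recommend(X: np.ndarray, B: np.ndarray, k: int = 10) -> np.ndarray:
    """Top-k unseen items per user: score with X @ B, mask seen items."""
    scores = X @ B
    scores[X > 0] = -np.inf           # never re-recommend watched items
    return np.argsort(-scores, axis=1)[:, :k]
```

The inversion dominates the training cost at O(n³) in the number of items; the number of users only enters through the single Gram product \(X^T X\).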

EASE Strengths & Weaknesses

Strengths

  • Simple: Closed-form solution
  • Fast: No iterations needed
  • Strong baseline: matches or beats far more complex (including neural) models on standard benchmarks (Steck, 2019)
  • Interpretable: Item-item similarities

Weaknesses

  • Memory: Dense \(B\) matrix (O(n²) storage for n items, plus an O(n³) inversion to train)
  • Cold start: Can’t handle new items without retraining
  • No sequences: Doesn’t model temporal patterns

Bottom line: Excellent baseline, but not the final answer!

EASE in Practice

When to use EASE:

  • As a strong baseline to compare against
  • For medium-scale item catalogs (< 100K items)
  • When you need fast training and simple deployment
  • For implicit feedback data (clicks, views, purchases)

Real-world impact:

Many production systems use EASE as:

  1. Initial baseline before investing in complex models
  2. Fallback when neural models fail
  3. Component in ensemble systems

References

Steck, H. (2019). Embarrassingly shallow autoencoders for sparse data. The World Wide Web Conference, 3251–3257. https://doi.org/10.1145/3308558.3313710