Part IV.a: Reranking and Slate Optimization

Beyond Greedy Item Ranking

The Problem with Greedy Ranking

We’ve learned to rank items by relevance scores:

Item 1: score = 0.95  ✓
Item 2: score = 0.92  ✓
Item 3: score = 0.89  ✓
...
Item 10: score = 0.75 ✓

But is this optimal?

A Motivating Example

Alice just finished watching:

The Fellowship of the Ring (2001)

Standard ranking gives her:

What’s Wrong Here?

✓ Individually relevant: Each item is similar to what Alice watched
✗ Lack of diversity: All LOTR/Tolkien content
✗ No exploration: What if Alice wants something different?
✗ Saturation: Diminishing returns after the second or third similar item

Optimizing each item independently ≠ optimal list

From Items to Slates

Slate: A set of items presented together

\[ S = [i_1, i_2, \ldots, i_k] \]

New goal: Maximize utility of the entire slate

\[ \arg\max_{S} U[S \mid \text{user}] \]

instead of

\[ \arg\max_{S} \sum_{i \in S} \text{score}(i \mid \text{user}) \]

What is a Slate?

A slate is a collection of items shown to a user at once:

Homepage: 10 items in “Recommended for You”
Search results: Top 20 results
Email digest: 5 movies this week
Carousel: 8 items in “Because you watched X”

User sees all items simultaneously (or nearly so)

Utility Function Components

What should U[S | user] capture?

Relevance: Items match user preferences
Diversity: Items cover different aspects
Exploration: Include some novel/uncertain items
Coverage: Represent different genres/categories
Fairness: Don’t over-represent popular items
Business objectives: Promote new releases, monetize

Diversity

Idea: Items should be different from each other

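Pairwise: penalize similarity between every pair of distinct items

\[ \text{Diversity}(S) = -\sum_{i, j \in S,\; i < j} \text{similarity}(i, j) \]

Centroid-based: reward the average distance from the slate's centroid

\[ \text{Diversity}(S) = \frac{1}{|S|} \sum_{i \in S} \text{distance}(i, \text{centroid}(S)) \]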
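Both diversity measures can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes items are embedding vectors, that similarity is cosine similarity, and that distance is Euclidean (the slides fix neither choice). The pairwise version sums over unordered pairs of distinct items, so self-similarity is excluded. All function names are illustrative.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pairwise_diversity(slate):
    """Negative sum of similarities over unordered pairs of distinct items."""
    return -sum(cosine_similarity(a, b) for a, b in combinations(slate, 2))

def centroid_diversity(slate):
    """Mean Euclidean distance from each item to the slate centroid."""
    dim = len(slate[0])
    centroid = [sum(v[d] for v in slate) / len(slate) for d in range(dim)]
    return sum(math.dist(v, centroid) for v in slate) / len(slate)

# Toy embeddings (assumed): three identical items vs. three spread-out items.
redundant = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
varied = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(pairwise_diversity(redundant), pairwise_diversity(varied))
print(centroid_diversity(redundant), centroid_diversity(varied))
```

Under both measures the slate of identical items scores lower than the spread-out one, which is the behavior the diversity term is meant to reward.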

Exploration vs. Exploitation

Exploitation: Show items we’re confident user will like

Exploration: Show items to learn user preferences

Balance: Neither can be pursued in isolation; a good slate needs some of both
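One common way to trade the two off is an upper-confidence-bound (UCB) style score, which is a technique brought in here for illustration and not something these slides prescribe. The sketch assumes each item has a predicted relevance and an uncertainty estimate; the item names, numbers, and the parameter `c` are hypothetical.

```python
def ucb_score(mean, std, c=1.0):
    """Optimistic score: predicted relevance plus c times its uncertainty."""
    return mean + c * std

def rank_items(predictions, c):
    """Sort item names by UCB score, best first."""
    return sorted(
        predictions,
        key=lambda name: ucb_score(*predictions[name], c=c),
        reverse=True,
    )

# Hypothetical predictions: item -> (predicted relevance, uncertainty).
predictions = {"familiar": (0.80, 0.02), "novel": (0.70, 0.20)}

print(rank_items(predictions, c=0.0))  # pure exploitation
print(rank_items(predictions, c=1.0))  # uncertainty earns a bonus
```

With `c = 0` the confident item comes first; raising `c` lets the uncertain item overtake it, so `c` acts as the exploration knob.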

Coverage

Goal: Represent the breadth of the catalog

\[ \text{Coverage}(S) = |\{\text{categories represented in } S\}| \]
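Coverage as defined above is a count of distinct categories. A minimal sketch, assuming each item maps to a set of category labels; the item ids and labels are placeholders:

```python
def coverage(slate, categories_of):
    """Number of distinct categories represented in the slate."""
    return len({cat for item in slate for cat in categories_of[item]})

# Hypothetical item -> categories mapping.
categories_of = {
    "item_1": {"fantasy", "adventure"},
    "item_2": {"fantasy"},
    "item_3": {"sci-fi"},
}

print(coverage(["item_1", "item_2"], categories_of))
print(coverage(["item_1", "item_3"], categories_of))
```

Swapping the second fantasy item for the sci-fi one raises coverage from 2 to 3 without changing the slate size.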

Putting It Together

Utility function as weighted sum:

U[S | user] = α·Relevance(S)
            + β·Diversity(S)
            + γ·Exploration(S)
            + δ·Coverage(S)
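The weighted sum can be sketched as follows. The component definitions are assumptions chosen for brevity (mean score for relevance, fraction of cross-genre pairs for diversity, mean model uncertainty for exploration, distinct-genre count for coverage), and the items, weights, and genres are invented toy values.

```python
from itertools import combinations

# Each hypothetical item is (relevance score, genre, model uncertainty).
def relevance(slate):
    return sum(score for score, _, _ in slate) / len(slate)

def diversity(slate):
    """Fraction of item pairs whose genres differ (needs at least 2 items)."""
    pairs = list(combinations(slate, 2))
    return sum(a[1] != b[1] for a, b in pairs) / len(pairs)

def exploration(slate):
    return sum(unc for _, _, unc in slate) / len(slate)

def coverage(slate):
    return len({genre for _, genre, _ in slate})

def utility(slate, alpha=1.0, beta=0.5, gamma=0.2, delta=0.1):
    """U[S | user] as a weighted sum of the four components."""
    return (alpha * relevance(slate) + beta * diversity(slate)
            + gamma * exploration(slate) + delta * coverage(slate))

greedy = [(0.95, "fantasy", 0.05), (0.92, "fantasy", 0.05), (0.89, "fantasy", 0.05)]
mixed = [(0.95, "fantasy", 0.05), (0.85, "sci-fi", 0.20), (0.80, "adventure", 0.30)]

print(utility(greedy), utility(mixed))
```

With these weights the mixed slate scores higher even though its mean relevance is lower. Note that the components live on different scales (coverage is a raw count), so in practice the weights also have to absorb normalization.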

Better Example for Alice

Recall Alice just watched The Fellowship of the Ring

Slate-optimized (relevance + diversity + exploration):

Slate Optimization

Problem: Finding the optimal slate is a combinatorial search

|catalog| = 10,000 items
|slate| = 10 items

Possible slates = C(10000, 10) ≈ 2.7 × 10^33
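The size of the search space is easy to check with the standard library. The sketch counts unordered slates with `math.comb` and, because position matters on most surfaces, also counts ordered slates with `math.perm`.

```python
import math

catalog_size = 10_000
slate_size = 10

# Unordered slates: choose 10 items out of 10,000.
unordered = math.comb(catalog_size, slate_size)
# Ordered slates: the same items in a different order count as a new slate.
ordered = math.perm(catalog_size, slate_size)

print(f"unordered: {unordered:.2e}")
print(f"ordered:   {ordered:.2e}")
```

The unordered count is on the order of 10^33 and the ordered count is just under 10^40, so exhaustive enumeration is out of reach either way.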

We need to be smarter!

Three-stage Pipeline

1. Retrieval (Candidate Generation)

Fast, approximate search over millions of items

2. Ranking (Scoring)

Score and rank the candidates by relevance

3. Re-ranking (Slate Optimization)

Optimize the final slate for multiple objectives
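The three stages can be sketched end to end on a toy catalog. Everything here is an assumption for illustration: the catalog, the dot-product scorer standing in for both the retrieval index and the ranking model, and the greedy same-genre penalty used for re-ranking (a simple MMR-style heuristic, not a method taken from these slides).

```python
import heapq

# Hypothetical toy catalog: item id -> (embedding, genre).
CATALOG = {
    "a": ([0.9, 0.1], "fantasy"),
    "b": ([0.8, 0.2], "fantasy"),
    "c": ([0.7, 0.3], "fantasy"),
    "d": ([0.6, 0.5], "sci-fi"),
    "e": ([0.5, 0.6], "adventure"),
    "f": ([-0.9, 0.1], "horror"),
}

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def retrieve(user_vec, n):
    """Stage 1: keep the n best items by dot product (brute force here)."""
    return heapq.nlargest(n, CATALOG, key=lambda i: dot(user_vec, CATALOG[i][0]))

def rank(user_vec, candidates):
    """Stage 2: score candidates; a real system would use a heavier model."""
    return sorted(((dot(user_vec, CATALOG[i][0]), i) for i in candidates), reverse=True)

def rerank(scored, k, penalty=0.3):
    """Stage 3: greedily pick the item with the best score minus a
    penalty for every already-selected item of the same genre."""
    slate, remaining = [], list(scored)
    while remaining and len(slate) < k:
        def gain(entry):
            score, item = entry
            same = sum(CATALOG[item][1] == CATALOG[s][1] for s in slate)
            return score - penalty * same
        best = max(remaining, key=gain)
        slate.append(best[1])
        remaining.remove(best)
    return slate

user = [1.0, 0.2]
candidates = retrieve(user, n=5)
slate = rerank(rank(user, candidates), k=3)
print(candidates, slate)
```

Setting `penalty=0.0` reduces stage 3 to plain top-k ranking, which makes the effect of the re-ranking step easy to compare.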

Slate Scoring with RL

Reinforcement Learning formulation:

State: User features, context
Action: Select slate S
Reward: User engagement (clicks, watch time)

SlateQ by Ie et al. (2019):

Decompose slate value into item contributions
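Roughly, the decomposition in Ie et al. (2019) writes the value of a slate as a sum of per-item values weighted by the probability that the user picks each item from that slate. The sketch below assumes a multinomial-logit choice model with a no-click option and, as a simplification, gives the no-click outcome zero value; the function names and numbers are illustrative, not taken from the paper's code.

```python
import math

def choice_probs(slate_scores, no_click_score=0.0):
    """Multinomial-logit choice: softmax over slate items plus a no-click option."""
    weights = [math.exp(s) for s in slate_scores]
    total = sum(weights) + math.exp(no_click_score)
    return [w / total for w in weights]

def slate_q(slate_scores, item_q_values, no_click_score=0.0):
    """Slate value = sum over items of P(item chosen | slate) * item-level Q.
    Simplifying assumption: the no-click outcome contributes zero value."""
    probs = choice_probs(slate_scores, no_click_score)
    return sum(p * q for p, q in zip(probs, item_q_values))

# Two equally attractive items plus no-click: each is chosen with probability 1/3.
print(choice_probs([0.0, 0.0]))
print(slate_q([0.0, 0.0], [3.0, 6.0]))
```

The point of the decomposition is that only item-level values need to be learned; the slate-level value is then assembled from them and the choice model.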

Slate Generation

Question: If we can rank slates, why not generate optimal slates directly?

Seq2Slate (Bello et al., 2018):

Encoder-Decoder RNN
Sequential Generation
Reinforcement Learning
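The sequential-generation idea can be illustrated without a neural network: pick one item at a time from a softmax over the remaining candidates, and accumulate the log-probability of the resulting slate (the quantity a policy-gradient method would weight by reward). In this sketch the encoder-decoder RNN is replaced by a hypothetical hand-written scoring rule, so it shows only the decoding loop, not Seq2Slate itself.

```python
import math
import random

def step_logits(chosen, remaining, base_scores, genres, penalty=1.0):
    """Hypothetical stand-in for the decoder: base score minus a penalty
    for each already-chosen item that shares the candidate's genre."""
    return [base_scores[i] - penalty * sum(genres[i] == genres[c] for c in chosen)
            for i in remaining]

def generate_slate(base_scores, genres, k, rng, greedy=False):
    """Build a slate item by item; return it with its log-probability."""
    chosen, remaining, log_prob = [], list(base_scores), 0.0
    for _ in range(min(k, len(remaining))):
        logits = step_logits(chosen, remaining, base_scores, genres)
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]  # numerically stable softmax
        total = sum(weights)
        if greedy:
            idx = max(range(len(remaining)), key=lambda j: logits[j])
        else:
            idx = rng.choices(range(len(remaining)), weights=weights)[0]
        log_prob += math.log(weights[idx] / total)
        chosen.append(remaining.pop(idx))
    return chosen, log_prob

# Hypothetical candidates and genres.
base_scores = {"a": 2.0, "b": 1.8, "c": 1.0}
genres = {"a": "fantasy", "b": "fantasy", "c": "sci-fi"}

greedy_slate, greedy_log_prob = generate_slate(base_scores, genres, 2, random.Random(0), greedy=True)
sampled_slate, sampled_log_prob = generate_slate(base_scores, genres, 2, random.Random(0))
print(greedy_slate, greedy_log_prob)
print(sampled_slate, sampled_log_prob)
```

Because each step conditions on the items already chosen, the second pick can avoid repeating the first pick's genre, which independent per-item scoring cannot do.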

From Optimization to Generation

So far: Optimize slate from candidate pool

Next: Generate the entire recommendation surface

→ See slides/06_generative_pages.qmd

Key Takeaways

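Greedy per-item ranking can produce a suboptimal slate
Slate utility combines multiple objectives: relevance, diversity, exploration, coverage
Reinforcement learning can score slates (SlateQ) or generate them directly (Seq2Slate)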

Bello, I., Kulkarni, S., Jain, S., Boutilier, C., Chi, E., Eban, E., Luo, X., Mackey, A., & Meshi, O. (2018). Seq2Slate: Re-ranking and slate optimization with RNNs. arXiv Preprint arXiv:1810.02019. https://arxiv.org/abs/1810.02019

Ie, E., Hsu, C., Mladenov, M., Jain, V., Narvekar, S., Cheng, J., Choi, H.-T., & Boutilier, C. (2019). Reinforcement learning for slate-based recommender systems: A tractable decomposition and practical methodology. arXiv Preprint arXiv:1905.12767. https://arxiv.org/abs/1905.12767