Beyond Greedy Item Ranking
We’ve learned to rank items by relevance scores:
Item 1: score = 0.95 ✓
Item 2: score = 0.92 ✓
Item 3: score = 0.89 ✓
...
Item 10: score = 0.75 ✓
But is this optimal?
Alice just finished watching:
The Fellowship of the Ring (2001)
Standard ranking gives her:





Optimizing each item independently ≠ optimal list
Slate: A set of items presented together
\[ S = [i_1, i_2, \ldots, i_k] \]
New goal: Maximize utility of the entire slate
\[ \arg\max_{S} U[S \mid \text{user}] \]
instead of
\[ \arg\max_{S} \sum_{i \in S} \text{score}(i \mid \text{user}) \]
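A minimal sketch of why the summed objective is just greedy ranking in disguise: because the sum decomposes over items, the top-k items by score always maximize it, so it cannot express any interaction between items. The item names and scores below are made up for illustration.

```python
from itertools import combinations

# Hypothetical relevance scores for a tiny catalog (illustrative values only).
scores = {"A": 0.95, "B": 0.92, "C": 0.89, "D": 0.75, "E": 0.60}
k = 3

def additive_objective(slate):
    """Sum of per-item scores: the objective that greedy ranking optimizes."""
    return sum(scores[i] for i in slate)

# Greedy ranking: sort by score and keep the top k.
top_k = sorted(scores, key=scores.get, reverse=True)[:k]

# Exhaustive search over every size-k slate (feasible only for tiny catalogs).
best = max(combinations(scores, k), key=additive_objective)

print(top_k)
print(sorted(best))
```

Both searches return the same items, so any gain from slate-level optimization has to come from a utility that is not a plain sum of independent scores.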
A slate is a collection of items shown to a user at once:
User sees all items simultaneously (or nearly so)
What should U[S | user] capture?
Idea: Items should be different from each other
\[ \text{Diversity}(S) = -\sum_{i,j \in S} \text{similarity}(i, j) \]
or
\[ \text{Diversity}(S) = \frac{1}{|S|} \sum_{i \in S} \text{distance}(i, \text{centroid}(S)) \]
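Both diversity measures can be sketched in a few lines. This assumes items are represented by embedding vectors, uses cosine similarity and Euclidean distance as the (assumed) similarity and distance functions, and sums the pairwise term over unordered pairs of distinct items, which is one reading of the first formula. The embeddings are invented for illustration.

```python
import math

# Hypothetical 2-D item embeddings (illustrative values, not real data).
embeddings = {
    "fantasy_1": (1.0, 0.1),
    "fantasy_2": (0.9, 0.2),
    "documentary": (0.1, 1.0),
}

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def pairwise_diversity(slate):
    """Negative sum of similarities over unordered pairs of distinct items."""
    total = 0.0
    for a in range(len(slate)):
        for b in range(a + 1, len(slate)):
            total += cosine_similarity(embeddings[slate[a]], embeddings[slate[b]])
    return -total

def centroid_diversity(slate):
    """Mean Euclidean distance from each item to the slate centroid."""
    vecs = [embeddings[i] for i in slate]
    centroid = tuple(sum(c) / len(vecs) for c in zip(*vecs))
    return sum(math.dist(v, centroid) for v in vecs) / len(vecs)

redundant = ["fantasy_1", "fantasy_2"]
mixed = ["fantasy_1", "documentary"]
print(pairwise_diversity(redundant), pairwise_diversity(mixed))
print(centroid_diversity(redundant), centroid_diversity(mixed))
```

Under both measures the mixed slate scores higher than the redundant one, although the two measures live on different scales and would need different weights.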
Exploitation: Show items we’re confident the user will like
Exploration: Show items to learn the user’s preferences
Balance: The trade-off cannot be set per item in isolation; it has to be made across the slate
Goal: Represent the breadth of the catalog
\[ \text{Coverage}(S) = |\{\text{categories represented in } S\}| \]
Utility function as weighted sum:
\[ U[S \mid \text{user}] = \alpha \cdot \text{Relevance}(S) + \beta \cdot \text{Diversity}(S) + \gamma \cdot \text{Exploration}(S) + \delta \cdot \text{Coverage}(S) \]
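A rough sketch of the weighted sum in code. Everything concrete here is an assumption made for illustration: the `Item` fields, the stand-in definitions of diversity (share of item pairs with different categories) and exploration (mean uncertainty), and the weight values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    name: str
    score: float        # relevance score from the ranker
    category: str
    uncertainty: float  # hypothetical proxy for how little we know about the item

def relevance(slate):
    return sum(i.score for i in slate)

def diversity(slate):
    # Assumed stand-in: fraction of item pairs whose categories differ.
    pairs = [(a, b) for n, a in enumerate(slate) for b in slate[n + 1:]]
    if not pairs:
        return 0.0
    return sum(a.category != b.category for a, b in pairs) / len(pairs)

def exploration(slate):
    # Assumed stand-in: mean uncertainty of the items shown.
    return sum(i.uncertainty for i in slate) / len(slate)

def coverage(slate):
    # Number of distinct categories represented in the slate.
    return len({i.category for i in slate})

def utility(slate, alpha=1.0, beta=0.5, gamma=0.3, delta=0.1):
    # Weights are arbitrary illustrative values, not tuned.
    return (alpha * relevance(slate) + beta * diversity(slate)
            + gamma * exploration(slate) + delta * coverage(slate))

safe = [Item("fantasy_a", 0.95, "fantasy", 0.05),
        Item("fantasy_b", 0.92, "fantasy", 0.05),
        Item("fantasy_c", 0.89, "fantasy", 0.05)]
mixed = [Item("fantasy_a", 0.95, "fantasy", 0.05),
         Item("scifi_a", 0.85, "sci-fi", 0.20),
         Item("drama_a", 0.80, "drama", 0.40)]

print(utility(safe), utility(mixed))
```

With these weights the mixed slate wins even though its summed relevance is lower. In practice the four terms sit on different scales, so the weights have to be normalized or tuned against real user outcomes.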
Recall Alice just watched The Fellowship of the Ring
Slate-optimized (relevance + diversity + exploration):





Problem: Finding the optimal slate is combinatorial
|catalog| = 10,000 items
|slate| = 10 items
Possible slates = C(10000, 10) ≈ 2.7 × 10^33
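The count can be checked directly with the standard library. The variable names are ours; `math.comb` counts unordered slates, and `math.perm` shows how much larger the space gets once position within the slate matters.

```python
import math

catalog_size = 10_000
slate_size = 10

unordered = math.comb(catalog_size, slate_size)  # item order ignored
ordered = math.perm(catalog_size, slate_size)    # item order matters

print(f"unordered slates: {unordered:.2e}")
print(f"ordered slates:   {ordered:.2e}")
```

Either way, scoring every slate exhaustively is out of the question.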
We need to be smarter!
1. Retrieval (Candidate Generation)
2. Ranking (Scoring)
3. Re-ranking (Slate Optimization)
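A toy sketch of the three stages, under loud assumptions: the catalog, both score columns, and the category penalty are invented, and the re-ranker is a simple greedy heuristic in the spirit of maximal marginal relevance, swapped in for illustration rather than taken from the slides.

```python
# Hypothetical catalog: item -> (cheap retrieval score, ranker score, category).
CATALOG = {
    "fantasy_a": (0.9, 0.95, "fantasy"),
    "fantasy_b": (0.9, 0.92, "fantasy"),
    "fantasy_c": (0.8, 0.89, "fantasy"),
    "scifi_a":   (0.7, 0.85, "sci-fi"),
    "drama_a":   (0.6, 0.80, "drama"),
    "cooking_a": (0.1, 0.20, "cooking"),
}

def retrieve(catalog, n):
    """Stage 1: cheaply cut the catalog down to n candidates."""
    return sorted(catalog, key=lambda i: catalog[i][0], reverse=True)[:n]

def rank(catalog, candidates):
    """Stage 2: score each candidate independently with the ranker."""
    return {i: catalog[i][1] for i in candidates}

def rerank(catalog, scored, k, penalty=0.2):
    """Stage 3: build the slate greedily, penalizing categories already shown."""
    slate, remaining = [], dict(scored)
    while remaining and len(slate) < k:
        seen = [catalog[i][2] for i in slate]

        def gain(i):
            # Ranker score minus a penalty per already-selected item of the same category.
            return remaining[i] - penalty * seen.count(catalog[i][2])

        best = max(remaining, key=gain)
        slate.append(best)
        del remaining[best]
    return slate

candidates = retrieve(CATALOG, n=5)
slate = rerank(CATALOG, rank(CATALOG, candidates), k=3)
print(slate)
```

The greedy re-ranker evaluates on the order of k × n candidates instead of every possible slate, which is what makes slate-level objectives affordable, at the price of no optimality guarantee.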
Reinforcement Learning formulation:
SlateQ by Ie et al. (2019):
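As a hedged reminder of the paper's central idea (the notation is adapted to this deck and may differ from the paper's): assuming the user selects at most one item from the slate, and that reward and state transition depend only on the selected item, the long-term value of a slate decomposes into item-level values weighted by choice probabilities:

\[ Q(x, S) = \sum_{i \in S} P(i \mid x, S) \, \bar{Q}(x, i) \]

Here \(x\) is the user state, \(P(i \mid x, S)\) is a user choice model, and \(\bar{Q}(x, i)\) is the long-term value of the user consuming item \(i\). The decomposition lets the agent learn values per item rather than per slate.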
Question: If we can rank slates, why not generate optimal slates directly?
Seq2Slate (Bello et al., 2018):
So far: Optimize slate from candidate pool
Next: Generate the entire recommendation surface
→ See slides/06_generative_pages.qmd