Part II.b: Transformers for Recommendation

Sequential Modeling with Self-Attention

Motivation: Sequences Matter

Users consume items in sequences with temporal dynamics

Traditional methods (e.g., matrix factorization) ignore:

  • Order of interactions
  • Recency effects
  • Short-term intent

How can we apply transformers to a sequence of items?

Applying GPT-style models

Predict the next item

SASRec (Kang & McAuley, 2018)

  • Self-Attentive Sequential Recommendation
  • Causal masking prevents “seeing the future”
  • Cross-entropy loss over the item vocabulary (see the sketch below)
 Input:   [i₁]  [i₂]  [i₃]  [i₄]
           ↓     ↓     ↓     ↓
         causal attention (masked)
           ↓     ↓     ↓     ↓
 Predict: [i₂]  [i₃]  [i₄]  [i₅]
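
To make the training setup concrete, here is a minimal PyTorch sketch of a SASRec-style model, following the full-softmax cross-entropy objective stated above. The class name, layer sizes, and vocabulary size (SASRecSketch, d_model=64, num_items=1000, etc.) are illustrative choices, not the paper's hyperparameters.

```python
import torch
import torch.nn as nn

class SASRecSketch(nn.Module):
    """Minimal SASRec-style next-item model; sizes are illustrative, not the paper's."""
    def __init__(self, num_items, d_model=64, n_heads=2, n_layers=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(num_items + 1, d_model, padding_idx=0)  # id 0 = padding
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, num_items + 1)      # logits over the item vocabulary

    def forward(self, seq):                               # seq: (batch, seq_len) item ids
        L = seq.size(1)
        pos = torch.arange(L, device=seq.device)
        h = self.item_emb(seq) + self.pos_emb(pos)
        # causal mask: -inf above the diagonal, so position t only attends to positions <= t
        causal = torch.triu(torch.full((L, L), float("-inf"), device=seq.device), diagonal=1)
        return self.out(self.encoder(h, mask=causal))     # (batch, seq_len, num_items + 1)

# One training step: input is items 1..L-1, target is items 2..L (shifted by one).
model = SASRecSketch(num_items=1000)
seq = torch.randint(1, 1001, (8, 20))                     # toy batch of item-id sequences
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)),                  # (batch * (L-1), vocab)
    seq[:, 1:].reshape(-1),                               # next-item targets
    ignore_index=0,                                       # skip padding positions
)
loss.backward()
```

At inference time, the hidden state at the last position scores every item, and the highest-scoring unseen items are recommended.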

Applying BERT-style models

Estimate a user embedding

BERT4Rec (Sun et al., 2019)

  • Randomly mask items during training
  • Can attend to both past AND future items
  • Learns a richer user representation from bidirectional context (see the sketch below)
Input:    [cls]  [i₁]  [MASK]  [i₃]
           ↓      ↓     ↓       ↓
           bidirectional attention
           ↓      ↓     ↓       ↓
Predict:  [emb]  [?]   [i₂]    [?]
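
A matching sketch of the cloze-style objective, under the same illustrative assumptions as the SASRec code above. The only structural changes: an extra [MASK] token, no causal mask (so attention is bidirectional), and a loss computed on masked positions only; the 20% mask rate is also just an illustrative choice.

```python
import torch
import torch.nn as nn

NUM_ITEMS, PAD_ID, MASK_ID = 1000, 0, 1001   # illustrative ids; [MASK] is one extra token

class BERT4RecSketch(nn.Module):
    """Minimal BERT4Rec-style masked-item model; sizes are illustrative, not the paper's."""
    def __init__(self, d_model=64, n_heads=2, n_layers=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(NUM_ITEMS + 2, d_model, padding_idx=PAD_ID)  # items + pad + [MASK]
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, NUM_ITEMS + 2)

    def forward(self, seq):
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.item_emb(seq) + self.pos_emb(pos)
        return self.out(self.encoder(h))      # no attention mask: each position sees past AND future

# Cloze-style training: randomly replace ~20% of items with [MASK], predict only those.
model = BERT4RecSketch()
seq = torch.randint(1, NUM_ITEMS + 1, (8, 20))
masked = torch.rand(seq.shape) < 0.2
inputs = seq.clone()
inputs[masked] = MASK_ID                      # corrupt the input at masked positions
targets = seq.clone()
targets[~masked] = -100                       # ignore unmasked positions in the loss
logits = model(inputs)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)),
    targets.reshape(-1),
    ignore_index=-100,
)
loss.backward()
```

A [cls]-style token as in the diagram could be prepended and its final hidden state read out as the user embedding; the sketch omits it to stay minimal.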

Let’s implement SASRec

Open notebooks/04_sasrec.qmd

Kang, W.-C., & McAuley, J. (2018). Self-attentive sequential recommendation. 2018 IEEE International Conference on Data Mining (ICDM), 197–206. https://doi.org/10.1109/ICDM.2018.00035
Sun, F., Liu, J., Wu, J., Pei, C., Lin, X., Ou, W., & Jiang, P. (2019). BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 1441–1450. https://doi.org/10.1145/3357384.3357895