Part II.b: Transformers for Recommendation

Sequential Modeling with Self-Attention

Motivation: Sequences Matter

Users consume items in sequences with temporal dynamics

Traditional methods (e.g., matrix factorization) ignore:

  • Order of interactions
  • Recency effects
  • Short-term intent

How can we apply transformers to a sequence of items?

Applying GPT-style models

Predict the next item

SASRec (Kang & McAuley, 2018)

  • Self-Attentive Sequential Recommendation
  • Causal masking prevents “seeing the future”
  • Cross-entropy loss over the item vocabulary (see the sketch below)
 Input:   [i₁]  [i₂]  [i₃]  [i₄]
           ↓     ↓     ↓     ↓
         causal attention (masked)
           ↓     ↓     ↓     ↓
 Predict: [i₂]  [i₃]  [i₄]  [i₅]
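
To make the training setup concrete, here is a minimal PyTorch sketch of a SASRec-style model, following the full-softmax cross-entropy objective stated above. The class name, layer sizes, and vocabulary size (SASRecSketch, d_model=64, num_items=1000, etc.) are illustrative choices, not the paper's hyperparameters.

```python
import torch
import torch.nn as nn

class SASRecSketch(nn.Module):
    """Minimal SASRec-style next-item model; sizes are illustrative, not the paper's."""
    def __init__(self, num_items, d_model=64, n_heads=2, n_layers=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(num_items + 1, d_model, padding_idx=0)  # id 0 = padding
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, num_items + 1)      # logits over the item vocabulary

    def forward(self, seq):                               # seq: (batch, seq_len) item ids
        L = seq.size(1)
        pos = torch.arange(L, device=seq.device)
        h = self.item_emb(seq) + self.pos_emb(pos)
        # causal mask: -inf above the diagonal, so position t only attends to positions <= t
        causal = torch.triu(torch.full((L, L), float("-inf"), device=seq.device), diagonal=1)
        return self.out(self.encoder(h, mask=causal))     # (batch, seq_len, num_items + 1)

# One training step: input is items 1..L-1, target is items 2..L (shifted by one).
model = SASRecSketch(num_items=1000)
seq = torch.randint(1, 1001, (8, 20))                     # toy batch of item-id sequences
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)),                  # (batch * (L-1), vocab)
    seq[:, 1:].reshape(-1),                               # next-item targets
    ignore_index=0,                                       # skip padding positions
)
loss.backward()
```

At inference time, the hidden state at the last position scores every item, and the highest-scoring unseen items are recommended.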

Applying BERT-style models

Estimate a user embedding

BERT4Rec (Sun et al., 2019)

  • Randomly mask items during training
  • Can attend to both past AND future items
  • Learns a richer user representation from bidirectional context (see the sketch below)
Input:    [cls]  [i₁]  [MASK]  [i₃]
           ↓      ↓     ↓       ↓
           bidirectional attention
           ↓      ↓     ↓       ↓
Predict:  [emb]  [?]   [i₂]    [?]
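
A matching sketch of the cloze-style objective, under the same illustrative assumptions as the SASRec code above. The only structural changes: an extra [MASK] token, no causal mask (so attention is bidirectional), and a loss computed on masked positions only; the 20% mask rate is also just an illustrative choice.

```python
import torch
import torch.nn as nn

NUM_ITEMS, PAD_ID, MASK_ID = 1000, 0, 1001   # illustrative ids; [MASK] is one extra token

class BERT4RecSketch(nn.Module):
    """Minimal BERT4Rec-style masked-item model; sizes are illustrative, not the paper's."""
    def __init__(self, d_model=64, n_heads=2, n_layers=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(NUM_ITEMS + 2, d_model, padding_idx=PAD_ID)  # items + pad + [MASK]
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, NUM_ITEMS + 2)

    def forward(self, seq):
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.item_emb(seq) + self.pos_emb(pos)
        return self.out(self.encoder(h))      # no attention mask: each position sees past AND future

# Cloze-style training: randomly replace ~20% of items with [MASK], predict only those.
model = BERT4RecSketch()
seq = torch.randint(1, NUM_ITEMS + 1, (8, 20))
masked = torch.rand(seq.shape) < 0.2
inputs = seq.clone()
inputs[masked] = MASK_ID                      # corrupt the input at masked positions
targets = seq.clone()
targets[~masked] = -100                       # ignore unmasked positions in the loss
logits = model(inputs)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)),
    targets.reshape(-1),
    ignore_index=-100,
)
loss.backward()
```

A [cls]-style token as in the diagram could be prepended and its final hidden state read out as the user embedding; the sketch omits it to stay minimal.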

Let’s implement SASRec

Open notebooks/04_sasrec.qmd

Kang, W.-C., & McAuley, J. (2018). Self-attentive sequential recommendation. 2018 IEEE International Conference on Data Mining (ICDM), 197–206. https://doi.org/10.1109/ICDM.2018.00035
Sun, F., Liu, J., Wu, J., Pei, C., Lin, X., Ou, W., & Jiang, P. (2019). BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 1441–1450. https://doi.org/10.1145/3357384.3357895