Bayesian Preference Estimation with Inconsistent Feedback

Manuscript submitted for review to the DECISION Journal

This website provides access to the data, code, and results for the paper “Bayesian Preference Estimation with Inconsistent Feedback,” submitted to the Journal of Mathematical Psychology.

To foster reproducibility, we have made all code and results publicly available in our repository: Repository link

Datasets

The paper uses three datasets, with the following statistics:

Dataset         Number of Users  Number of Items  Number of Interactions  Number of Categories  % Sparsity
Yelp            6,651            8,267            264,174                 265                   99.55%
MovieLens1M     6,040            3,260            998,538                 18                    94.93%
MovieLensSmall  610              8,974            100,010                 19                    98.17%
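
As a reading aid for the sparsity column, here is a minimal sketch that assumes sparsity is defined in the usual way, as the percentage of user-item pairs with no recorded interaction. The function name `sparsity_pct` is hypothetical, and the exact formula used for the table above is an assumption.

```python
def sparsity_pct(n_users: int, n_items: int, n_interactions: int) -> float:
    """Percentage of cells in the user-item matrix that hold no interaction.

    Assumed definition: 100 * (1 - interactions / (users * items)).
    """
    return 100.0 * (1.0 - n_interactions / (n_users * n_items))


# Counts taken from the dataset statistics table.
print(round(sparsity_pct(6040, 3260, 998538), 2))  # MovieLens1M
print(round(sparsity_pct(610, 8974, 100010), 2))   # MovieLensSmall
```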

Sample data snapshot from the MovieLens dataset:

UserID  movieID  Like Status  Primary Genre  Secondary Genre  Movie Name
54462   288      0            Adventure      Comedy           The Princess Bride
20948   247      0            Action         Adventure        Indiana Jones
25386   401      0            Action         Adventure        Mad Max: Fury Road
48678   491      1            Comedy         Drama            The Truman Show
42717   322      1            Crime          Drama            The Godfather
52687   143      0            Adventure      Drama            Life of Pi
66403   218      0            Action         Adventure        Gladiator
41867   130      1            Drama          Thriller         Se7en
10548   69       0            Action         Drama            Inception
58166   33       0            Action         Comedy           Guardians of the Galaxy

Sample data snapshot from the Yelp dataset:

UserID  LocationID  Like Status  Type of Location  City
155     442         0            Cinema            San Francisco
1734    6273        1            Cinema            New York
231     3004        1            Restaurant        Los Angeles
787     3475        1            Shopping Center   San Francisco
2293    1857        1            Cinema            San Francisco
1643    364         0            Restaurant        Chicago
565     2669        0            Restaurant        San Francisco
5013    6758        0            Shopping Center   Los Angeles
3500    429         0            Shopping Center   Chicago
605     4223        1            Restaurant        Los Angeles

Processed Data Download Instructions

The raw data for this work can be downloaded from this link. It is split into train, test, and category datasets.

Three folders are included, one per dataset: MovieLens1M, MovieLensSmall, and Yelp.

Models Evaluated

In this research, we evaluate the performance of our proposed model by comparing it against several well-known baseline models. These models include advanced machine learning techniques and established Bayesian methods. Unlike our proposed model, these baselines do not explicitly account for user inconsistency.

1. Bayesian Updating Model

This model employs Bayesian inference to update the parameters of a Beta distribution for each user-attribute pair based on binary feedback (e.g., like/dislike).
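
The conjugate Beta-Bernoulli update behind this kind of baseline can be sketched as follows. This is an illustration, not the notebook code: the class name `BetaBelief`, the uniform Beta(1, 1) prior, and the toy feedback triples are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about how likely a user is to like an attribute."""
    alpha: float = 1.0  # assumed uniform prior: one pseudo-like
    beta: float = 1.0   # assumed uniform prior: one pseudo-dislike

    def update(self, liked: int) -> None:
        # Conjugate update: a like increments alpha, a dislike increments beta.
        if liked:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self) -> float:
        # Posterior mean of the like probability.
        return self.alpha / (self.alpha + self.beta)


# One belief per (user, attribute) pair, created on first access.
beliefs = defaultdict(BetaBelief)

# Hypothetical (user, attribute, liked) feedback.
feedback = [(1, "Cinema", 1), (1, "Cinema", 1), (1, "Cinema", 0), (1, "Restaurant", 0)]
for user, attribute, liked in feedback:
    beliefs[(user, attribute)].update(liked)

print(beliefs[(1, "Cinema")].mean())      # Beta(3, 2)
print(beliefs[(1, "Restaurant")].mean())  # Beta(1, 2)
```

Items can then be scored by the posterior means of their attributes. Note that every observation is taken at face value here, which is why a baseline of this form does not account for inconsistent feedback.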

2. Bayesian Personalized Ranking (BPR)

BPR is designed for optimizing ranking tasks with implicit feedback, such as clicks or purchases, rather than explicit ratings.
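
For orientation, the pairwise objective that BPR is commonly described with can be sketched as below: for a user, an observed item should score higher than an unobserved one, and the per-pair loss is -log(sigmoid(score_pos - score_neg)). The function names are hypothetical, and the scoring model (for example matrix factorization) and regularization are left out.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_pair_loss(score_pos: float, score_neg: float) -> float:
    """BPR loss for one (user, observed item, unobserved item) triple.

    Equals -log(sigmoid(score_pos - score_neg)), written via softplus
    so that large score differences do not overflow.
    """
    return softplus(-(score_pos - score_neg))


# The loss shrinks as the observed item is ranked further above the unobserved one.
print(bpr_pair_loss(2.0, 0.0))
print(bpr_pair_loss(0.0, 0.0))
print(bpr_pair_loss(0.0, 2.0))
```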

3. Collaborative Filtering Model

This model predicts user ratings by leveraging ratings from similar users and items, combining information from multiple sources.
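
One common way to realize this idea is user-based neighborhood filtering, sketched below under stated assumptions: cosine similarity between users' rating vectors and a similarity-weighted average of neighbors' ratings. The function names and toy ratings are hypothetical, and the variant used in the paper may differ (for example item-based or model-based).

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: no usable neighbor


# Hypothetical ratings: "a" agrees with "b" more than with "c".
ratings = {
    "a": {"x": 5, "y": 3},
    "b": {"x": 5, "y": 3, "z": 4},
    "c": {"x": 1, "y": 5, "z": 2},
}
print(predict(ratings, "a", "z"))
```

The prediction for "z" lands closer to the rating given by "b", the more similar neighbor.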

4. Multi-Attribute Utility Model

This model estimates user preferences by calculating the utility of items based on multiple attributes.
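
A minimal sketch of an additive multi-attribute utility, assuming an item's utility is the sum of the user's weights for the attributes it has. The additive form, the function names, and the weights are assumptions for illustration; the catalog reuses titles and genres from the MovieLens sample above.

```python
def additive_utility(weights, attributes):
    """Sum of the user's weights over the item's attributes (unknown ones count 0)."""
    return sum(weights.get(a, 0.0) for a in attributes)


def rank_items(weights, catalog):
    """Item names sorted from highest to lowest utility."""
    return sorted(
        catalog,
        key=lambda item: additive_utility(weights, catalog[item]),
        reverse=True,
    )


# Hypothetical per-user attribute weights.
weights = {"Action": 0.2, "Adventure": 0.1, "Comedy": 0.6, "Drama": 0.7}
catalog = {
    "The Truman Show": ["Comedy", "Drama"],
    "Gladiator": ["Action", "Adventure"],
    "Inception": ["Action", "Drama"],
}
print(rank_items(weights, catalog))
```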

5. Most Popular (MostPop) Model

The MostPop model recommends items based on their overall popularity, serving as a baseline for comparison.
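
A popularity baseline of this kind can be sketched in a few lines. Here popularity is assumed to be the number of likes an item received, and items the user has already interacted with are filtered out; both choices, the function name, and the toy data are assumptions.

```python
from collections import Counter


def most_popular(interactions, user, k=10):
    """Top-k items by like count, excluding items `user` has already seen.

    `interactions` is an iterable of (user_id, item_id, liked) triples
    with liked in {0, 1}.
    """
    interactions = list(interactions)
    popularity = Counter(item for _, item, liked in interactions if liked)
    seen = {item for u, item, _ in interactions if u == user}
    ranked = [item for item, _ in popularity.most_common() if item not in seen]
    return ranked[:k]


# Hypothetical interactions.
interactions = [
    (1, "a", 1), (2, "a", 1), (3, "b", 1), (1, "c", 0),
    (2, "b", 1), (3, "a", 1), (4, "d", 1),
]
print(most_popular(interactions, user=1, k=2))
```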

Notebooks

For simplicity and reproducibility, three easy-to-run notebooks are provided:

1. Model_implementation.ipynb

This notebook demonstrates the implementation of all baseline models. It creates simulated data and runs the baseline and proposed models on it. It also includes ablation studies on item scarcity, user scarcity, and users with varying degrees of inconsistency.

2. Results.ipynb

This notebook loads the saved results from all simulated datasets and from the three real-world datasets. It analyzes them by plotting statistics and outcomes for the proposed model and the baseline models.

3. Main.ipynb

This notebook combines the entire process, including model implementation, running the model on both simulated and real-world datasets, and performing evaluations.

Results

Results on the real-world datasets are shown below.

Metrics are calculated based on the top-10 predictions in the test set. The best-performing model for each metric is highlighted in bold, and the second-best is italicized.
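
For readers unfamiliar with these metrics, the sketch below shows one common set of definitions with binary relevance. Conventions vary (for example whether precision divides by k or by the number of items returned, and how NDCG gains are defined), so the exact formulas used in the notebooks are an assumption, as are the function names.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def f1_at_k(ranked, relevant, k=10):
    """Harmonic mean of precision@k and recall@k."""
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    return 2 * p * r / (p + r) if p + r else 0.0


def ndcg_at_k(ranked, relevant, k=10):
    """DCG of the top-k list divided by the DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


# Hypothetical ranking and held-out likes for one user.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "z"}
print(precision_at_k(ranked, relevant, k=4))
print(recall_at_k(ranked, relevant, k=4))
print(f1_at_k(ranked, relevant, k=4))
print(ndcg_at_k(ranked, relevant, k=4))
```

Per-user values are typically averaged over all test users to obtain a single number per model.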

MovieLens 1M

Model                    Precision@10  Recall@10  NDCG@10  F1@10
Bayesian Model           1.2434        0.6991     5.0201   0.8950
Collaborative Filtering  3.2814        1.9516     10.1953  2.4475
Multi-Attribute Utility  0.8642        0.4642     3.5332   0.6039
BPR Model                0.5315        0.2795     2.3449   0.3663
Most Popular Model       1.4520        0.6706     5.8587   0.9175
Mixture Bayesian Model   1.6803        0.9418     6.7846   1.2070

MovieLens 1M Dataset Results

Yelp

Model                    Precision@10  Recall@10  NDCG@10  F1@10
Bayesian Model           0.0120        0.0872     0.0547   0.0211
Collaborative Filtering  0.0030        0.0060     0.0137   0.0040
Multi-Attribute Utility  0.0015        0.0150     0.0068   0.0027
BPR Model                0.0030        0.0105     0.0137   0.0047
Most Popular Model       0.0015        0.0021     0.0068   0.0018
Mixture Bayesian Model   0.0158        0.1142     0.0719   0.0278

Yelp Dataset Results

MovieLens Small

Model                    Precision@10  Recall@10  NDCG@10  F1@10
Bayesian Model           0.2874        0.1609     1.2000   0.2063
Collaborative Filtering  0.0328        0.0086     0.1490   0.0136
Multi-Attribute Utility  0.2787        0.2283     1.2662   0.2510
BPR Model                0.1639        0.2101     0.6872   0.1842
Most Popular Model       0.3683        0.2178     1.4661   0.2738
Mixture Bayesian Model   0.3915        0.2140     1.6312   0.2767

MovieLens Small Dataset Results
