Abstract
Recommender systems predict and suggest relevant options to users in various domains, such as e-commerce, streaming services, and social media. Recently, deep reinforcement learning (DRL)-based recommender systems have become increasingly popular in academia and industry (e.g., at Netflix, Spotify, Google, and YouTube), since DRL can characterize the long-term interaction between the system and users to achieve a better recommendation experience.
This paper demonstrates that an adversary can manipulate a DRL-based recommender system by injecting carefully designed user-system interaction records. The poisoning attack against the DRL-based recommender system is formulated as a non-convex integer programming problem. To solve the problem, we propose a three-phase mechanism (called PARL) to maximize the hit ratio (the proportion of recommendations that result in actual user interactions, such as clicks, purchases, or other relevant actions) while avoiding easy detection. The core idea of PARL is to improve the ranking of the target item while fixing the rankings of other items. Considering the sequential decision-making characteristics of DRL, PARL rearranges the order of the fake users' items to mimic the sequential features of normal users, an aspect usually overlooked in existing work. Our experiments on three real-world datasets demonstrate the effectiveness of PARL and its better concealment against detection techniques. PARL is open-sourced at https://github.com/PARL-RS/PARL.
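As a rough illustration of the hit ratio metric mentioned in the abstract, the sketch below computes a hit ratio at K for a single target item. It assumes one common convention from the poisoning-attack literature, namely the fraction of users whose top-K recommendation list contains the target item; the abstract does not spell out the exact formula, and the function name `hit_ratio_at_k` and the toy data are hypothetical, not taken from the PARL code base.

```python
from typing import Dict, Hashable, Sequence


def hit_ratio_at_k(
    recommendations: Dict[Hashable, Sequence[Hashable]],
    target_item: Hashable,
    k: int,
) -> float:
    """Fraction of users whose top-k list contains the target item.

    Assumption: `recommendations` maps each user to a ranked list of
    recommended items, best first. This is an illustrative definition,
    not necessarily the one used by PARL.
    """
    if not recommendations:
        return 0.0
    hits = sum(
        1 for ranked in recommendations.values() if target_item in ranked[:k]
    )
    return hits / len(recommendations)


# Toy ranked lists for four hypothetical users.
recs = {
    "u1": [3, 7, 9],
    "u2": [7, 1, 2],
    "u3": [4, 5, 6],
    "u4": [8, 7, 0],
}
print(hit_ratio_at_k(recs, target_item=7, k=2))  # 0.75
```

Under this convention, a successful poisoning attack would raise the value for the target item after the fake interaction records are injected, compared with the value measured on the clean system.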
Original language | English |
---|---|
Title of host publication | ASIA CCS '24 |
Subtitle of host publication | Proceedings of the 19th ACM Asia Conference on Computer and Communications Security |
Publisher | ACM |
Pages | 1331-1344 |
Number of pages | 14 |
ISBN (Electronic) | 9798400704826 |
Publication status | Published - 2024 |
Funding
We express our gratitude to the anonymous reviewers for their valuable feedback. This work was partly supported by the NSFC under Grants 61833015, 62293511, 62088101, 52161135201, and 62103371, and by the Fundamental Research Funds for the Central Universities under Grants 226-2023-00111 and 226-2024-00004. Min Chen was sponsored by the Helmholtz Association within the project “Trustworthy Federated Data Analytics” (TFDA) (No. ZT-I-OO1 4).
Funders | Funder number |
---|---|
Helmholtz Association | |
TFDA | |
National Natural Science Foundation of China | 62088101, 62293511, 61833015, 62103371, 52161135201 |
Fundamental Research Funds for the Central Universities | 226-2023-00111, 226-2024-00004 |