I need to add the "rating" column from the DataFrame df_original to the DataFrame df_workspace, matching rows on the keys "userId" and "movieId".
> DataFrame df_workspace
userId movieId cluster
0 1 2 2
1 1 29 2
2 1 260 2
3 1 589 2
4 1 653 2
5 1 919 2
6 1 1009 2
7 1 1196 2
8 1 1198 2
9 1 1200 2
10 1 1201 2
11 1 1291 2
12 1 1304 2
13 1 1374 2
14 1 1525 2
15 1 1750 2
16 1 1920 2
17 1 1967 2
18 1 2021 2
19 1 2138 2
20 1 2140 2
21 1 2143 2
22 1 2173 2
23 1 2193 2
24 1 2628 2
25 1 2761 2
26 1 2872 2
27 1 3000 2
28 1 3030 2
29 1 3037 2
> DataFrame df_original
userId movieId title \
0 1 2 Jumanji (1995)
1 1 29 City of Lost Children, The (Cité des enfants ...
2 1 32 Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
3 1 47 Seven (a.k.a. Se7en) (1995)
4 1 50 Usual Suspects, The (1995)
5 1 112 Rumble in the Bronx (Hont faan kui) (1995)
6 1 151 Rob Roy (1995)
7 1 223 Clerks (1994)
8 1 253 Interview with the Vampire: The Vampire Chroni...
9 1 260 Star Wars: Episode IV - A New Hope (1977)
genres rating timestamp
0 Adventure|Children|Fantasy 3.5 2005-04-02 23:53:47.000
1 Adventure|Drama|Fantasy|Mystery|Sci-Fi 3.5 2005-04-02 23:31:16.000
2 Mystery|Sci-Fi|Thriller 3.5 2005-04-02 23:33:39.000
3 Mystery|Thriller 3.5 2005-04-02 23:32:07.000
4 Crime|Mystery|Thriller 3.5 2005-04-02 23:29:40.000
5 Action|Adventure|Comedy|Crime 3.5 2004-09-10 03:09:00.000
6 Action|Drama|Romance|War 4.0 2004-09-10 03:08:54.000
7 Comedy 4.0 2005-04-02 23:46:13.000
8 Drama|Horror 4.0 2005-04-02 23:35:40.000
9 Action|Adventure|Sci-Fi 4.0 2005-04-02 23:33:46.000
> Example output
userId movieId cluster rating
0 1 2 2 3.5
1 1 29 2 4.0
2 1 260 2 3.5
3 1 589 2 2.0
4 1 653 2 5.0
5 1 919 2 4.5
I tried using join, but I don't know how to use it with multiple keys.
Answer 0: (score: 0)
Try this:
df_output = df_original.merge(df_workspace, how='inner', on=['userId', 'movieId'])
There is also a join method, but I prefer merge.
Answer 1: (score: 0)
Try:
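For completeness, here is a minimal sketch of how join can be used with several keys. The sample rows are made-up stand-ins shaped like the frames in the question, and the names ratings and df_joined are only illustrative. join matches the caller's columns against the index of the other object, so the key columns of df_original are first moved into a MultiIndex.

```python
import pandas as pd

# Small stand-in frames (made-up rows, shaped like the ones in the question)
df_workspace = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [2, 29, 589],
    "cluster": [2, 2, 2],
})
df_original = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [2, 29, 32],
    "title": ["Jumanji (1995)", "City of Lost Children, The", "Twelve Monkeys (1995)"],
    "rating": [3.5, 3.5, 3.5],
})

# join compares the caller's `on` columns with the index of the other object,
# so the two key columns of df_original become a MultiIndex here.
ratings = df_original.set_index(["userId", "movieId"])["rating"]

# join defaults to a left join: every df_workspace row is kept,
# and rows without a match get NaN in `rating`.
df_joined = df_workspace.join(ratings, on=["userId", "movieId"])
print(df_joined)
```

With these sample rows, the row for movieId 589 has no counterpart in df_original, so its rating comes back as NaN rather than being dropped.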
df_workspace.merge(df_original[['userId','movieId','rating']])
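By default, merge performs an inner join on all columns that have the same name in both DataFrames. And by selecting only the needed columns of df_original before merging, you get just the desired output columns.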
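A minimal sketch of the difference the join type makes, assuming made-up sample rows shaped like the frames in the question (the names ratings, keys, inner and left are only illustrative). Spelling out on makes the keys explicit instead of relying on the shared column names, and how="left" keeps df_workspace rows that have no rating.

```python
import pandas as pd

# Small stand-in frames (made-up rows, shaped like the ones in the question)
df_workspace = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [2, 29, 589],
    "cluster": [2, 2, 2],
})
df_original = pd.DataFrame({
    "userId": [1, 1, 1],
    "movieId": [2, 29, 32],
    "title": ["Jumanji (1995)", "City of Lost Children, The", "Twelve Monkeys (1995)"],
    "rating": [3.5, 3.5, 3.5],
})

# Keep only the key columns plus the column to bring over.
ratings = df_original[["userId", "movieId", "rating"]]
keys = ["userId", "movieId"]

# Inner join (the default): only rows whose keys appear in both frames survive.
inner = df_workspace.merge(ratings, on=keys)

# Left join: all df_workspace rows are kept; unmatched rows get NaN in `rating`.
left = df_workspace.merge(ratings, on=keys, how="left")

print(inner)
print(left)
```

Which one is appropriate depends on whether rows of df_workspace without a rating should be dropped or kept.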