如何通过一个以上的键连接两个数据框?

时间:2018-09-22 20:45:01

标签: python pandas dataframe

我需要将数据帧df_original中的“评级”列(通过键,“ userId”和“ movieId”)与数据帧df_workspace结合起来。

>数据框df_workspace

    userId  movieId  cluster
0         1        2        2
1         1       29        2
2         1      260        2
3         1      589        2
4         1      653        2
5         1      919        2
6         1     1009        2
7         1     1196        2
8         1     1198        2
9         1     1200        2
10        1     1201        2
11        1     1291        2
12        1     1304        2
13        1     1374        2
14        1     1525        2
15        1     1750        2
16        1     1920        2
17        1     1967        2
18        1     2021        2
19        1     2138        2
20        1     2140        2
21        1     2143        2
22        1     2173        2
23        1     2193        2
24        1     2628        2
25        1     2761        2
26        1     2872        2
27        1     3000        2
28        1     3030        2
29        1     3037        2

>数据框df_original

   userId  movieId                                              title  \
0       1        2                                     Jumanji (1995)   
1       1       29  City of Lost Children, The (Cité des enfants ...   
2       1       32          Twelve Monkeys (a.k.a. 12 Monkeys) (1995)   
3       1       47                        Seven (a.k.a. Se7en) (1995)   
4       1       50                         Usual Suspects, The (1995)   
5       1      112         Rumble in the Bronx (Hont faan kui) (1995)   
6       1      151                                     Rob Roy (1995)   
7       1      223                                      Clerks (1994)   
8       1      253  Interview with the Vampire: The Vampire Chroni...   
9       1      260          Star Wars: Episode IV - A New Hope (1977)   

                                   genres  rating                timestamp  
0              Adventure|Children|Fantasy     3.5  2005-04-02 23:53:47.000  
1  Adventure|Drama|Fantasy|Mystery|Sci-Fi     3.5  2005-04-02 23:31:16.000  
2                 Mystery|Sci-Fi|Thriller     3.5  2005-04-02 23:33:39.000  
3                        Mystery|Thriller     3.5  2005-04-02 23:32:07.000  
4                  Crime|Mystery|Thriller     3.5  2005-04-02 23:29:40.000  
5           Action|Adventure|Comedy|Crime     3.5  2004-09-10 03:09:00.000  
6                Action|Drama|Romance|War     4.0  2004-09-10 03:08:54.000  
7                                  Comedy     4.0  2005-04-02 23:46:13.000  
8                            Drama|Horror     4.0  2005-04-02 23:35:40.000  
9                 Action|Adventure|Sci-Fi     4.0  2005-04-02 23:33:46.000 

>输出示例

    userId  movieId  cluster   rating
0         1        2        2   3.5
1         1       29        2   4.0
2         1      260        2   3.5
3         1      589        2   2.0
4         1      653        2   5.0
5         1      919        2   4.5

我尝试使用join,但我不知道如何使用多个键。

2 个答案:

答案 0 :(得分:0)

尝试一下:

df_output = df_original.merge(df_workspace, how='inner', on=['userId', 'movieId'])

还有一个join方法,但是我更喜欢合并

答案 1 :(得分:0)

尝试:

df_workspace.merge(df_original[['userId','movieId','rating']])
默认情况下,

merge在所有标记为相同的列上联接。而且,通过过滤df_orginal数据框列,您只会得到所需的输出列。