I have these two DFs.
active:
Customer_ID | product_No| Rating
7 | 111 | 3.0
7 | 222 | 1.0
7 | 333 | 5.0
7 | 444 | 3.0
user:
Customer_ID | product_No| Rating
9 | 111 | 2.0
9 | 222 | 5.0
9 | 666 | 5.0
9 | 555 | 3.0
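I want to find the ratings of the products that both users have rated (e.g. 111, 222) and drop any products that only one of them rated (e.g. 444, 333, 555, 666). So the new DFs should look like this:
active: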
Customer_ID | product_No| Rating
7 | 111 | 3.0
7 | 222 | 1.0
user:
Customer_ID | product_No| Rating
9 | 111 | 2.0
9 | 222 | 5.0
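I don't know how to do this without a for loop. Can you help me?
This is my code so far:
import pandas as pd
# names= supplies the column labels for a headerless CSV
ratings = pd.read_csv("ratings.csv", names=['Customer_ID', 'product_No', 'Rating'])
# Filter on Customer_ID, the column name defined above
active = ratings[ratings['Customer_ID'] == 7]
user = ratings[ratings['Customer_ID'] == 9]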
Answer 0 (score: 4)
You can first get the common product_No values with a set intersection, then filter the original data frames with the isin method:
common_product = set(active.product_No).intersection(user.product_No)
common_product
# {111, 222}
active[active.product_No.isin(common_product)]
#Customer_ID product_No Rating
#0 7 111 3.0
#1 7 222 1.0
user[user.product_No.isin(common_product)]
#Customer_ID product_No Rating
#0 9 111 2.0
#1 9 222 5.0
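The steps above can be wrapped into a small helper so that both frames are filtered in one call. This is only a sketch: the function name keep_common and its key parameter are hypothetical, and the frames are rebuilt inline from the sample data in the question instead of being read from ratings.csv.

```python
import pandas as pd

# Sample data copied from the question (built inline instead of read from a CSV)
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})


def keep_common(left, right, key="product_No"):
    """Hypothetical helper: keep only rows whose key appears in both frames."""
    common = set(left[key]).intersection(right[key])
    return left[left[key].isin(common)], right[right[key].isin(common)]


active_common, user_common = keep_common(active, user)
print(active_common)
print(user_common)
```

Because isin builds a boolean mask, the original index labels of the surviving rows are kept; call reset_index(drop=True) on the results if a fresh index is wanted.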
Answer 1 (score: 1)
Use query and reference the other data frame with @:
active.query('product_No in @user.product_No')
Customer_ID product_No Rating
0 7 111 3.0
1 7 222 1.0
user.query('product_No in @active.product_No')
Customer_ID product_No Rating
0 9 111 2.0
1 9 222 5.0
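For reference, here is a self-contained sketch of the query approach, assuming the sample data from the question. The intermediate lists user_products and active_products are hypothetical names introduced here; the @ prefix tells query to look a name up in the surrounding Python scope instead of among the column names.

```python
import pandas as pd

# Sample data copied from the question
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

# Plain Python lists of the product numbers each customer rated
user_products = user["product_No"].tolist()
active_products = active["product_No"].tolist()

# "@name" refers to a Python variable, "product_No" to a column
active_common = active.query("product_No in @user_products")
user_common = user.query("product_No in @active_products")

print(active_common)
print(user_common)
```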
Answer 2 (score: 0)
I tried the following using an INNER JOIN. The code comes first, followed by the output it prints:
import pandas as pd
df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
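print(df1)
print(df2)
# Inner join keeps only the product_No values present in both frames
df_ij = pd.merge(df1, df2, on='product_No', how='inner')
print(df_ij)
df_list = []
# Split the joined frame back into one frame per customer, using the suffixes
for df_e, suffx in zip([df1, df2], ['_x', '_y']):
    df_e = df_ij[['Customer_ID'+suffx, 'product_No', 'Rating'+suffx]]
    df_e.columns = list(df1)
    df_list.append(df_e)
print(df_list[0])
print(df_list[1])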
# print df1
Customer_ID product_No Rating
0 7 111 3
1 7 222 1
2 7 333 5
3 7 444 3
# print df2
Customer_ID product_No Rating
0 9 111 2
1 9 222 5
2 9 777 5
3 9 555 3
# print the INNER JOINed df
Customer_ID_x product_No Rating_x Customer_ID_y Rating_y
0 7 111 3 9 2
1 7 222 1 9 5
# print the first df you want, with common 'product_No'
Customer_ID product_No Rating
0 7 111 3
1 7 222 1
# print the second df you want, with common 'product_No'
Customer_ID product_No Rating
0 9 111 2
1 9 222 5
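An inner join selects the rows that are common to both df. Because the two frames share column names, the joined df adds suffixes (_x and _y by default) to the columns that were not used as the join key, so that they can be told apart. You can then extract the columns you need by specifying the appropriate suffix to get the final result.
There is a good example here.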
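If the suffixed columns are not needed at all, one possible variation (not part of the answer above) is to merge each frame against only the key column of the other one. The sketch below assumes the sample data from the question; the drop_duplicates call is there so that a product listed more than once in the other frame does not multiply rows.

```python
import pandas as pd

# Sample data copied from the question
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

# Merge against the key column only, so no other columns overlap
# and no _x/_y suffixes are produced
active_common = active.merge(
    user[["product_No"]].drop_duplicates(), on="product_No", how="inner"
)
user_common = user.merge(
    active[["product_No"]].drop_duplicates(), on="product_No", how="inner"
)

print(active_common)
print(user_common)
```

Unlike boolean filtering with isin, merge returns a new frame with a fresh 0..n-1 index.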
Answer 3 (score: 0)
The answer to your question is:
import pandas as pd
dict1={"Customer_id":[7,7,7,7],
"Product_No":[111,222,333,444],
"rating":[3.0,1.0,5.0,3.0]}
active=pd.DataFrame(dict1)
dict2={"Customer_id":[9,9,9,9],
"Product_No":[111,222,666,555],
"rating":[2.0,5.0,5.0,3.0]}
user=pd.DataFrame(dict2)
df3=pd.merge(active,user,on="Product_No",how="inner")
df3
active=df3[["Customer_id_x","Product_No","rating_x"]]
print(active)
user=df3[["Customer_id_y","Product_No","rating_y"]]
print(user)
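The frames printed above still carry the _x and _y suffixes in their column names. A minimal sketch of restoring the original names follows; the helper split_side and its parameters are hypothetical, and the data uses the same column names as this answer (Customer_id, Product_No, rating).

```python
import pandas as pd

active = pd.DataFrame({
    "Customer_id": [7, 7, 7, 7],
    "Product_No": [111, 222, 333, 444],
    "rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_id": [9, 9, 9, 9],
    "Product_No": [111, 222, 666, 555],
    "rating": [2.0, 5.0, 5.0, 3.0],
})

merged = pd.merge(active, user, on="Product_No", how="inner")


def split_side(merged, template, suffix, key="Product_No"):
    """Hypothetical helper: pull one side out of the merge and restore its column names."""
    # The join key has no suffix; every other column of the template does
    source_cols = [c if c == key else c + suffix for c in template.columns]
    out = merged[source_cols].copy()
    out.columns = list(template.columns)
    return out


active_common = split_side(merged, active, "_x")
user_common = split_side(merged, user, "_y")
print(active_common)
print(user_common)
```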