Removing rows with uncommon column values

Time: 2017-04-16 00:40:37

Tags: python pandas numpy

I have these two DataFrames:

active:

Customer_ID | product_No| Rating
7           | 111       | 3.0
7           | 222       | 1.0
7           | 333       | 5.0
7           | 444       | 3.0

user:

Customer_ID | product_No| Rating
9           | 111       | 2.0
9           | 222       | 5.0
9           | 666       | 5.0
9           | 555       | 3.0

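I want to find the ratings of the products that both users have rated (e.g. 111, 222) and drop any product that is not common to both (e.g. 444, 333, 555, 666). So the new DataFrames should look like this:

active: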

Customer_ID | product_No| Rating
7           | 111       | 3.0
7           | 222       | 1.0

user:

Customer_ID | product_No| Rating
9           | 111       | 2.0
9           | 222       | 5.0

I don't know how to do this without a for loop. Can you help me?

Here is my code so far:

import pandas as pd
ratings = pd.read_csv("ratings.csv", names=['Customer_ID', 'product_No', 'Rating'])
active = ratings[ratings['Customer_ID'] == 7]
user = ratings[ratings['Customer_ID'] == 9]

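4 answers:

Answer 0: (score: 4)

You can first use a set intersection to get the common product_No values, then filter the original data frames with the isin method: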

common_product = set(active.product_No).intersection(user.product_No)

common_product
# {111, 222}

active[active.product_No.isin(common_product)]

#Customer_ID   product_No   Rating
#0         7          111      3.0
#1         7          222      1.0

user[user.product_No.isin(common_product)]

#Customer_ID   product_No   Rating
#0         9          111      2.0
#1         9          222      5.0
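
The two steps above can be wrapped in a small helper. This is only a minimal sketch: the function name keep_common and the sample frames are illustrative assumptions (the frames mirror the tables in the question), not part of the original answer.

```python
import pandas as pd


def keep_common(left, right, key="product_No"):
    """Filter both frames down to the key values that appear in each of them.

    keep_common is a hypothetical helper name used for illustration only.
    """
    common = set(left[key]).intersection(right[key])
    return left[left[key].isin(common)], right[right[key].isin(common)]


# Sample data mirroring the tables in the question
active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
user = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

active_common, user_common = keep_common(active, user)
print(active_common)
print(user_common)
```

Because isin filters each frame independently, the original column names and row indexes are preserved, and no explicit loop over the rows is needed.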

Answer 1: (score: 1)

Use query and reference the other data frame with @:

Active.query('product_No in @User.product_No')

   Customer_ID  product_No  Rating
0            7         111     3.0
1            7         222     1.0

User.query('product_No in @Active.product_No')

   Customer_ID  product_No  Rating
0            9         111     2.0
1            9         222     5.0
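
For a self-contained run of the query approach, the sketch below builds the two frames from the question's tables first. The capitalised names Active and User follow this answer; the result names active_q and user_q are assumptions made for illustration.

```python
import pandas as pd

# Sample frames mirroring the question's tables
Active = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3.0, 1.0, 5.0, 3.0],
})
User = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 666, 555],
    "Rating": [2.0, 5.0, 5.0, 3.0],
})

# Inside query(), the @ prefix refers to a Python variable in the calling
# scope, and "in" keeps the rows whose product_No occurs in the other frame
active_q = Active.query("product_No in @User.product_No")
user_q = User.query("product_No in @Active.product_No")

print(active_q)
print(user_q)
```

Both queries are evaluated against the unfiltered frames, so the order in which the two lines run does not matter as long as neither result is assigned back to Active or User in between.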

Answer 2: (score: 0)

I tried the following using an INNER JOIN:

import pandas as pd

df1 = pd.read_csv('a.csv')
df2 = pd.read_csv('b.csv')
print(df1)
print(df2)

# Inner join keeps only the rows whose product_No appears in both frames
df_ij = pd.merge(df1, df2, on='product_No', how='inner')
print(df_ij)

# Split the joined frame back into one frame per customer, using the
# '_x' / '_y' suffixes that merge adds to the overlapping column names
df_list = []
for suffx in ['_x', '_y']:
    df_e = df_ij[['Customer_ID' + suffx, 'product_No', 'Rating' + suffx]]
    df_e.columns = list(df1)
    df_list.append(df_e)

print(df_list[0])
print(df_list[1])

It gives the following output:

# print df1
   Customer_ID  product_No  Rating
0            7         111       3
1            7         222       1
2            7         333       5
3            7         444       3

# print df2
   Customer_ID  product_No  Rating
0            9         111       2
1            9         222       5
2            9         777       5
3            9         555       3

# print the INNER JOINed df
   Customer_ID_x  product_No  Rating_x  Customer_ID_y  Rating_y
0              7         111         3              9         2
1              7         222         1              9         5

# print the first df you want, with common 'product_No'
   Customer_ID  product_No  Rating
0            7         111       3
1            7         222       1

# print the second df you want, with common 'product_No'
   Customer_ID  product_No  Rating
0            9         111       2
1            9         222       5

The INNER JOIN selects the rows common to both data frames. Because the two frames share column names, the joined df adds suffixes to the columns that were not used in the join so that they can be told apart. You can then extract the columns with the appropriate suffix to get the desired final result.

There is a good example here.
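
The suffix handling in this answer can be made more explicit with the suffixes parameter of pd.merge. The sketch below is an assumption-laden illustration: the suffix strings "_active" and "_user", the helper name split_side, and the in-memory sample frames (standing in for a.csv and b.csv) are not from the original answer.

```python
import pandas as pd

# In-memory stand-ins for a.csv and b.csv, using the values shown in this answer
df1 = pd.DataFrame({
    "Customer_ID": [7, 7, 7, 7],
    "product_No": [111, 222, 333, 444],
    "Rating": [3, 1, 5, 3],
})
df2 = pd.DataFrame({
    "Customer_ID": [9, 9, 9, 9],
    "product_No": [111, 222, 777, 555],
    "Rating": [2, 5, 5, 3],
})

# Explicit suffixes (an illustrative choice) instead of the default '_x' / '_y'
joined = pd.merge(df1, df2, on="product_No", how="inner",
                  suffixes=("_active", "_user"))


def split_side(joined, suffix, columns, key="product_No"):
    """Pull one side out of the joined frame and restore the original names.

    split_side is a hypothetical helper name used for illustration only.
    """
    picked = [c if c == key else c + suffix for c in columns]
    side = joined[picked].copy()
    side.columns = list(columns)
    return side


df1_common = split_side(joined, "_active", list(df1.columns))
df2_common = split_side(joined, "_user", list(df2.columns))

print(df1_common)
print(df2_common)
```

Selecting the columns in the order of the original frame keeps the layout identical to the inputs, which the default join order would not do for the right-hand side.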

Answer 3: (score: 0)

Here is an answer to your question:

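import pandas as pd

dict1 = {"Customer_id": [7, 7, 7, 7],
         "Product_No": [111, 222, 333, 444],
         "rating": [3.0, 1.0, 5.0, 3.0]}
active = pd.DataFrame(dict1)
dict2 = {"Customer_id": [9, 9, 9, 9],
         "Product_No": [111, 222, 666, 555],
         "rating": [2.0, 5.0, 5.0, 3.0]}
user = pd.DataFrame(dict2)

# Inner merge keeps only the products rated by both customers;
# overlapping column names get the default '_x' / '_y' suffixes
df3 = pd.merge(active, user, on="Product_No", how="inner")
print(df3)

# Columns ending in '_x' came from active, those ending in '_y' from user
active = df3[["Customer_id_x", "Product_No", "rating_x"]]
print(active)
user = df3[["Customer_id_y", "Product_No", "rating_y"]]
print(user)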