Python / Pandas - Check whether a column value is the same as other values in the column

Date: 2017-11-15 08:35:54

Tags: python pandas dataframe data-analysis

I have a DataFrame like this:

    product_id          dt  products_qty  stock_qty
0         8225  2017-10-16        12.000     13.000
1         8280  2017-10-16         0.000     11.000
2         8225  2017-10-17         0.000     41.000
3         8280  2017-10-17         7.134     64.698
4         8225  2017-10-18         1.000      8.000
5         8280  2017-10-18         2.728     27.417
6         8225  2017-10-19         0.000     41.000
7         8280  2017-10-19         1.000     -2.000
8         8225  2017-10-20         2.000     -7.000
9         8280  2017-10-20         1.000     25.000
10        8225  2017-10-21         0.000     41.000
11        8280  2017-10-21         0.000     11.000

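I need to get all rows where products_qty equals 0 and the stock_qty value is the same as in other rows (i.e. the stock_qty value occurs more than once). So in this case, I should get a DataFrame like this: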

    product_id          dt  products_qty  stock_qty
1         8280  2017-10-16         0.000     11.000
2         8225  2017-10-17         0.000     41.000
6         8225  2017-10-19         0.000     41.000
10        8225  2017-10-21         0.000     41.000
11        8280  2017-10-21         0.000     11.000

Thanks for your help!

1 answer:

Answer 0 (score: 0)

Use boolean indexing with chained conditions, combined with duplicated with the parameter keep=False so that every duplicate is flagged:

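import pandas as pd

# the answer's sample data; stock_qty in row 10 is changed to 13
df = pd.DataFrame({
    'product_id': [2948225, 2948280] * 6,
    'dt': ['2017-10-16', '2017-10-16', '2017-10-17', '2017-10-17',
           '2017-10-18', '2017-10-18', '2017-10-19', '2017-10-19',
           '2017-10-20', '2017-10-20', '2017-10-21', '2017-10-21'],
    'products_qty': [12.0, 0.0, 0.0, 7.134, 1.0, 2.728, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0],
    'stock_qty': [13.0, 11.0, 41.0, 64.698, 8.0, 27.417, 41.0, -2.0, -7.0, 25.0, 13.0, 11.0],
})
print(df)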
    product_id          dt  products_qty  stock_qty
0      2948225  2017-10-16        12.000     13.000
1      2948280  2017-10-16         0.000     11.000
2      2948225  2017-10-17         0.000     41.000
3      2948280  2017-10-17         7.134     64.698
4      2948225  2017-10-18         1.000      8.000
5      2948280  2017-10-18         2.728     27.417
6      2948225  2017-10-19         0.000     41.000
7      2948280  2017-10-19         1.000     -2.000
8      2948225  2017-10-20         2.000     -7.000
9      2948280  2017-10-20         1.000     25.000
10     2948225  2017-10-21         0.000     13.000 <- changed to 13
11     2948280  2017-10-21         0.000     11.000
# if you need to check duplicates against the whole stock_qty column
df1 = df[(df['products_qty'] == 0) & (df['stock_qty'].duplicated(keep=False))]
print(df1)
    product_id          dt  products_qty  stock_qty
1      2948280  2017-10-16           0.0       11.0
2      2948225  2017-10-17           0.0       41.0
6      2948225  2017-10-19           0.0       41.0
10     2948225  2017-10-21           0.0       13.0 <- duplicates the stock_qty of row 0 (13.0)
11     2948280  2017-10-21           0.0       11.0
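The keep=False argument is what makes this filter return every member of a duplicate group: with the default keep='first', the first occurrence of each repeated value is not flagged. A small illustration on a toy Series (the values are chosen only for illustration):

```python
import pandas as pd

s = pd.Series([13.0, 11.0, 41.0, 41.0, 13.0])

# keep='first' (the default): only repeats after the first occurrence are True
first = s.duplicated()
# keep=False: every value that occurs more than once is True
all_dups = s.duplicated(keep=False)

print(first.tolist())     # [False, False, False, True, True]
print(all_dups.tolist())  # [True, False, True, True, True]
```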

# if you only need to check duplicates among the rows with 0 in products_qty

df2 = (df[df.loc[df['products_qty'] == 0, 'stock_qty']
           .duplicated(keep=False).reindex(df.index, fill_value=False)])
print(df2)
    product_id          dt  products_qty  stock_qty
1      2948280  2017-10-16           0.0       11.0
2      2948225  2017-10-17           0.0       41.0
6      2948225  2017-10-19           0.0       41.0
11     2948280  2017-10-21           0.0       11.0
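The reindex step is needed because duplicated is computed only on the rows where products_qty is 0, so the resulting mask is indexed by those rows alone; reindex(df.index, fill_value=False) stretches it back to the full index and marks every filtered-out row False. A sketch of the intermediate masks on a cut-down frame (rows 0, 1, 2, 6, 10 and 11 of the answer's data, other columns omitted):

```python
import pandas as pd

df = pd.DataFrame({
    'products_qty': [12.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    'stock_qty':    [13.0, 11.0, 41.0, 41.0, 13.0, 11.0],
}, index=[0, 1, 2, 6, 10, 11])

zero = df['products_qty'] == 0

# duplicated() only sees the zero-quantity rows, so row 0 is missing from its index
partial = df.loc[zero, 'stock_qty'].duplicated(keep=False)
print(partial.index.tolist())  # [1, 2, 6, 10, 11]

# reindex restores the full index; rows that were filtered out become False
mask = partial.reindex(df.index, fill_value=False)
print(mask.tolist())  # [False, True, True, True, False, True]

print(df[mask].index.tolist())  # [1, 2, 6, 11]
```

Row 10 drops out because 13.0 appears only once among the zero-quantity rows; its match in row 0 was excluded before duplicated ran.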

# alternative: filter the rows with 0 in products_qty first, then check duplicates within them
df2 = df[df['products_qty'] == 0]
df2 = df2[df2['stock_qty'].duplicated(keep=False)]
print(df2)
    product_id          dt  products_qty  stock_qty
1      2948280  2017-10-16           0.0       11.0
2      2948225  2017-10-17           0.0       41.0
6      2948225  2017-10-19           0.0       41.0
11     2948280  2017-10-21           0.0       11.0
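Another way to write the "zero rows only" variant, not part of the original answer, is DataFrame.duplicated with subset: two rows are duplicates on the pair (products_qty, stock_qty) only if they share both values, so once products_qty == 0 is also required, the result should match df2. A sketch on the answer's data (dt omitted for brevity):

```python
import pandas as pd

# the answer's sample data (row 10 has stock_qty 13)
df = pd.DataFrame({
    'product_id': [2948225, 2948280] * 6,
    'products_qty': [12.0, 0.0, 0.0, 7.134, 1.0, 2.728, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0],
    'stock_qty': [13.0, 11.0, 41.0, 64.698, 8.0, 27.417, 41.0, -2.0, -7.0, 25.0, 13.0, 11.0],
})

# True for rows that share both products_qty and stock_qty with another row
pair_dup = df.duplicated(subset=['products_qty', 'stock_qty'], keep=False)
df3 = df[df['products_qty'].eq(0) & pair_dup]
print(df3.index.tolist())  # [1, 2, 6, 11]
```

Like the other variants, this relies on exact float equality of stock_qty.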