Pandas - find rows with matching values in two columns and multiply the values in another column

Time: 2018-08-12 11:20:33

Tags: python pandas loops

Suppose we have the data frame below:

What I want to do is find the matching rows and then do some calculations on them.

import pandas as pd
data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'], 
                     'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],  
                     'C':['10','10','10','30','50','60','50','8'], 
                     'D':['9','8','7','6','5','4','3','2']})
print(data)

    A   C   D   id
0   foo 10  9   1
1   bar 10  8   2
2   foo 10  7   3
3   bar 30  6   4
4   foo 50  5   5
5   bar 60  4   6
6   foo 50  3   7
7   foo 8   2   8

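For any two ids (idx, idy) in data.iterrows(): if idx.A == idy.A and idx.C == idy.C, then result = idx.D * idy.D. I then want to generate a new data frame with the three columns ['id'], ['A'], and ['result'], which would hold the few rows of the expected result.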

I tried, but the result was wrong, because of either faulty logic or a wrong code/data format. Can anyone help me?
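The rule described in the question can be written as a direct loop over pairs of rows. This is only a minimal sketch: it assumes the string columns are first converted to numbers, and the names `rows` and `pairwise` are introduced here for illustration.

```python
from itertools import combinations

import pandas as pd

data = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                     'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
                     'D': ['9', '8', '7', '6', '5', '4', '3', '2']})

# Assumption: the values must be numeric before they can be multiplied.
data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)

rows = []  # hypothetical helper list collecting one record per matched id
for (_, x), (_, y) in combinations(data.iterrows(), 2):
    # Two rows match when both A and C are equal.
    if x['A'] == y['A'] and x['C'] == y['C']:
        product = x['D'] * y['D']
        rows.append({'id': x['id'], 'A': x['A'], 'result': product})
        rows.append({'id': y['id'], 'A': y['A'], 'result': product})

pairwise = pd.DataFrame(rows, columns=['id', 'A', 'result'])
print(pairwise)
```

The loop compares every pair of rows, so its cost grows quadratically with the number of rows. It is mainly useful for checking vectorized approaches on small data.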

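3 answers:

Answer 0: (score: 3)

One approach is to group by A + C, compute the product and the count, filter out the groups that contain only a single item, and then inner merge back onto the original frame on A + C, for example: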

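# `df` is the question's `data` with the string columns converted to numbers
# (the product cannot be computed on the original string values).
df = data.copy()
df[['id', 'C', 'D']] = df[['id', 'C', 'D']].apply(pd.to_numeric)
df.merge(
    # per-group product and size of D, keeping only groups with more than one row
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
    [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)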

This gives you:

     A   C  D  id  prod  count
0  foo  10  9   1    63      2
2  foo  10  7   3    63      2
4  foo  50  5   5    15      2
6  foo  50  3   7    15      2

Then drop/rename the columns as appropriate.
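One way the final drop/rename step might look is sketched below, assuming the question's data with the columns converted to numbers. The names `stats`, `merged`, and `result` are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
                   'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
                   'D': ['9', '8', '7', '6', '5', '4', '3', '2']})
df[['id', 'C', 'D']] = df[['id', 'C', 'D']].apply(pd.to_numeric)

# Per-group product and size of D, keeping only groups with more than one row.
stats = df.groupby(['A', 'C'])['D'].agg(['prod', 'count'])
stats = stats[stats['count'] > 1]

# Inner merge back onto the original rows, matching A and C against the group index.
merged = df.merge(stats, left_on=['A', 'C'], right_index=True)

# Keep only the requested columns and rename the product column.
result = merged[['id', 'A', 'prod']].rename(columns={'prod': 'result'})
print(result)
```

Note that `prod` multiplies every D in a group, so it equals the question's pairwise product only when a group has exactly two rows, which is the case for this data.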

Answer 1: (score: 1)

You can use a self-join technique:

data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)
joint = pd.merge(data, data, on=('A', 'C'))
joint = joint.loc[joint['id_x'] != joint['id_y']]  # drop rows paired with themselves
joint['result'] = joint['D_x'] * joint['D_y']
result = joint[['id_x', 'A', 'result']]
result.columns = ['id', 'A', 'result']

Result:

   id    A  result
1   1  foo      63
2   3  foo      63
7   5  foo      15
8   7  foo      15
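The self-join lists each match twice, once from each side. If one row per unordered pair is preferred, a possible variant is to keep only the rows where `id_x < id_y`. This is a sketch under that assumption; the name `pairs` is introduced here for illustration.

```python
import pandas as pd

data = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                     'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
                     'D': ['9', '8', '7', '6', '5', '4', '3', '2']})
data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)

# Self-join on A and C; the overlapping columns get the suffixes _x and _y.
joint = pd.merge(data, data, on=['A', 'C'])

# id_x < id_y removes self-pairs and keeps each unordered pair only once.
pairs = joint.loc[joint['id_x'] < joint['id_y']].copy()
pairs['result'] = pairs['D_x'] * pairs['D_y']
pairs = pairs[['id_x', 'id_y', 'A', 'result']].reset_index(drop=True)
print(pairs)
```

Unlike a per-group product, this form still yields true pairwise products when a group holds three or more rows.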

Answer 2: (score: 0)

import pandas as pd
data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'], 
                     'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],  
                     'C':['10','10','10','30','50','60','50','8'], 
                     'D':['9','8','7','6','5','4','3','2']})

First, convert the relevant columns to numeric:

data[['C', 'D', 'id']] = data[['C', 'D', 'id']].apply(pd.to_numeric)

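Create an empty list to collect the per-group frames (DataFrame.append was removed in pandas 2.0, so the pieces are concatenated at the end instead):

pieces = []

Group by the two columns, then find the product of column D within each group and collect the result:

group = data.groupby(['A', 'C'])
for x, y in group:
    # product of column D within this (A, C) group
    product = y["D"].product()

    # work on a copy so the original frame is left untouched
    y = y.copy()
    y['D'] = product

    pieces.append(y)

finalDataFrame = pd.concat(pieces, ignore_index=True)

output = finalDataFrame[['id', 'A', 'D']]
output = output.rename(columns = {'D': 'result'})
print(output)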

This gives you:

   id    A  result
0   2  bar       8
1   4  bar       6
2   6  bar       4
3   8  foo       2
4   1  foo      63
5   3  foo      63
6   5  foo      15
7   7  foo      15
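The same per-group product can probably be obtained without an explicit loop by using `groupby(...).transform('prod')`. This is a swapped-in technique, not part of the answer above. The sketch keeps the rows in their original order rather than in group order.

```python
import pandas as pd

data = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
                     'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                     'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
                     'D': ['9', '8', '7', '6', '5', '4', '3', '2']})
data[['C', 'D', 'id']] = data[['C', 'D', 'id']].apply(pd.to_numeric)

output = data[['id', 'A']].copy()
# transform('prod') returns one value per original row: the product of D in its (A, C) group.
output['result'] = data.groupby(['A', 'C'])['D'].transform('prod')
print(output)
```

As in the loop version, rows that have no partner keep their own D value as the result.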