Suppose we have the data frame below:
What I want to do is find the matching rows and then do some calculations.
import pandas as pd
data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'],
'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],
'C':['10','10','10','30','50','60','50','8'],
'D':['9','8','7','6','5','4','3','2']})
print(data)
A C D id
0 foo 10 9 1
1 bar 10 8 2
2 foo 10 7 3
3 bar 30 6 4
4 foo 50 5 5
5 bar 60 4 6
6 foo 50 3 7
7 foo 8 2 8
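The rule, in pseudocode, is:
for any two ids (idx, idy) in data.iterrows():
    if idx.A == idy.A and idx.C == idy.C:
        result = idx.D * idy.D
and then generate a new data frame with the three columns ['id'], ['A'] and ['result'].
So a few rows of the expected result are: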
I have tried, but I ended up with either wrong logic or wrong code/data format. Can anyone help me?
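Answer 0 (score: 3)
One approach is to group by A and C, compute the product and the count of D in each group, filter out the groups that contain only a single item, and then inner-merge the result back onto the original frame on A and C, for example: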
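# D is stored as strings in the question, so convert it before multiplying
df = data.assign(D=pd.to_numeric(data['D']))
df.merge(
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
    [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)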
This gives you:
A C D id prod count
0 foo 10 9 1 63 2
2 foo 10 7 3 63 2
4 foo 50 5 5 15 2
6 foo 50 3 7 15 2
Then drop/rename columns as appropriate.
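A minimal sketch of that last step, assuming the question's `data` frame and that the wanted columns are `id`, `A` and `result`. The names `stats`, `merged` and `tidy` are only illustrative. Note that the group product equals the pairwise product only when a group holds exactly two rows, which is the case for this sample data.

```python
import pandas as pd

data = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
    'D': ['9', '8', '7', '6', '5', '4', '3', '2'],
})
# D arrives as strings, so make it numeric before taking products
df = data.assign(D=pd.to_numeric(data['D']))

# Per-group product and row count, keeping only groups with more than one row
stats = df.groupby(['A', 'C'])['D'].agg(['prod', 'count'])
stats = stats[stats['count'] > 1]

# Inner merge back onto the rows, then drop helpers and rename the product
merged = df.merge(stats, left_on=['A', 'C'], right_index=True)
tidy = (
    merged.drop(columns=['C', 'D', 'count'])
          .rename(columns={'prod': 'result'})
          [['id', 'A', 'result']]
)
print(tidy)
```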
Answer 1 (score: 1)
You can use a self-join:
data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)
joint = pd.merge(data, data, on=('A', 'C'))
joint = joint.loc[joint['id_x'] != joint['id_y']]
joint['result'] = joint['D_x'] * joint['D_y']
result = joint[['id_x', 'A', 'result']]
result.columns = ['id', 'A', 'result']
Result:
id A result
1 1 foo 63
2 3 foo 63
7 5 foo 15
8 7 foo 15
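The self-join lists every matching pair twice, once as (x, y) and once as (y, x), which is why each id gets its own row above. If one row per unordered pair is preferred instead, a possible variation (an assumption about what might be wanted, not part of the answer) is to keep only rows where `id_x < id_y`. The name `pairs` is illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
    'D': ['9', '8', '7', '6', '5', '4', '3', '2'],
})
data[['id', 'C', 'D']] = data[['id', 'C', 'D']].apply(pd.to_numeric)

# Join the frame to itself on the matching keys A and C
pairs = data.merge(data, on=['A', 'C'], suffixes=('_x', '_y'))

# id_x < id_y drops self-matches and keeps each unordered pair once
pairs = pairs[pairs['id_x'] < pairs['id_y']].copy()
pairs['result'] = pairs['D_x'] * pairs['D_y']

print(pairs[['id_x', 'id_y', 'A', 'result']])
```

Unlike a per-group product, this stays pairwise even when a group contains three or more rows.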
Answer 2 (score: 0)
import pandas as pd
data = pd.DataFrame({'id':['1','2','3','4','5','6','7','8'],
'A':['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],
'C':['10','10','10','30','50','60','50','8'],
'D':['9','8','7','6','5','4','3','2']})
First convert the relevant columns to numeric
data[['C', 'D', 'id']] = data[['C', 'D', 'id']].apply(pd.to_numeric)
Create an empty DataFrame to append to
finalDataFrame = pd.DataFrame()
Group by the two columns, then find the product of column D within each group and append it.
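group = data.groupby(['A', 'C'])
for x, y in group:
    y = y.copy()  # work on a copy of the group rather than the original frame
    product = (y[["D"]].product(axis=0).values[0])
    for row in y.index:
        y.at[row, 'D'] = product
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    finalDataFrame = pd.concat([finalDataFrame, y], ignore_index=True)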
output = finalDataFrame[['id', 'A', 'D']]
output = output.rename(columns = {'D': 'result'})
print(output)
This gives you
id A result
0 2 bar 8
1 4 bar 6
2 6 bar 4
3 8 foo 2
4 1 foo 63
5 3 foo 63
6 5 foo 15
7 7 foo 15
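Note that this output also keeps the rows whose group has a single member (the bar rows and id 8), where result is simply that row's own D. A loop-free sketch of the same idea, using `groupby(...).transform` and assuming such single-row groups should be dropped, could look like this. The names `grouped`, `group_size` and `out` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5', '6', '7', '8'],
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'C': ['10', '10', '10', '30', '50', '60', '50', '8'],
    'D': ['9', '8', '7', '6', '5', '4', '3', '2'],
})
data['D'] = pd.to_numeric(data['D'])

grouped = data.groupby(['A', 'C'])['D']

# transform broadcasts each group's value back onto its rows
result = grouped.transform('prod')
group_size = grouped.transform('count')

# Keep only rows that have at least one partner in their (A, C) group
out = data.assign(result=result)[group_size > 1][['id', 'A', 'result']]
print(out)
```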