My DataFrame looks like this:
df = pd.DataFrame({'User':['a','a','a','b','b','b'],
'Type':['101','102','101','101','101','102'],
'Qty':[10, -10, 10, 30, 5, -5]})
I want to remove pairs of rows where df['Type'] is 101 and 102 and the df['Qty'] values net each other out. The final result should look like this:
df = pd.DataFrame({'User':['a','b'],
'Type':['101', '101'],
'Qty':[10, 30]})
I tried converting the negative values to absolute values and dropping duplicates, like this:
df['Qty'] = df['Qty'].abs()
df.drop_duplicates(subset=['Qty'], keep='first')
But that wrongly gave me this DataFrame:
df = pd.DataFrame({'User':['a','b', 'b'],
'Type':['101', '101', '101'],
'Qty':[10, 30, 5]})
Answer 0 (score: 3)
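The idea is to build every combination of index values within each group and test whether each sub-group contains both Type values and sums to 0, which identifies the matching pairs: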
# the solution needs unique index values
df = df.reset_index(drop=True)
from itertools import combinations

out = set()

def f(x):
    # test every 2-row combination of index values within the group
    for i in combinations(x.index, 2):
        a = x.loc[list(i)]
        # keep the pair if it holds both Types and the quantities cancel out
        if (set(a['Type']) == set(['101', '102'])) and (a['Qty'].sum() == 0):
            out.add(i)

df.groupby('User').apply(f)
print(out)
{(0, 1), (4, 5), (1, 2)}
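Collecting pairs through side effects inside groupby().apply() works here, but it is fragile, because apply() is meant to return values rather than mutate a global set. A minimal sketch of the same combination test with a plain loop over the groups follows. The variable name pairs is illustrative and simply replaces the global out.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

pairs = set()
# iterate over each user's rows directly instead of relying on apply() side effects
for _, g in df.groupby('User'):
    for i, j in combinations(g.index, 2):
        a = df.loc[[i, j]]
        # a matching pair has one '101', one '102', and quantities that cancel
        if set(a['Type']) == {'101', '102'} and a['Qty'].sum() == 0:
            pairs.add((i, j))

print(sorted(pairs))
```

Sorting only makes the printout stable; the set itself has the same three pairs as out.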
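If an index value appears in more than one pair, like 1 in (0, 1) and (1, 2), drop the pair in which it appears again, here (1, 2):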
s = pd.Series(list(out)).explode()
idx = s.index[s.duplicated()]
final = s.drop(idx)
print (final)
0 0
0 1
1 4
1 5
dtype: object
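Because out is a set, its iteration order is not guaranteed, so which of two overlapping pairs gets dropped can vary between runs or Python versions. A small sketch, assuming the same three pairs, that sorts them first so the pair with the smaller indices is always kept:

```python
import pandas as pd

# pairs from the previous step; sorting makes the result independent of set order
pairs = sorted({(0, 1), (4, 5), (1, 2)})

s = pd.Series(pairs).explode()        # one row per index value, label = pair number
dup_labels = s.index[s.duplicated()]  # pair numbers that reuse an earlier index value
final = s.drop(dup_labels)            # drop every member of those pairs
print(final.tolist())
```

This is a greedy rule: in a longer chain such as (0, 1), (1, 2), (2, 3) it keeps only (0, 1), even though (2, 3) could also be kept.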
Finally, drop those rows from the original DataFrame:
df = df.drop(final)
print (df)
User Type Qty
2 a 101 10
3 b 101 30
Answer 1 (score: 2)
If there are only two 'Type' values (101 and 102 in this case), you can write a custom function as follows:
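- Build a dictionary keyed by the absolute value of 'Qty'.
- Each value is a list of the 'Type' values corresponding to that 'Qty'.

from collections import defaultdict
import pandas as pd

def f(x):
    # keys: absolute 'Qty'; values: list of 'Type' values still unmatched
    new = defaultdict(list)
    for k, v in x[['Type', 'Qty']].itertuples(index=False, name=None):
        if not new[abs(v)]:
            new[abs(v)].append(k)
        elif new[abs(v)][-1] != k:
            new[abs(v)].pop()
        else:
            new[abs(v)].append(k)
    # the index holds absolute quantities and the values hold Type lists
    return pd.Series(new, name='Type').rename_axis(index='Qty')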
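The logic is simple:
- If the list for the current absolute 'Qty' is empty, add the current 'Type' to it.
- Otherwise, check whether the last 'Type' in the list equals the current 'Type' value. If they do not match, for example new = {10: ['101']} and the current key is '102', remove '101', so new = {10: []}.
- If the last 'Type' and the current 'Type' match, just append the current 'Type' to the list. For example, if new = {10: ['101']} and the current 'Type' is '101', append it, giving new = {10: ['101', '101']}.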
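The function above returns one Series per group, and the answer stops before turning that back into rows. A minimal sketch of the same stack-like cancellation applied per 'User' is below. One change is an assumption, not part of the answer: it stores (Type, Qty) tuples instead of bare Type strings, so the sign of 'Qty' survives and the leftover rows can be rebuilt directly. The helper name net_rows is hypothetical.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})


def net_rows(group):
    """Return the (Type, Qty) rows of one user that were not cancelled out."""
    # key: absolute quantity; value: stack of rows still waiting for a partner
    open_rows = defaultdict(list)
    for typ, qty in group[['Type', 'Qty']].itertuples(index=False, name=None):
        stack = open_rows[abs(qty)]
        if stack and stack[-1][0] != typ:
            # opposite Type with the same absolute Qty: the two rows cancel
            stack.pop()
        else:
            # empty stack or same Type as the last open row: keep it open
            stack.append((typ, qty))
    return [row for stack in open_rows.values() for row in stack]


rows = [(user, typ, qty)
        for user, g in df.groupby('User')
        for typ, qty in net_rows(g)]
result = pd.DataFrame(rows, columns=['User', 'Type', 'Qty'])
print(result)
```

For the sample data this leaves one row per user: a/101/10 and b/101/30, matching the expected output (with a fresh 0-based index).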
Answer 2 (score: 2)
Iterate over all the records and save the matches in a list, making sure that no index is paired more than once.
import pandas as pd

df = pd.DataFrame({'User':['a','a','a','b','b','b'],
                   'Type':['101','102','101','101','101','102'],
                   'Qty':[10, -10, 10, 30, 5, -5]})

# create a list to collect all indices that we are going to remove
records_to_remove = []
# a dictionary to map which group mirrors the other
pair = {'101': '102', '102': '101'}

# let's go over each row one by one
for i in df.index:
    # skip rows we have already stored for removal
    if i in records_to_remove:
        continue
    current_record = df.loc[i]
    pair_type = pair[current_record['Type']]
    pair_quantity = -1 * current_record['Qty']
    # search for all possible matches to this row: same user, mirrored Type,
    # opposite quantity, and not already paired with another row
    match_records = df[(df['User'] == current_record['User'])
                       & (df['Type'] == pair_type)
                       & (df['Qty'] == pair_quantity)
                       & ~df.index.isin(records_to_remove)]
    if match_records.empty:
        # if no match is found, move on to the next row
        continue
    # if a match is found, take the first such record and store both indices
    first_match_index = match_records.index[0]
    records_to_remove.append(i)
    records_to_remove.append(first_match_index)

df = df.drop(records_to_remove)
Output:
User Type Qty
2 a 101 10
3 b 101 30
See if this works for you!
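As a loop-free alternative (a different technique, not part of any answer above), one can number repeated rows with groupby().cumcount() and treat the n-th '101' and the n-th '102' of the same user and absolute quantity as a pair. This sketch assumes exactly two Type values and that a '102' row always carries the negated quantity of its '101' partner, so matching on the absolute 'Qty' is enough. The helper columns AbsQty and n are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'User': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'Type': ['101', '102', '101', '101', '101', '102'],
                   'Qty': [10, -10, 10, 30, 5, -5]})

# helper columns: absolute quantity and occurrence number
key = df.assign(AbsQty=df['Qty'].abs())
# 0, 1, 2, ... for repeated rows with the same User, Type and |Qty|
key['n'] = key.groupby(['User', 'Type', 'AbsQty']).cumcount()

# the n-th '101' and the n-th '102' with the same User and |Qty| share a slot;
# a slot holding both Types is a netting pair, so both of its rows are dropped
types_in_slot = key.groupby(['User', 'AbsQty', 'n'])['Type'].transform('nunique')
result = df[types_in_slot < 2]
print(result)
```

Unlike the loop above, this keeps the original index labels of the surviving rows (2 and 3 for the sample data).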