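I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({'COND1': [0, 4, 4, 4, 4, 4],
                   'NAME': ['one', 'one', 'one', 'two', 'three', 'three'],
                   'COND2': ['a', 'a', 'b', 'a', 'a', 'b'],
                   'value': [30, 45, 25, 18, 23, 77]})
We have two condition columns: COND1 with values [0, 4] and COND2 with values ['a', 'b'].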
df
   COND1 COND2   NAME  value
0      0     a    one     30
1      4     a    one     45
2      4     b    one     25
3      4     a    two     18
4      4     a  three     23
5      4     b  three     77
For each NAME, I want to select the row with COND1=0 & COND2=a if that information is available; otherwise, select the row with COND1=4 & COND2=b.
The resulting DataFrame would be:
df
   COND1 COND2   NAME  value
0      0     a    one     30
1    NaN   NaN    two    NaN
2      4     b  three     77
I tried the following:
df[((df['COND1'] == 0) & (df['COND2'] == 'a')) |
   ((df['COND1'] == 4) & (df['COND2'] == 'b'))]
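Answer 0 (score: 0)
Try modifying your result with drop_duplicates (to drop the extra row when a NAME meets both conditions) and reindex (to add back any NAME that meets neither condition):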
Newdf=df[ ((df['COND1'] == 0 ) & (df['COND2'] == 'a') | (df['COND1'] == 4 ) & (df['COND2'] == 'b'))]
Newdf.sort_values('COND1').drop_duplicates(['NAME']).set_index('NAME').reindex(df.NAME.unique()).reset_index()
Out[378]:
    NAME  COND1 COND2  value
0    one    0.0     a   30.0
1    two    NaN   NaN    NaN
2  three    4.0     b   77.0
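To see what each call in that chain contributes, it can be split into steps. This is only a minimal sketch, assuming the six-row df shown in the question; the intermediate names (mask, matched, best, result) are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'COND1': [0, 4, 4, 4, 4, 4],
                   'NAME': ['one', 'one', 'one', 'two', 'three', 'three'],
                   'COND2': ['a', 'a', 'b', 'a', 'a', 'b'],
                   'value': [30, 45, 25, 18, 23, 77]})

# Keep only rows matching either (0, 'a') or (4, 'b').
mask = (((df['COND1'] == 0) & (df['COND2'] == 'a')) |
        ((df['COND1'] == 4) & (df['COND2'] == 'b')))
matched = df[mask]

# Sorting by COND1 puts the preferred (0, 'a') row ahead of the (4, 'b')
# fallback, so drop_duplicates keeps the preferred row for each NAME.
best = matched.sort_values('COND1').drop_duplicates('NAME')

# Reindexing over every NAME in the original frame re-adds names that
# matched neither condition as all-NaN rows.
result = best.set_index('NAME').reindex(df['NAME'].unique()).reset_index()
print(result)
```

Note that sorting on COND1 acts as the priority only because the preferred condition happens to have the smaller COND1 value; with other condition values an explicit priority column would be needed.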
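Answer 1 (score: 0)
Here is an extensible solution using a helper column. The idea is to create a dictionary that maps each (COND1, COND2) combination to a priority, apply it to the combination of the two series, then sort by that priority and drop duplicates.
import numpy as np
import pandas as pd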
df = pd.DataFrame({'COND1' : [0,4,4,4,4,4],
'NAME' : ['one', 'one', 'one', 'two', 'three', 'three'],
'COND2' : ['a', 'a', 'b', 'a', 'a','b'],
'value': [30, 45, 25, 18, 23, 77]})
# define order dictionary and apply to dataframe
order = {(0, 'a'): 0, (4, 'b'): 1}
df['order'] = df.set_index(['COND1', 'COND2']).index.map(order.get)
# if not found in dictionary, convert columns to NaN
df.loc[df['order'].isnull(), ['COND1', 'COND2', 'value']] = np.nan
# sort values, drop duplicates, drop helper column
res = df.sort_values('order').drop_duplicates(subset=['NAME']).drop(columns='order')
print(res)
   COND1   NAME COND2  value
0    0.0    one     a   30.0
5    4.0  three     b   77.0
3    NaN    two   NaN    NaN
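The same priority-dictionary idea can be wrapped in a function that leaves the input frame untouched. This is a rough variant rather than the code above: pick_by_priority, its parameters, and the temporary '_order' column are hypothetical names, and it uses a left merge instead of assigning NaN in place.

```python
import pandas as pd

def pick_by_priority(df, priorities, key='NAME', cond_cols=('COND1', 'COND2')):
    """Hypothetical helper: per key, keep the row whose condition tuple ranks best."""
    # Look up each row's condition tuple; unlisted combinations become NaN.
    tuples = zip(*(df[c] for c in cond_cols))
    rank = pd.Series([priorities.get(t) for t in tuples],
                     index=df.index, dtype='float64')
    # Keep ranked rows only, then take the best-ranked row for each key.
    ranked = df.assign(_order=rank).dropna(subset=['_order'])
    best = (ranked.sort_values('_order')
                  .drop_duplicates(subset=[key])
                  .drop(columns='_order'))
    # Left-merge against all keys so unmatched keys show up as NaN rows.
    all_keys = df[[key]].drop_duplicates()
    return all_keys.merge(best, on=key, how='left')

df = pd.DataFrame({'COND1': [0, 4, 4, 4, 4, 4],
                   'NAME': ['one', 'one', 'one', 'two', 'three', 'three'],
                   'COND2': ['a', 'a', 'b', 'a', 'a', 'b'],
                   'value': [30, 45, 25, 18, 23, 77]})

res = pick_by_priority(df, {(0, 'a'): 0, (4, 'b'): 1})
print(res)
```

Adding a further fallback is then just another dictionary entry, e.g. (4, 'a'): 2, which would give 'two' its (4, 'a') row instead of NaN.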
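Answer 2 (score: 0)
I think this works:
import numpy as np
def conds_are(x, y):
    # Boolean mask of rows where COND1 == x and COND2 == y.
    return df['COND1'].eq(x) & df['COND2'].eq(y)
def name_in(f):
    # Boolean mask of rows whose NAME appears among the rows selected by mask f.
    return df['NAME'].isin(df.loc[f, 'NAME'].unique())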
# Find rows matching conditions.
good = conds_are(0,'a')
good |= conds_are(4,'b') & ~name_in(good)
# Did we miss any names?
bad = ~name_in(good)
# Build DataFrame from surviving rows.
df1 = df.loc[good|bad].copy()
df1.loc[bad,df.columns.drop('NAME')] = np.nan
Output (for the six-row df shown in the question):
   COND1   NAME COND2  value
0    0.0    one     a   30.0
3    NaN    two   NaN    NaN
5    4.0  three     b   77.0
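You don't actually need to define these functions, but IMO they make the code easier to read.
Note: the COND1 and value columns are float because the default NumPy-backed integer dtype in pandas cannot hold NaN.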
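If integer columns are wanted despite the missing rows, one option is pandas' nullable extension dtype 'Int64' (capital I), which stores missing entries as <NA>. A minimal sketch, assuming a float column like value in the output above; the names s and nullable are illustrative only.

```python
import numpy as np
import pandas as pd

# A float column with a gap, like 'value' in the result above.
s = pd.Series([30, np.nan, 77])
print(s.dtype)

# Convert to the nullable integer extension dtype; NaN becomes <NA>.
nullable = s.astype('Int64')
print(nullable.dtype)
print(nullable)
```

The conversion only works when the non-missing floats are whole numbers, which is the case here.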