I have a DataFrame:
import pandas as pd
df = pd.DataFrame({'id':['one','one','two','two','three','three','three'],
'type':['current','saving','current','current','current','saving','credit']})
I want to find the ids whose only type is 'current'. The expected result is:
only_current_id_list = ['two']
Answer 0 (score: 2)
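I think you need the following. It keeps only ids whose single row is 'current', so on the sample data in the question, where 'two' has two 'current' rows, it returns an empty list (the edit below shows a version that allows repeated rows):
L = df.groupby('id') \
      .filter(lambda x: (x['type'] == 'current').all() and
                        (x['type'] == 'current').sum() == 1)['id'].tolist()
print (L)
[]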
Edit:
df = pd.DataFrame({'id':['one','one','two','three','three','three'],'type':['current','current','current','current','saving','credit']})
print (df)
      id     type
0    one  current
1    one  current
2    two  current
3  three  current
4  three   saving
5  three   credit
# ids that have exactly one row, and that row is 'current'
L = df.groupby('id') \
      .filter(lambda x: (x['type'] == 'current').all() and
                        (x['type'] == 'current').sum() == 1)['id'].tolist()
print (L)
['two']
# ids whose rows are all 'current', however many rows there are
L = df.groupby('id') \
      .filter(lambda x: (x['type'] == 'current').all())['id'].unique().tolist()
print (L)
['one', 'two']
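To make the difference between the two filters concrete, here is a minimal self-contained sketch run on the data from the edit. The helper name `ids_only_current` and its `exactly_one` flag are hypothetical, introduced only for illustration; they are not part of pandas.

```python
import pandas as pd


def ids_only_current(df, exactly_one=False):
    """Return ids whose rows are all of type 'current'.

    With exactly_one=True, additionally require that the id has a single row.
    (Hypothetical helper, for illustration only.)
    """
    def keep(group):
        is_current = group['type'] == 'current'
        if exactly_one:
            return bool(is_current.all() and is_current.sum() == 1)
        return bool(is_current.all())

    # filter() keeps every row of the groups for which keep() returns True
    kept = df.groupby('id', sort=False).filter(keep)
    return kept['id'].unique().tolist()


df = pd.DataFrame({'id': ['one', 'one', 'two', 'three', 'three', 'three'],
                   'type': ['current', 'current', 'current',
                            'current', 'saving', 'credit']})

print(ids_only_current(df))                    # ['one', 'two']
print(ids_only_current(df, exactly_one=True))  # ['two']
```

Which variant is right depends on whether an id with several 'current' rows should count as "only current".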
Answer 1 (score: 1)
Use pd.crosstab:
# count rows per id and type; use a separate name so df itself stays intact
ct = pd.crosstab(df['id'], df['type'])
ct.loc[ct.sum(axis=1) == ct['current']].index.values[0]
Out[1065]: 'two'
Or you can use groupby and nunique:
df['unique'] = df.groupby('id')['type'].transform('nunique')
df.loc[(df['unique'] == 1) & (df['type'] == 'current'), 'id'].unique().tolist()
Out[1085]: ['two']
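For reference, here is a minimal sketch of both ideas on the question's data, written so that `df` is never overwritten or modified. The variable names `counts`, `n_types`, `by_crosstab` and `by_nunique` are my own choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': ['one', 'one', 'two', 'two', 'three', 'three', 'three'],
                   'type': ['current', 'saving', 'current', 'current',
                            'current', 'saving', 'credit']})

# crosstab: rows are ids, columns are types, cells are row counts
counts = pd.crosstab(df['id'], df['type'])

# An id has only 'current' rows when its row total equals its 'current' count
by_crosstab = counts.loc[counts.sum(axis=1) == counts['current']].index.tolist()
print(by_crosstab)  # ['two']

# nunique: number of distinct types per id, broadcast back to every row
n_types = df.groupby('id')['type'].transform('nunique')
by_nunique = df.loc[(n_types == 1) & (df['type'] == 'current'), 'id'].unique().tolist()
print(by_nunique)  # ['two']
```

Both return every matching id as a list, rather than only the first one.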
Answer 2 (score: 0)
This is not pure pandas, but you can take the set difference between all ids and the ids that have any row with type != 'current':
>>> set(df["id"]) - set(df["id"][df["type"] != "current"])
{'two'}
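The same set difference, spelled out step by step as a small runnable sketch on the question's data (the intermediate names `all_ids` and `ids_with_other_types` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': ['one', 'one', 'two', 'two', 'three', 'three', 'three'],
                   'type': ['current', 'saving', 'current', 'current',
                            'current', 'saving', 'credit']})

# Every id that appears in the frame
all_ids = set(df['id'])

# Ids that have at least one row whose type is not 'current'
ids_with_other_types = set(df.loc[df['type'] != 'current', 'id'])

# What remains are the ids whose rows are all 'current'
only_current = all_ids - ids_with_other_types
print(only_current)  # {'two'}
```

Sets are unordered, so wrap the result in `sorted(...)` if you need a list in a stable order.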