Find rows in a pandas DataFrame based on a complex condition

Time: 2018-07-04 07:07:03

Tags: pandas numpy python-3.6

My df looks like this:

                    code       date type  strike  settlement
id                                                          
1195001   CBT_21_G2012_S 2012-01-04    P  101.50    0.015625
1195093   CBT_21_G2012_S 2012-01-04    C  101.50   28.890625
1194926   CBT_21_G2012_S 2012-01-04    C  102.00   28.390625
1194944   CBT_21_G2012_S 2012-01-04    C  102.50   27.906250
1195109   CBT_21_G2012_S 2012-01-04    P  102.50    0.015625
1194905   CBT_21_G2012_S 2012-01-04    C  103.00   27.406250
1195008   CBT_21_G2012_S 2012-01-04    P  103.50    0.015625
1195123   CBT_21_G2012_S 2012-01-04    C  103.50   26.906250
1194908   CBT_21_G2012_S 2012-01-04    C  104.00   26.390625
1194980   CBT_21_G2012_S 2012-01-04    C  104.50   25.890625
1195025   CBT_21_G2012_S 2012-01-04    P  104.50    0.015625
1194981   CBT_21_G2012_S 2012-01-04    P  105.00    0.015625
1195063   CBT_21_G2012_S 2012-01-04    C  105.00   25.390625
1194960   CBT_21_G2012_S 2012-01-04    C  105.50   24.890625
1195102   CBT_21_G2012_S 2012-01-04    P  105.50    0.015625
1194989   CBT_21_G2012_S 2012-01-04    C  106.00   24.390625

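I need to find the rows where, for the same code, date and strike, only type == 'P' or only type == 'C' exists (that is, the strike has no matching row of the other type).

The desired output should be: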

                    code       date type  strike  settlement
id                                                          
1194926   CBT_21_G2012_S 2012-01-04    C  102.00   28.390625
1194905   CBT_21_G2012_S 2012-01-04    C  103.00   27.406250
1194908   CBT_21_G2012_S 2012-01-04    C  104.00   26.390625
1194989   CBT_21_G2012_S 2012-01-04    C  106.00   24.390625

[Edit] Also, how can I flip the 'type' values 'C' and 'P' in the resulting df (replace 'C' with 'P' and 'P' with 'C')?

Any help is appreciated.

Thanks.

1 answer:

Answer 0 (score: 1)

Use transform with nunique, compare the result with 1 using eq (==), and finally filter by boolean indexing:

# if types other than 'C' and 'P' exist, keep only those two first
#df = df[df['type'].isin(['C','P'])]

df = df[df.groupby(['code', 'date', 'strike'])['type'].transform('nunique').eq(1)]
print (df)
                   code        date type  strike  settlement
id                                                          
1194926  CBT_21_G2012_S  2012-01-04    C   102.0   28.390625
1194905  CBT_21_G2012_S  2012-01-04    C   103.0   27.406250
1194908  CBT_21_G2012_S  2012-01-04    C   104.0   26.390625
1194989  CBT_21_G2012_S  2012-01-04    C   106.0   24.390625

Details:

print (df.groupby(['code', 'date', 'strike'])['type'].transform('nunique'))
id
1195001    2
1195093    2
1194926    1
1194944    2
1195109    2
1194905    1
1195008    2
1195123    2
1194908    1
1194980    2
1195025    2
1194981    2
1195063    2
1194960    2
1195102    2
1194989    1
Name: type, dtype: int64
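
For anyone who wants to run this end to end, here is a minimal sketch on a hand-copied subset of five rows from the question. The variable names types_per_group and single_sided are my own and not part of the original answer.

```python
import pandas as pd

# A small subset of the question's rows, rebuilt by hand for illustration.
df = pd.DataFrame(
    {
        "code": ["CBT_21_G2012_S"] * 5,
        "date": pd.to_datetime(["2012-01-04"] * 5),
        "type": ["P", "C", "C", "C", "P"],
        "strike": [101.5, 101.5, 102.0, 102.5, 102.5],
        "settlement": [0.015625, 28.890625, 28.390625, 27.906250, 0.015625],
    },
    index=pd.Index([1195001, 1195093, 1194926, 1194944, 1195109], name="id"),
)

# Count the distinct types within each (code, date, strike) group and
# broadcast that count back onto every row of the original frame.
types_per_group = df.groupby(["code", "date", "strike"])["type"].transform("nunique")

# Keep only rows whose group contains a single type.
single_sided = df[types_per_group.eq(1)]
print(single_sided)
```

Because transform returns a Series aligned with the original index, the boolean mask can be used directly for indexing without any merge.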

Edit: To swap the values, use map with a dictionary:

df['type'] = df['type'].map({'C':'P', 'P':'C'})
print (df)
                   code        date type  strike  settlement
id                                                          
1194926  CBT_21_G2012_S  2012-01-04    P   102.0   28.390625
1194905  CBT_21_G2012_S  2012-01-04    P   103.0   27.406250
1194908  CBT_21_G2012_S  2012-01-04    P   104.0   26.390625
1194989  CBT_21_G2012_S  2012-01-04    P   106.0   24.390625
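
One thing to watch with map: any value that is not a key in the dictionary becomes NaN. The sketch below shows this on a toy Series and one possible way to keep unmapped values. The value "X" and the names swap, mapped and kept are hypothetical, added only for illustration.

```python
import pandas as pd

# "X" is a hypothetical extra type that does not appear in the question's data.
types = pd.Series(["C", "P", "X"], name="type")

swap = {"C": "P", "P": "C"}

# Values missing from the dictionary are turned into NaN by map.
mapped = types.map(swap)

# Falling back to the original value keeps anything that was not mapped.
kept = types.map(swap).fillna(types)

print(mapped.tolist())
print(kept.tolist())
```

If the column is known to contain only 'C' and 'P', as in the filtered df above, plain map is enough.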