How do I count occurrences of a specific value in a DataFrame column using pandas?

Time: 2017-03-20 06:54:11

Tags: python pandas dataframe

I have two columns.

Sales   Close_Date
0       04/01/12
0   
33496   12/01/12
588     05/01/12
9240    10/01/12

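How do I find the number of occurrences of "0", "9296", or any other value in the "Sales" column?

2 Answers:

Answer 0: (score: 1)

If you need to count a single value, the simplest way is to sum the True values of a boolean mask: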

print (df.Sales == 0)
0     True
1     True
2    False
3    False
4    False
Name: Sales, dtype: bool


a = (df.Sales == 0).sum()
print (a)
2
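As a self-contained sketch of the mask-and-sum approach above, the snippet below rebuilds the sample data from the question and counts two values. The helper name `count_value` is hypothetical (not part of pandas), and the missing date is assumed to be `None`.

```python
import pandas as pd

# Sample data mirroring the question; the empty Close_Date is assumed to be None.
df = pd.DataFrame({
    "Sales": [0, 0, 33496, 588, 9240],
    "Close_Date": ["04/01/12", None, "12/01/12", "05/01/12", "10/01/12"],
})

def count_value(frame, column, value):
    """Hypothetical helper: count rows where `column` equals `value`."""
    # The comparison yields a boolean Series; True counts as 1 when summed.
    return int((frame[column] == value).sum())

zeros = count_value(df, "Sales", 0)
absent = count_value(df, "Sales", 9296)
print(zeros)   # 2
print(absent)  # 0
```

A value that does not occur in the column, such as 9296 in this sample, simply gives a count of 0.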

If you need the counts of all values, use groupby and aggregate with size, or use value_counts:

counts = df.groupby('Sales').size()
print (counts)
Sales
0        2
588      1
9240     1
33496    1
dtype: int64

Or:

counts = df['Sales'].value_counts()
print (counts)
0        2
9240     1
588      1
33496    1
Name: Sales, dtype: int64
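Once all counts are computed, a single value can be looked up from the result. The sketch below assumes the same sample `Sales` values and uses `Series.get` with a default of 0, so that a value absent from the column does not raise a `KeyError`.

```python
import pandas as pd

# Sales values taken from the question's sample data.
sales = pd.Series([0, 0, 33496, 588, 9240], name="Sales")

# value_counts returns a Series indexed by the distinct values.
counts = sales.value_counts()

# Label-based lookup with a fallback of 0 for values that never occur.
n_zero = int(counts.get(0, 0))
n_9296 = int(counts.get(9296, 0))

print(n_zero)  # 2
print(n_9296)  # 0
```

This is convenient when several different values need to be looked up, because the column is only scanned once.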

If you need to filter rows, use query or boolean indexing:

filtered = df.query('Sales == 0')
print (filtered)
   Sales Close_Date
0      0   04/01/12
1      0        NaN

Or:

filtered = df[df.Sales == 0]
print (filtered)
   Sales Close_Date
0      0   04/01/12
1      0        NaN
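To illustrate that the two filtering styles select the same rows, here is a small sketch on the sample data (the missing date is assumed to be `None`). The length of the filtered frame is another way to obtain the count.

```python
import pandas as pd

# Sample data mirroring the question; the empty Close_Date is assumed to be None.
df = pd.DataFrame({
    "Sales": [0, 0, 33496, 588, 9240],
    "Close_Date": ["04/01/12", None, "12/01/12", "05/01/12", "10/01/12"],
})

# Two ways to keep only the rows where Sales is 0.
by_query = df.query("Sales == 0")
by_mask = df[df["Sales"] == 0]

# Both selections keep the same row labels.
same_rows = list(by_query.index) == list(by_mask.index)
n_rows = len(by_mask)

print(same_rows)  # True
print(n_rows)     # 2
```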

Timings:

#[500000 rows x 2 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
print (df)

In [37]: %timeit ((df.Sales == 0).sum())
The slowest run took 4.18 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 4.62 ms per loop

In [38]: %timeit (Counter(df.Sales)[0])
10 loops, best of 3: 82.4 ms per loop

But this can be faster, by comparing against the underlying NumPy array:

a = (df.Sales.values == 0).sum()
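The idea of working on the underlying NumPy array can be sketched as follows. This assumes a pandas version that provides `Series.to_numpy()`; `np.count_nonzero` is shown as one alternative way to count the True entries. No timing claims are made here, since results depend on data size and library versions.

```python
import numpy as np
import pandas as pd

# Repeat the sample Sales values to get a larger column (5000 rows).
sales = pd.Series([0, 0, 33496, 588, 9240] * 1000, name="Sales")

# Count via the pandas Series comparison.
via_series = int((sales == 0).sum())

# Count via the raw NumPy array, skipping pandas index handling.
arr = sales.to_numpy()
via_numpy = int((arr == 0).sum())

# Alternative: count the True entries of the mask directly.
via_count_nonzero = int(np.count_nonzero(arr == 0))

print(via_series, via_numpy, via_count_nonzero)  # 2000 2000 2000
```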

Answer 1: (score: 1)

from collections import Counter

c = Counter(df.Sales)
c[0]

2