Count the number of rows where two columns match

Time: 2017-12-11 18:02:06

Tags: python pandas

I have a table like this:

             check_churn  is_churn
0               True         1
1               True         1
2              False         1
3              False         1
4               True         1
5               True         1
6               True         1
7               True         1
8               True         1
9               True         1
10              True         1

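I want to determine how many rows match. For example, row 0 counts as a match because True = 1, and the same goes for row 1. So the answer would be 9, since only 2 rows do not match.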

2 answers:

Answer 0: (score: 1)

I think you need to compare the values of the two columns and count the Trues with sum, since each True is treated as 1:

print ((df['check_churn'] == df['is_churn']))
0      True
1      True
2     False
3     False
4      True
5      True
6      True
7      True
8      True
9      True
10     True
dtype: bool


print ((df['check_churn'] == df['is_churn']).sum())
9

Another solution is to filter the rows and read the row count from DataFrame.shape:

print (df.loc[df.check_churn == df.is_churn].shape[0])
9
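
As a self-contained check of the two approaches above, here is a minimal sketch that rebuilds the table from the question and runs both. The variable names `matches`, `count_by_sum` and `count_by_shape` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Rebuild the table from the question (values copied from the post).
df = pd.DataFrame({
    "check_churn": [True, True, False, False, True, True,
                    True, True, True, True, True],
    "is_churn": [1] * 11,
})

# Element-wise comparison: True == 1 and False == 0 in Python,
# so a bool column can be compared directly with a 0/1 int column.
matches = df["check_churn"] == df["is_churn"]

count_by_sum = int(matches.sum())           # each True counts as 1
count_by_shape = df.loc[matches].shape[0]   # rows kept by the filter

print(count_by_sum, count_by_shape)
```

Both counts should agree, because the filter keeps exactly the rows where the comparison is True.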

Timings:

import numpy as np
import pandas as pd

np.random.seed(2017)
N = 10000
df = pd.DataFrame({'check_churn':np.random.choice([True, False], size=N),
                   'is_churn':np.random.choice([0, 1], size=N)})
print (df)

In [35]: %timeit (df['check_churn'] == df['is_churn']).sum()
1000 loops, best of 3: 414 µs per loop

In [36]: %timeit sum(df['check_churn'] & df['is_churn'])
1000 loops, best of 3: 793 µs per loop

In [37]: %timeit (df.loc[df.check_churn == df.is_churn].shape[0])
1000 loops, best of 3: 708 µs per loop

N = 1000000

In [39]: %timeit (df['check_churn'] == df['is_churn']).sum()
100 loops, best of 3: 18.2 ms per loop

In [40]: %timeit sum(df['check_churn'] & df['is_churn'])
10 loops, best of 3: 54.7 ms per loop

In [41]: %timeit (df.loc[df.check_churn == df.is_churn].shape[0])
10 loops, best of 3: 23.4 ms per loop

In [42]: %timeit (df['check_churn'] & df['is_churn']).sum()
10 loops, best of 3: 21.2 ms per loop
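
The timings above use IPython's %timeit magic. To get comparable numbers from a plain Python script, a rough sketch with the standard-library timeit module might look like the following. It covers only the two equality-based variants, the `candidates` dictionary is a hypothetical helper rather than part of the original answer, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(2017)
N = 10000
df = pd.DataFrame({
    "check_churn": np.random.choice([True, False], size=N),
    "is_churn": np.random.choice([0, 1], size=N),
})

# Hypothetical helper: map a label to a zero-argument callable to time.
candidates = {
    "eq_sum": lambda: (df["check_churn"] == df["is_churn"]).sum(),
    "eq_shape": lambda: df.loc[df.check_churn == df.is_churn].shape[0],
}

repeats = 100
for name, func in candidates.items():
    total_seconds = timeit.timeit(func, number=repeats)
    print(f"{name}: {total_seconds / repeats * 1e6:.1f} µs per call")
```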

Answer 1: (score: 1)

I would probably write it like this:

sum(df['check_churn'] & df['is_churn'])

Full example:

import pandas as pd

data = {'check_churn': {0: True,
  1: True,
  2: False,
  3: False,
  4: True,
  5: True,
  6: True,
  7: True,
  8: True,
  9: True,
  10: True},
 'is_churn': {0: 1,
  1: 1,
  2: 1,
  3: 1,
  4: 1,
  5: 1,
  6: 1,
  7: 1,
  8: 1,
  9: 1,
  10: 1}}

df = pd.DataFrame(data)

sum(df['check_churn'] & df['is_churn'])
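
One caveat worth noting: `&` counts rows where both columns are truthy, which is not the same as counting rows where the two columns are equal. The two agree on the question's data only because is_churn is 1 everywhere. The sketch below uses a small made-up table (not from the original post) that also contains 0s, and casts is_churn to bool explicitly so the `&` is a plain boolean AND.

```python
import pandas as pd

# Made-up data for illustration: is_churn contains both 0 and 1.
df = pd.DataFrame({
    "check_churn": [True, False, True, False],
    "is_churn": [1, 0, 0, 1],
})

# Rows where both columns are truthy (only row 0).
both_true = int((df["check_churn"] & df["is_churn"].astype(bool)).sum())

# Rows where the columns are equal (rows 0 and 1: True == 1, False == 0).
equal = int((df["check_churn"] == df["is_churn"]).sum())

print(both_true, equal)
```

If is_churn can contain 0, the `==` comparison is the one that matches what the question asks for.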