Counting values in a pandas Series

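Time: 2020-02-25 09:16:20

Tags: pandas pandas-groupby series

My column (faq_helpful) contains the values 0, 1, or blank. Blank values should be ignored. I want to count how many 0s and how many 1s there are for each question, but both counts come back with the same value, which is clearly wrong.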

for question, question_df in df_raw.groupby(['faq_question']):
    count_0 = question_df['faq_helpful'].isin([0]).count()
    print(count_0)  # returns 25
    count_1 = question_df['faq_helpful'].isin([1]).count()
    print(count_1)  # also returns 25, which is wrong
    total = count_0 + count_1

1 answer:

Answer 0 (score: 0)

A solution based on your code:

count = []
for question, question_df in df_raw.groupby(['faq_question']):
    count.append(len(question_df['faq_question']))
print(count)
# [6, 9]  (numbers based on my example)

total = sum(count)
print(total)
# 15
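
As for why the loop in the question prints the same number twice: `isin()` returns a boolean Series as long as the group, and `count()` counts every non-null entry of that mask, `True` and `False` alike. Summing the mask counts only the matches. A minimal sketch of that fix, assuming a frame with the question's column names (`faq_question`, `faq_helpful`); the data and the `results` dict are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two questions, each answer is 0, 1, or NaN (blank).
df_raw = pd.DataFrame({
    "faq_question": ["q1", "q1", "q1", "q1", "q2", "q2", "q2"],
    "faq_helpful": [0, 1, 1, np.nan, 0, 0, 1],
})

results = {}
for question, question_df in df_raw.groupby("faq_question"):
    helpful = question_df["faq_helpful"]
    # isin() gives a boolean mask; sum() counts only the True entries.
    count_0 = int(helpful.isin([0]).sum())
    count_1 = int(helpful.isin([1]).sum())
    # count() on the same mask would return len(helpful) in both cases.
    results[question] = (count_0, count_1, count_0 + count_1)

print(results)
```

Blank (NaN) rows never match `isin([0])` or `isin([1])`, so they drop out of both counts without extra filtering.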

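pandas has a built-in method for counting the values in a Series: pandas.Series.value_counts(). It counts the occurrences of each distinct value and sorts the result by count in descending order. It returns a Series whose index holds the counted values (0 and 1 in your case) and whose values are the number of times each one occurs. NaN entries are excluded by default.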

k = df_raw['faq_question'].value_counts()

print(k)
# 1.0    9
# 0.0    6
# Name: faq_question, dtype: int64

total = sum(k)
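
If the counts are needed per question, as in the original loop, `value_counts()` can also be applied after a `groupby`. This is a sketch rather than a definitive recipe, assuming the frame has the question's columns `faq_question` and `faq_helpful`; the data and the names `df_demo`, `per_question`, and `totals` are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two questions, each answer is 0, 1, or NaN (blank).
df_demo = pd.DataFrame({
    "faq_question": ["q1", "q1", "q1", "q1", "q2", "q2", "q2"],
    "faq_helpful": [0, 1, 1, np.nan, 0, 0, 1],
})

per_question = (
    df_demo.groupby("faq_question")["faq_helpful"]
    .value_counts()          # NaN is dropped by default
    .unstack(fill_value=0)   # one row per question, one column per value
)

# Row-wise sum gives the number of non-blank answers per question.
totals = per_question.sum(axis=1)

print(per_question)
print(totals)
```

The result has one row per question and one column for each counted value (0.0 and 1.0 here), so no explicit loop is needed.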

Sample code used to generate the example above, with 0/1/NaN in the relevant column:

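import numpy as np
import pandas as pd

# Two rows of raw values, transposed into two columns of 17 entries each.
df_raw = pd.DataFrame(np.array([
    [0,0,1,1,0,1,float("NaN"),0,1,1,1,1,0,0,float("NaN"),1,1],
    [9,4,float("NaN"),4,9,4,0,5,2,2,5,5,2,5,8,float("NaN"),1]]).T, 
    columns=["faq_question","other"])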