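My column (faq_helpful) contains 0, 1, or blank values. Blank values should be ignored. I want to count how many 0s and how many 1s there are, but both counts come back as the same number, which is obviously wrong.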
for question, question_df in df_raw.groupby(['faq_question']):
    count_0 = question_df['faq_helpful'].isin([0]).count()
    print(count_0)  # returns 25
    count_1 = question_df['faq_helpful'].isin([1]).count()
    print(count_1)  # also returns 25, which is wrong
    total = count_0 + count_1
Answer 0 (score: 0)
A solution based on your code:
count = []
for question, question_df in df_raw.groupby(['faq_question']):
    count.append(len(question_df['faq_question']))
print(count)
# [6, 9]  (numbers based on my example)
total = sum(count)
print(total)
# 15
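As for why both counts in the question come out as 25: isin() returns a boolean Series, and count() counts its non-null entries, which is every row whether it is True or False. Summing the mask counts only the True entries. A minimal sketch of the difference, using a small made-up column (the data below is hypothetical, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the faq_helpful column: 0, 1, or blank (NaN).
helpful = pd.Series([0, 1, 1, np.nan, 0, 1])

mask_0 = helpful.isin([0])         # boolean Series, one entry per row, never NaN
n_rows = mask_0.count()            # counts non-null entries, i.e. every row
count_0 = mask_0.sum()             # True counts as 1, so this counts the zeros
count_1 = helpful.isin([1]).sum()  # same idea for the ones

print(n_rows, count_0, count_1)
```

The NaN row is False in both masks, so blank values drop out of count_0 and count_1 without any extra filtering.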
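pandas has a built-in method for counting the values in a Series: pandas.Series.value_counts(). It counts every distinct value in the Series and sorts the result by count in descending order; NaN values are excluded by default. It returns a Series whose index holds the counted values (0 and 1 in your case) and whose values hold the number of times each one occurs.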
k = df_raw['faq_question'].value_counts()
print(k)
# 1.0    9
# 0.0    6
# Name: faq_question, dtype: int64
total = sum(k)
Sample code used to generate the example above, with 0/1/NaN in the relevant column:
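import numpy as np
import pandas as pd

# First row of the array becomes "faq_question" (0/1/NaN), second becomes "other".
df_raw = pd.DataFrame(np.array([
    [0,0,1,1,0,1,float("NaN"),0,1,1,1,1,0,0,float("NaN"),1,1],
    [9,4,float("NaN"),4,9,4,0,5,2,2,5,5,2,5,8,float("NaN"),1]]).T,
    columns=["faq_question","other"])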
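If you need the 0/1 counts separately for each question, as in the original loop, value_counts() can also be applied per group. The following is only a sketch under assumed data: the question labels "q1" and "q2" and the values in faq_helpful are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two questions, with 0, 1, or NaN in faq_helpful.
df = pd.DataFrame({
    "faq_question": ["q1", "q1", "q1", "q2", "q2", "q2", "q2"],
    "faq_helpful": [0, 1, 1, np.nan, 0, 0, 1],
})

# Count each value within each question; NaN rows are dropped by default.
# unstack() turns the counted values (0.0 and 1.0) into columns.
counts = (
    df.groupby("faq_question")["faq_helpful"]
      .value_counts()
      .unstack(fill_value=0)
)

# Total number of non-blank answers per question.
totals = counts.sum(axis=1)

print(counts)
print(totals)
```

With this data, q1 has one 0 and two 1s, and q2 has two 0s and one 1; the blank row in q2 is not counted.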