I am working with this CSV in pandas. Here are the first ten rows:
print frame1.head(10)
alert Subject filetype type country status
0 33965790 44676 aba Attachment doc RU,RU,RU,RU deleted
1 33965786 44676 rcrump Attachment zip NaN deleted
2 33965771 3aba Attachment zip NaN deleted
3 33965770 NaN Attachment js ,, deleted
4 33965766 NaN Attachment js ,, deleted
5 33965761 NaN Attachment zip NaN deleted
6 33965760 NaN Attachment zip NaN deleted
7 33965757 NaN Attachment zip NaN deleted
8 33965751 35200 3aba Attachment doc RU,RU,RU deleted
9 33965747 35200 INVaba Attachment zip NaN deleted
I need to take the Subject column and count all rows that contain 'aba' as a substring, like this:
Occurrences of aba- 512
Or even a result like this:
aba 12
3aba 5
INVaba 2
Here is my code:
targeted = frame1[frame1['Subject'].str.contains('aba', case=False , na=False)].groupby('Subject')
print (targeted.to_string(header=False))
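I get this error: AttributeError: Cannot access callable attribute 'to_string' of 'DataFrameGroupBy' objects, try using the 'apply' method
Note: I previously used the following to count the different file types, and it worked: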
filetype = frame1.groupby('filetype').size()
###clean up the printing
print "Delivered in Email"
print (filetype.to_string(header=False))
which gives me:
Delivered in Email
Attachment 32647
Header 131
URL 9236
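Answer 0 (score: 2)
To get the total count, use str.contains and then sum the resulting boolean mask. Note that count would return the number of non-null entries in the mask, which is every row, not the number of matches.
>>> df.Subject.str.contains('aba', case=False, na=False).sum()
3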
Then, to get the counts of the unique strings that contain 'aba', select the values matched by contains and call value_counts on them.
>>> df.loc[df.Subject.str.contains('aba', case=False, na=False), 'Subject'].value_counts()
3aba 1
INVaba 1
aba 1
Name: Subject, dtype: int64
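As a minimal, self-contained sketch of the two steps above, the snippet below builds a small made-up frame (the data is hypothetical, not the asker's CSV) and shows how count and sum differ on the boolean mask, followed by the per-value counts:

```python
import pandas as pd

# Hypothetical sample data, not the original CSV.
df = pd.DataFrame({"Subject": ["aba", "3aba", None, "INVaba", "rcrump", "aba"]})

# Boolean mask: True where Subject contains 'aba' (case-insensitive).
# na=False makes missing subjects count as non-matches.
mask = df["Subject"].str.contains("aba", case=False, na=False)

total_rows = int(mask.count())   # number of non-null mask entries, i.e. every row
total_matches = int(mask.sum())  # number of True entries, i.e. rows containing 'aba'

# Counts per distinct matching Subject value.
per_value = df.loc[mask, "Subject"].value_counts()

print("rows:", total_rows, "matches:", total_matches)
print(per_value.to_string())
```

Because True is treated as 1 when summing, sum on the mask is the number of matching rows, while value_counts breaks that total down by distinct string.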
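Answer 1 (score: 0)
For the first output you suggested, you can do the following:
contains_aba = frame1[frame1['Subject'].str.contains('aba', case=False, na=False)]
print("Occurrences of aba-", len(contains_aba))
This creates another DataFrame filtered by your condition. The length of that DataFrame is the number of occurrences, so you can print it. Passing na=False treats the NaN subjects as non-matches, so they do not end up as missing values in the mask.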
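The same idea can be wrapped in a small helper. This is only a sketch: count_occurrences is a hypothetical name, the sample frame is made up, and regex=False is an added assumption that the search term should be matched literally rather than as a regular expression.

```python
import pandas as pd

def count_occurrences(frame, column, needle):
    """Return how many rows of frame[column] contain needle, ignoring case."""
    # na=False treats missing values as non-matches, so the mask is purely boolean.
    # regex=False matches needle as a literal substring.
    mask = frame[column].str.contains(needle, case=False, na=False, regex=False)
    return len(frame[mask])

# Hypothetical sample data, not the original CSV.
frame1 = pd.DataFrame({
    "Subject": ["44676 aba", None, "3ABA", "rcrump"],
    "filetype": ["Attachment"] * 4,
})

n = count_occurrences(frame1, "Subject", "aba")
print("Occurrences of aba- {}".format(n))
```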
Answer 2 (score: 0)
targeted = frame1[frame1['Subject'].str.contains('aba', case=False , na=False)].groupby('Subject').size()
print (targeted.to_string(header=False))
This gives:
3aba 1
INVaba 1
aba 1
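The reason this works where the original attempt failed is that groupby on its own returns a DataFrameGroupBy object, which has to be aggregated (here with size) before it becomes a Series that can be printed with to_string. A minimal sketch on made-up data (hypothetical, not the asker's CSV), with an optional sort by count added for readability:

```python
import pandas as pd

# Hypothetical sample data, not the original CSV.
frame1 = pd.DataFrame({"Subject": ["aba", "3aba", "aba", None, "INVaba", "other"]})

# Keep only rows whose Subject contains 'aba'; NaN subjects are non-matches.
matches = frame1[frame1["Subject"].str.contains("aba", case=False, na=False)]

grouped = matches.groupby("Subject")  # DataFrameGroupBy: not yet aggregated
targeted = grouped.size()             # Series: rows per distinct Subject

# Optional: most frequent subjects first.
targeted = targeted.sort_values(ascending=False)
print(targeted.to_string(header=False))
```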