I have a DataFrame similar to this:
GRP HOST1 HOST2 HOST3 FILESIZE
0 0 srv39 srv45 srv47 203498176
1 1 srv102 srv36 srv38 452763956
2 1 srv101 srv36 srv45 453277268
3 1 srv101 srv34 srv45 448174741
4 1 srv36 srv49 srv50 452728577
5 2 srv100 srv47 srv48 454617541
6 2 srv100 srv45 srv49 454617541
7 2 srv38 srv49 srv47 454617541
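To make the example reproducible, the table above can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the table, and the default integer index is assumed to match the unnamed first column.

```python
import pandas as pd

# Sample data copied from the table above; the default RangeIndex (0..7)
# stands in for the unnamed first column.
df = pd.DataFrame({
    "GRP": [0, 1, 1, 1, 1, 2, 2, 2],
    "HOST1": ["srv39", "srv102", "srv101", "srv101", "srv36", "srv100", "srv100", "srv38"],
    "HOST2": ["srv45", "srv36", "srv36", "srv34", "srv49", "srv47", "srv45", "srv49"],
    "HOST3": ["srv47", "srv38", "srv45", "srv45", "srv50", "srv48", "srv49", "srv47"],
    "FILESIZE": [203498176, 452763956, 453277268, 448174741,
                 452728577, 454617541, 454617541, 454617541],
})

print(df)
```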
What I want to do is count every occurrence of each value in the HOST1, HOST2 and HOST3 columns, grouped by the GRP column, like this:
--
GRP HOST count
1 srv101 2
srv36 3
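It would be perfect if I could also sum the values of the FILESIZE column.
I tried to work out a solution using the suggestions I found here, but I could not get the counts grouped by GRP.
Any advice on the best way to get what I need with pandas?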
Answer 0 (score: 3)
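# Reshape the three HOST columns into one long HOST column, then count per (GRP, HOST).
# The result gets its own name so the original df stays available for the next snippet.
df_count = (df.melt(id_vars='GRP', value_vars=['HOST1','HOST2','HOST3'], value_name='HOST')
              .groupby(['GRP', 'HOST'])
              .size()
              .reset_index(name='count'))
print (df_count)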
GRP HOST count
0 0 srv39 1
1 0 srv45 1
2 0 srv47 1
3 1 srv101 2
4 1 srv102 1
5 1 srv34 1
6 1 srv36 3
7 1 srv38 1
8 1 srv45 2
9 1 srv49 1
10 1 srv50 1
11 2 srv100 2
12 2 srv38 1
13 2 srv45 1
14 2 srv47 2
15 2 srv48 1
16 2 srv49 2
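To see why this works, it can help to look at the intermediate long-format frame that melt produces before the grouping step. The sketch below rebuilds the sample data and splits the chain in two; the names long_df and counts are only illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "GRP": [0, 1, 1, 1, 1, 2, 2, 2],
    "HOST1": ["srv39", "srv102", "srv101", "srv101", "srv36", "srv100", "srv100", "srv38"],
    "HOST2": ["srv45", "srv36", "srv36", "srv34", "srv49", "srv47", "srv45", "srv49"],
    "HOST3": ["srv47", "srv38", "srv45", "srv45", "srv50", "srv48", "srv49", "srv47"],
    "FILESIZE": [203498176, 452763956, 453277268, 448174741,
                 452728577, 454617541, 454617541, 454617541],
})

# Step 1: melt turns each (row, HOSTn) cell into its own row.
# The result has columns GRP, variable (the source column name) and HOST.
long_df = df.melt(id_vars="GRP",
                  value_vars=["HOST1", "HOST2", "HOST3"],
                  value_name="HOST")
print(long_df.head())

# Step 2: every row is now one host occurrence, so the group size is the count.
counts = long_df.groupby(["GRP", "HOST"]).size().reset_index(name="count")
print(counts)
```

Because each original row contributes three rows to long_df, the 8 sample rows become 24, and a host that appears in several HOST columns within one group is counted once per appearance.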
If you also want the sum of the FILESIZE column, use agg:
df1 = (df.melt(id_vars=['GRP', 'FILESIZE'], value_vars=['HOST1','HOST2','HOST3'], value_name='HOST')
.groupby(['GRP', 'HOST'])['FILESIZE']
.agg(['size','sum'])
.reset_index()
)
print (df1)
GRP HOST size sum
0 0 srv39 1 203498176
1 0 srv45 1 203498176
2 0 srv47 1 203498176
3 1 srv101 2 901452009
4 1 srv102 1 452763956
5 1 srv34 1 448174741
6 1 srv36 3 1358769801
7 1 srv38 1 452763956
8 1 srv45 2 901452009
9 1 srv49 1 452728577
10 1 srv50 1 452728577
11 2 srv100 2 909235082
12 2 srv38 1 454617541
13 2 srv45 1 454617541
14 2 srv47 2 909235082
15 2 srv48 1 454617541
16 2 srv49 2 909235082
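If more descriptive column names than size and sum are wanted, named aggregation is one option. This is a sketch that assumes pandas 0.25 or later; the output names count and total_size are illustrative choices, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "GRP": [0, 1, 1, 1, 1, 2, 2, 2],
    "HOST1": ["srv39", "srv102", "srv101", "srv101", "srv36", "srv100", "srv100", "srv38"],
    "HOST2": ["srv45", "srv36", "srv36", "srv34", "srv49", "srv47", "srv45", "srv49"],
    "HOST3": ["srv47", "srv38", "srv45", "srv45", "srv50", "srv48", "srv49", "srv47"],
    "FILESIZE": [203498176, 452763956, 453277268, 448174741,
                 452728577, 454617541, 454617541, 454617541],
})

# Keep FILESIZE as an id column so it is repeated for every host of a row.
long_df = df.melt(id_vars=["GRP", "FILESIZE"],
                  value_vars=["HOST1", "HOST2", "HOST3"],
                  value_name="HOST")

# Named aggregation: output_name=(input_column, function).
summary = (long_df.groupby(["GRP", "HOST"])
                  .agg(count=("FILESIZE", "size"),
                       total_size=("FILESIZE", "sum"))
                  .reset_index())
print(summary)
```

Note that a row's FILESIZE is added to the total of each of its three hosts, so the per-host totals in a group do not add up to the group's total file size.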
Answer 1 (score: 2)
You can use set_index, then stack, followed by groupby and size. This gives the count; if you need the sum, FILESIZE has to be carried through the reshape as well.
s=df.set_index('GRP')[['HOST1','HOST2','HOST3']].stack().to_frame('HOST')
s.groupby([s.index.get_level_values(level=0),s.HOST]).size()
Out[229]:
GRP HOST
0 srv39 1
srv45 1
srv47 1
1 srv101 2
srv102 1
srv34 1
srv36 3
srv38 1
srv45 2
srv49 1
srv50 1
2 srv100 2
srv38 1
srv45 1
srv47 2
srv48 1
srv49 2
dtype: int64
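The code above only returns the counts. One possible way to get the FILESIZE sum with the same stack approach, which is not shown in the answer itself, is to put FILESIZE into the index next to GRP before stacking. The names stacked, long_df and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "GRP": [0, 1, 1, 1, 1, 2, 2, 2],
    "HOST1": ["srv39", "srv102", "srv101", "srv101", "srv36", "srv100", "srv100", "srv38"],
    "HOST2": ["srv45", "srv36", "srv36", "srv34", "srv49", "srv47", "srv45", "srv49"],
    "HOST3": ["srv47", "srv38", "srv45", "srv45", "srv50", "srv48", "srv49", "srv47"],
    "FILESIZE": [203498176, 452763956, 453277268, 448174741,
                 452728577, 454617541, 454617541, 454617541],
})

hosts = ["HOST1", "HOST2", "HOST3"]

# Carry GRP and FILESIZE in the index so both survive the stack.
stacked = df.set_index(["GRP", "FILESIZE"])[hosts].stack()

# Back to a flat frame: GRP and FILESIZE become columns again,
# and the stacked values become the HOST column.
long_df = stacked.rename("HOST").reset_index()

result = long_df.groupby(["GRP", "HOST"])["FILESIZE"].agg(["size", "sum"])
print(result)
```

Depending on the pandas version, stack may emit a FutureWarning about its new implementation; the grouped result is the same either way for this data.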