I am trying to figure out how to count the number of rows for each unique pair of values in the columns (ip, useragent). For example:
import pandas as pd
d = pd.DataFrame({'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'], 'useragent': ['a', 'a', 'b', 'b']})
            ip useragent
0  192.168.0.1         a
1  192.168.0.1         a
2  192.168.0.1         b
3  192.168.0.2         b
should produce:
ip           useragent
192.168.0.1  a            2
192.168.0.1  b            1
192.168.0.2  b            1
Any ideas?
Answer 0 (score: 46)
Grouping by both columns with groupby and taking the size of each group gives you what you want.
d.groupby(['ip', 'useragent']).size()
which yields:
ip           useragent
192.168.0.1  a            2
             b            1
192.168.0.2  b            1
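The result of size() is a Series indexed by the (ip, useragent) pairs rather than a DataFrame, which is why the repeated ip is left blank in the printout. As a minimal sketch of how the counts might then be read back, assuming the same example data (the names counts, pair_count and counts_dict are illustrative only):

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# One count per unique (ip, useragent) pair, indexed by a MultiIndex.
counts = d.groupby(['ip', 'useragent']).size()

# Look up a single pair by passing a tuple to .loc.
pair_count = counts.loc[('192.168.0.1', 'a')]

# Or convert everything to a plain dict keyed by (ip, useragent) tuples.
counts_dict = counts.to_dict()
```

Pairs that never occur in the data simply do not appear in the index, so looking one up with .loc raises a KeyError.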
Answer 1 (score: 3)
print(d.groupby(['ip', 'useragent']).size().reset_index().rename(columns={0:''}))
gives:
            ip useragent   
0  192.168.0.1         a  2
1  192.168.0.1         b  1
2  192.168.0.2         b  1
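If a named count column is preferable to a blank one, a possible variant is to pass name= to Series.reset_index instead of renaming column 0 afterwards. This is only a sketch on the same example data; the column name count and the variable flat are my own choices, not part of the answer above:

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# reset_index(name=...) turns the group keys back into ordinary columns
# and labels the column holding the group sizes in one step.
flat = d.groupby(['ip', 'useragent']).size().reset_index(name='count')
print(flat)
```

The result is an ordinary DataFrame with columns ip, useragent and count, which is usually easier to merge or export than the MultiIndex Series.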
Another nice option might be pandas.crosstab:
print(pd.crosstab(d.ip, d.useragent) )
print('\nsome cosmetics:')
print(pd.crosstab(d.ip, d.useragent).reset_index().rename_axis('',axis='columns') )
gives:
useragent    a  b
ip               
192.168.0.1  2  1
192.168.0.2  0  1
some cosmetics:
            ip  a  b
0  192.168.0.1  2  1
1  192.168.0.2  0  1
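Note that crosstab reports every ip and useragent combination, including ones that never occur (the 0 for 192.168.0.2 and a), whereas the question asks for one row per observed pair. One way to get from the wide table back to that shape, sketched here with melt under the assumption that zero counts should be dropped (the names wide, long_form and observed are illustrative):

```python
import pandas as pd

d = pd.DataFrame({
    'ip': ['192.168.0.1', '192.168.0.1', '192.168.0.1', '192.168.0.2'],
    'useragent': ['a', 'a', 'b', 'b'],
})

# Wide table: one row per ip, one column per useragent.
wide = pd.crosstab(d['ip'], d['useragent'])

# Unpivot to one row per (ip, useragent) combination.
long_form = wide.reset_index().melt(
    id_vars='ip', var_name='useragent', value_name='count'
)

# Keep only the pairs that actually occur in the data.
observed = long_form[long_form['count'] > 0]
print(observed.sort_values(['ip', 'useragent']).reset_index(drop=True))
```

For this particular question the groupby approach is more direct; the crosstab route is mainly useful when the wide table is wanted anyway.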