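I want to count the number of rows of a pandas DataFrame that fall into each bin and list the counts.
I suspect there is a faster way than mine. Can you give me some suggestions?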
script.py
import pandas
binwidth = 10
data = pandas.read_csv('sample.csv', sep=' ', names=['time', 'value'], header=None, comment='#')
mylist = []
for item in data.iterrows():
    # item is an (index, row) pair; floor-divide the time to get the bin number
    index = int(item[1]['time'] // binwidth)
    if len(mylist) <= index:
        mylist.append(1)
    else:
        mylist[index] += 1
print(mylist)  # which outputs [8, 4, 4]
sample.csv
# time value
1 a
2 b
3 c
4 d
6 e
7 f
8 g
9 h
10 i
12 j
15 k
17 l
21 m
22 n
26 o
29 p
Answer 0 (score: 2)
You can use pandas.cut:
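import pandas as pd
binwidth = 10
data = pd.read_csv('sample.csv', sep=' ', names=['time', 'value'], header=None, comment='#')
# Upper edge of the bin that holds the largest time; the +1 makes range() include that edge.
# Floor division keeps a maximum that is an exact multiple of binwidth inside the last bin.
max_bin_edge = (int(data['time'].max()) // binwidth + 1) * binwidth + 1
bin_edges = list(range(0, max_bin_edge, binwidth))
# right=False produces left-closed intervals such as [0, 10)
bins = pd.cut(data['time'], bins=bin_edges, right=False)
bin_counts = bins.groupby(bins).count()
print(bin_counts)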
This also gives you the bin edges:
time
[0, 10) 8
[10, 20) 4
[20, 30) 4
Name: time, dtype: int64
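If you want a plain list like the one in the question rather than a labelled Series, the binned column can be counted directly. This is only a sketch: the time values are inlined from sample.csv so it runs without the file, and the names `times` and `count_list` are made up for the illustration.

```python
import pandas as pd

# 'time' values copied from sample.csv so the sketch is self-contained
times = pd.Series([1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 15, 17, 21, 22, 26, 29],
                  name='time')
binwidth = 10

# Left-closed bins [0, 10), [10, 20), [20, 30)
bin_edges = list(range(0, 31, binwidth))
bins = pd.cut(times, bins=bin_edges, right=False)

# sort=False keeps the bins in interval order instead of ordering by count
count_list = bins.value_counts(sort=False).tolist()
print(count_list)  # [8, 4, 4]
```

Because the result of pandas.cut is categorical, bins with no rows still show up with a count of 0.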
Answer 1 (score: 0)
I think this does the job:
import pandas
# set the time column as index for the groupby function
df = pandas.read_csv('sample.csv', sep=' ', names=['time', 'value'],
                     header=None, comment='#', index_col=['time'])
binwidth = 10
# the lambda is applied to each index value (the time) and returns its bin number
grouped_df = df.groupby(lambda x: int(x/binwidth)).count()
mylist = grouped_df['value'].tolist()
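One thing to watch with a groupby on the bin number: a bin that contains no rows does not appear in the result at all, so the list positions stop matching the bin numbers. A possible workaround, sketched here on made-up data with an empty [10, 20) bin, is to reindex over every bin number. The variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data: nothing falls into the [10, 20) bin
df = pd.DataFrame({'time': [1, 2, 25], 'value': ['a', 'b', 'c']})
binwidth = 10

bin_ids = df['time'] // binwidth
counts = bin_ids.groupby(bin_ids).size()

# Reindex over every bin number so that empty bins get a count of 0
all_bins = range(int(bin_ids.max()) + 1)
filled = counts.reindex(all_bins, fill_value=0)

print(counts.tolist())  # [2, 1]
print(filled.tolist())  # [2, 0, 1]
```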
Answer 2 (score: 0)
Use
In [1086]: df.groupby(df.time//10).time.count().values.tolist()
Out[1086]: [8L, 4L, 4L]
Or,
In [1092]: df.groupby(df.time//10).size().tolist()
Out[1092]: [8L, 4L, 4L]
Or, a NumPy version:
In [1096]: np.bincount(df.time//10).tolist()
Out[1096]: [8L, 4L, 4L]
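A small standalone sketch of the np.bincount variant, in case it helps: bincount counts occurrences of each non-negative integer, so the floor-divided times act as bin numbers, and empty bins in between come out as 0. The `minlength` argument pads the result to a fixed number of bins. The array below is copied from sample.csv; the choice of 5 bins is arbitrary.

```python
import numpy as np

# 'time' values copied from sample.csv
times = np.array([1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 15, 17, 21, 22, 26, 29])
binwidth = 10

# bincount needs non-negative integers; a float column would have to be cast first
counts = np.bincount(times // binwidth)

# minlength pads with zeros when a fixed number of bins is wanted (5 here, arbitrarily)
padded = np.bincount(times // binwidth, minlength=5)

print(counts.tolist())  # [8, 4, 4]
print(padded.tolist())  # [8, 4, 4, 0, 0]
```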
Details:
In [1087]: df
Out[1087]:
time value
0 1 a
1 2 b
2 3 c
3 4 d
4 6 e
5 7 f
6 8 g
7 9 h
8 10 i
9 12 j
10 15 k
11 17 l
12 21 m
13 22 n
14 26 o
15 29 p
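To reproduce this `df` without creating a file, the CSV text can be fed to read_csv through io.StringIO. This is just a convenience sketch; `csv_text` and `result` are names invented for it, and the read_csv arguments are the ones from the question.

```python
import io
import pandas as pd

# Contents of sample.csv from the question
csv_text = """# time value
1 a
2 b
3 c
4 d
6 e
7 f
8 g
9 h
10 i
12 j
15 k
17 l
21 m
22 n
26 o
29 p
"""

# comment='#' makes pandas skip the header line that starts with '#'
df = pd.read_csv(io.StringIO(csv_text), sep=' ', names=['time', 'value'],
                 header=None, comment='#')

result = df.groupby(df.time // 10).size().tolist()
print(result)  # [8, 4, 4]
```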