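I'm trying to relearn Python and use it to help filter and organize data. I'm new to pandas and have run into the following problem. I have a sensor that measures the diameter and velocity of falling objects. The data is saved to a CSV file in the following format: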
Date,Time,Diameter,Velocity,BaseV
21-Sep-2013,13:51:04,0.422705,0.850142,4.880371
21-Sep-2013,14:01:37,0.505481,1.499196,4.877930
21-Sep-2013,14:18:50,0.391306,1.795166,4.880371
21-Sep-2013,14:18:50,0.407307,1.149977,4.880371
21-Sep-2013,14:18:50,0.399387,2.098552,4.880371
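Date and Time record when the object fell, Diameter and Velocity are the measurements for that object, and BaseV is the base voltage we use for calibration. The instrument measures at sub-second resolution, and I'm using pandas to resample the data to 5-minute intervals rather than using modulo division on the time values. After going through the pandas cookbook, I came up with the following code: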
# Python script to open eachDrop.dat and read values into pandas.dataframe
import math as m
import numpy as np
import pandas as pd
#---------------------------------------------------------------------------
#read csv values into panda data frame
dropData=pd.read_csv('resEachDrop[RD130921.dat].txt',sep=',',header=0,index_col=0,parse_dates=[[0,1]],encoding=None,tupleize_cols=False, infer_datetime_format=True)
#---------------------------------------------------------------------------
#resample time series to 5min intervals for Count, Mean, Min and Max
#mmmsc/s is group of np functions to apply to dropData diameter column to return aggregated columns
mmmsc={'Mean':np.mean, 'Max':np.max, 'Min':np.min, 'Sum':np.sum,'Count':'count'}
mmms={'Mean':np.mean, 'Max':np.max, 'Min':np.min, 'Sum':np.sum}
#resample dropData at 5min increments: Diameter uses mmmsc, Velocity uses mmms
newData=dropData.resample('5Min', how={'Diameter':mmmsc,'Velocity':mmms})
print newData
#--------------------------------------------------------------------------
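For readers on Python 3 and a recent pandas release: as far as I know, several of the arguments used above (`how=` on `resample`, `tupleize_cols`, and the nested-list form of `parse_dates`) have since been deprecated or removed. The following is only a sketch of an equivalent load-and-resample step, not the original script. The helper names `load_drops` and `summarize` are hypothetical, and the inline `SAMPLE` string stands in for the real file so the block runs on its own.

```python
import io

import pandas as pd

# The five sample rows from the question; a real run would pass the file path instead.
SAMPLE = """Date,Time,Diameter,Velocity,BaseV
21-Sep-2013,13:51:04,0.422705,0.850142,4.880371
21-Sep-2013,14:01:37,0.505481,1.499196,4.877930
21-Sep-2013,14:18:50,0.391306,1.795166,4.880371
21-Sep-2013,14:18:50,0.407307,1.149977,4.880371
21-Sep-2013,14:18:50,0.399387,2.098552,4.880371
"""


def load_drops(source):
    """Read the CSV and build a DatetimeIndex from the Date and Time columns."""
    df = pd.read_csv(source)
    df["Date_Time"] = pd.to_datetime(
        df["Date"] + " " + df["Time"], format="%d-%b-%Y %H:%M:%S"
    )
    return df.drop(columns=["Date", "Time"]).set_index("Date_Time")


def summarize(df, rule="5min"):
    """Aggregate Diameter and Velocity per time bin, using string aggregator names."""
    return df.resample(rule).agg(
        {
            "Diameter": ["count", "mean", "min", "max", "sum"],
            "Velocity": ["mean", "min", "max", "sum"],
        }
    )


drops = load_drops(io.StringIO(SAMPLE))
summary = summarize(drops)
print(summary)
```

Passing aggregator names as strings lets pandas pick its own column-wise implementations, which sidesteps any question of how a NumPy function treats the object it is handed.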
The output in the terminal window looks like this (I removed some rows to save space):
Date_Time Diameter Velocity BaseV
2013-09-21 13:51:04 0.422705 0.850142 4.880371
2013-09-21 14:01:37 0.505481 1.499196 4.877930
2013-09-21 14:18:50 0.391306 1.795166 4.880371
2013-09-21 14:18:50 0.407307 1.149977 4.880371
... ... ... ...
2013-09-21 23:59:54 0.470808 0.719216 4.216309
2013-09-21 23:59:54 0.529965 1.748123 4.216309
2013-09-21 23:59:55 0.563966 1.466564 4.213867
2013-09-21 23:59:55 0.563966 1.515517 4.213867
[53740 rows x 3 columns]
Diameter
Date_Time Count Max Sum Min Mean
2013-09-21 13:50:00 1 0.422705 0.422705 0.422705 0.422705
2013-09-21 13:55:00 0 NaN NaN NaN NaN
2013-09-21 14:00:00 1 0.505481 0.505481 0.505481 0.505481
2013-09-21 14:05:00 0 NaN NaN NaN NaN
2013-09-21 14:10:00 0 NaN NaN NaN NaN
2013-09-21 14:15:00 3 0.407307 1.198000 0.391306 0.399333
... ... ... ... ... ...
2013-09-21 21:30:00 1068 3.614623 594.918064 0.385087 0.557039
2013-09-21 21:35:00 247 4.363684 136.175383 0.384975 0.551317
2013-09-21 21:40:00 176 1.284766 92.519502 0.393808 0.525679
2013-09-21 21:45:00 147 1.642836 79.037770 0.385874 0.537672
Velocity
Max Sum Min Mean
Date_Time
2013-09-21 13:50:00 0.850142 0.850142 0.850142 0.850142
2013-09-21 13:55:00 NaN NaN NaN NaN
2013-09-21 14:00:00 1.499196 1.499196 1.499196 1.499196
2013-09-21 14:05:00 NaN NaN NaN NaN
2013-09-21 14:10:00 NaN NaN NaN NaN
2013-09-21 14:15:00 2.098552 5.043695 1.149977 1.681232
... ... ... ... ...
2013-09-21 21:30:00 3.040620 1589.967392 0.433960 1.488734
2013-09-21 21:35:00 3.215267 376.540780 0.425394 1.524457
2013-09-21 21:40:00 2.362207 272.548852 0.529707 1.548573
2013-09-21 21:45:00 2.285334 228.478854 0.503430 1.554278
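When I compare the Sum values for Diameter against the sums calculated by the program that processes the data, the discrepancy is huge. After searching the forums, I think this happens because numpy.sum takes the sum of the rows rather than the columns, similar to this question: numpy.sum behaves differently on numpy.array vs pandas.DataFrame. I tried changing 'Sum':np.sum to use axis=0, similar to the solution in that thread, but I get the following error:
Traceback (most recent call last):
File "dropRead.py", line 12, in <module>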
mmmsc={'Mean':np.mean, 'Max':np.max, 'Min':np.min, 'Sum':np.sum(axis=0),
'Count':'count'}
TypeError: sum() takes at least 1 argument (1 given)
Can anyone shed some light on what I can do to make the columns sum correctly?
Thanks,
Sean
Answer 0 (score: 0)
Following EdChum's suggestion solved my problem. Use:
pd.Series.sum
instead of:
np.sum
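To make the swap concrete, here is a minimal sketch written against a recent pandas, using only the three drops from the 14:15 bin of the sample data. It also shows why the `axis=0` attempt in the question raised: `np.sum(axis=0)` is called on the spot, with no array to sum, instead of being handed over as a function. The variable names are hypothetical, and this does not try to reproduce the original mismatch in the sums.

```python
import numpy as np
import pandas as pd

# Three drops that share one timestamp, taken from the sample rows in the question.
index = pd.to_datetime(["2013-09-21 14:18:50"] * 3)
diameter = pd.Series([0.391306, 0.407307, 0.399387], index=index, name="Diameter")

# Writing np.sum(axis=0) calls the function immediately with no data, hence the TypeError.
try:
    np.sum(axis=0)
except TypeError as exc:
    call_error = exc

bins = diameter.resample("5min")

# The fix from this answer: pass the Series method itself as the aggregator.
total_method = bins.agg(pd.Series.sum)

# The built-in resampler sum, for comparison.
total_builtin = bins.sum()

# If an axis argument is really wanted, wrap the call so it is deferred.
total_lambda = bins.agg(lambda s: np.sum(s, axis=0))

print(total_method)
```

For this bin all three variants should agree with the Diameter sum of 1.198000 shown in the output above.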