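I have a pandas DataFrame imported with 2 columns (time, heart rate).
The time is in the format MM:SS.s (minutes:seconds.milliseconds). I am trying to convert this time to a float number of seconds (e.g. 0.6 s or 65.3 s), to be used later for binning into 10 s windows. For example: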
import pandas as pd
hr_raw = pd.read_csv('hr_data.csv')
hr_raw.dropna(inplace=True)
print(hr_raw.head())
Time HR bpm
0 00:00.6 97.0
1 00:01.0 92.0
2 00:01.3 80.0
3 00:01.6 81.0
4 00:02.0 80.0
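Previously (when importing with the standard csv module) I simply split this string, converted the pieces to floats and did the math to turn them into seconds:
import csv

time = []  # seconds since the start of the recording
with open('hr_data.csv', newline='') as infile:
    hr_data = list(csv.DictReader(infile, delimiter=','))
for row in hr_data:
    temp = row['Time']
    time.append(float(temp[3:7]) + float(temp[0:2]) * 60)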
I am now using pandas, but the code no longer works. I tried to modify it so that it accesses the 'Time' column (see below), but without much luck.
import pandas as pd
win_size = 10 # user defined window in seconds
hr_raw = pd.read_csv('hr_data.csv')
hr_raw.dropna(inplace=True) #remove NaN artifact from import
#### problem code ####
for row in hr_raw.Time:
    hr_raw.Time[row] = float(hr_raw.Time[row][3:]) + float((hr_raw.Time[row][0:2] * 60))
# set time as index
hr_raw.set_index('Time', inplace=True)
# bin data based on user defined window
hr_bin = hr_raw.groupby((hr_raw.index // win_size + 1) * win_size).mean()
The error that comes up is:
Traceback (most recent call last):
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:14010)
TypeError: an integer is required
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\mitbl001\Dropbox\CPET_python\import_hr_csv.py", line 11, in <module>
hr_raw.Time[row] = float(hr_raw.Time[row][3:]) + float((hr_raw.Time[row][0:2] * 60))
File "C:\Users\mitbl001\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "C:\Users\mitbl001\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4404)
File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4087)
File "pandas\_libs\index.pyx", line 156, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5210)
KeyError: '00:00.6'
Answer 0 (score: 1)
I think you need indexing with str, then astype to cast to float:
hr_raw.Time = hr_raw.Time.str[3:].astype(float) + hr_raw.Time.str[0:2].astype(float) * 60
print (hr_raw)
Time HR bpm
0 0.6 97.0
1 1.0 92.0
2 1.3 80.0
3 1.6 81.0
4 2.0 80.0
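For context, here is a minimal sketch of why the loop in the question raised KeyError, together with a variant of the conversion that splits on ':' instead of slicing fixed positions. The sample values are made up, and the split-based variant is my assumption about what might be more tolerant of differently sized fields, not something from the question.

```python
import pandas as pd

# Made-up sample in the question's MM:SS.s format.
hr_raw = pd.DataFrame({
    "Time": ["00:00.6", "00:01.0", "01:05.3"],
    "HR bpm": [97.0, 92.0, 80.0],
})

# Iterating over a Series yields its values, not its index labels,
# so hr_raw.Time[row] in the question looked up the label '00:00.6'.
first_value = next(iter(hr_raw["Time"]))
label_exists = first_value in hr_raw.index  # the index labels are 0, 1, 2

# Vectorized alternative: split on ':' rather than slicing fixed positions.
parts = hr_raw["Time"].str.split(":", expand=True)
hr_raw["Time"] = parts[0].astype(float) * 60 + parts[1].astype(float)
print(hr_raw)
```

Either way, the whole column is converted at once, so no row-by-row assignment is needed.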
Another solution is to convert with to_timedelta, but first prepend the hours part ('00:') with radd:
hr_raw.Time = pd.to_timedelta(hr_raw.Time.radd('00:')).dt.total_seconds()
print (hr_raw)
Time HR bpm
0 0.6 97.0
1 1.0 92.0
2 1.3 80.0
3 1.6 81.0
4 2.0 80.0
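As a quick check that prepending '00:' also handles times past one minute, here is a small sketch with made-up values. It uses plain string concatenation, which should be equivalent to the radd call above.

```python
import pandas as pd

# Made-up times in MM:SS.s format, including values beyond one minute.
times = pd.Series(["00:00.6", "00:59.9", "01:05.3", "12:00.0"])

# '00:' + 'MM:SS.s' gives 'HH:MM:SS.s', which to_timedelta can parse.
seconds = pd.to_timedelta("00:" + times).dt.total_seconds()
print(seconds.tolist())
```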
Then set_index is not needed; use the Time column:
# bin data based on user defined window
hr_bin = hr_raw.groupby((hr_raw.Time // win_size + 1) * win_size).mean()
print (hr_bin)
Time HR bpm
Time
10.0 1.3 86.0
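To see how the labels behave across more than one window, here is a sketch with made-up readings that span three 10 s windows. Each row is labelled by the upper edge of its window, as in the expression above.

```python
import pandas as pd

win_size = 10  # window length in seconds, as in the question

# Made-up readings spanning three windows.
hr = pd.DataFrame({
    "Time": [0.6, 4.0, 9.9, 10.0, 15.5, 27.2],
    "HR bpm": [97.0, 92.0, 81.0, 80.0, 84.0, 90.0],
})

# Label each row by the upper edge of its window:
# [0, 10) -> 10, [10, 20) -> 20, [20, 30) -> 30.
upper_edge = (hr["Time"] // win_size + 1) * win_size
hr_bin = hr.groupby(upper_edge)["HR bpm"].mean()
print(hr_bin)
```

Note that a window containing no readings simply does not appear in the result.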
Answer 1 (score: 1)
Use pd.to_timedelta:
df['Time'] = pd.to_timedelta('00:' + df.Time).dt.total_seconds()
df
Time HR bpm
0 0.6 97.0
1 1.0 92.0
2 1.3 80.0
3 1.6 81.0
4 2.0 80.0
The groupby should now be simple, using the syntax:
df.groupby(df.Time // x * x)
where x is your desired time window. Here is an example that groups into 0.5-second intervals and takes the mean heart rate:
df.groupby(df.Time // 0.5 * 0.5)['HR bpm'].mean()
Time
0.5 97.0
1.0 86.0
1.5 81.0
2.0 80.0
Name: HR bpm, dtype: float64
The above outputs a Series. If you want a DataFrame, you can call reset_index after the groupby.
df.groupby(df.Time // 0.5 * 0.5)['HR bpm'].mean().reset_index()
Time HR bpm
0 0.5 97.0
1 1.0 86.0
2 1.5 81.0
3 2.0 80.0
In your case, you could do something along the lines of df.groupby(df.Time // 10 * 10).
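Putting the pieces together, an end-to-end sketch might look like the following. The inline CSV text is a made-up stand-in for hr_data.csv, and labelling each window by its start time is just one possible convention.

```python
import io
import pandas as pd

# Stand-in for hr_data.csv; the values here are assumptions for illustration.
csv_text = """Time,HR bpm
00:00.6,97
00:01.0,92
00:09.8,81
00:10.2,80
00:19.9,84
01:05.3,90
"""

win_size = 10  # window length in seconds

df = pd.read_csv(io.StringIO(csv_text)).dropna()
df["Time"] = pd.to_timedelta("00:" + df["Time"]).dt.total_seconds()

# Label each window by its start time and average the heart rate within it.
window_start = df["Time"] // win_size * win_size
hr_bin = df.groupby(window_start)["HR bpm"].mean().reset_index()
print(hr_bin)
```

With real data, replacing io.StringIO(csv_text) with the file path should be the only change needed, assuming the file has the same two columns.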