timestamp speeds
2014-12-04 12:30:10 104,105,105,106,106,106,99,90
2014-12-04 12:32:19 86,86,87,88,88,89,90,92,93,95,97,100,102,104,1...
2014-12-04 12:32:58 110,110,110,110,110,110,110,110,110,110,110,10..
DatetimeIndex: 24 entries, 2014-12-04 12:30:10 to 2014-12-04 12:29:13
Data columns (total 1 columns):
speeds 24 non-null object
I would like to transform the DataFrame into something like this:
timestamp speeds
2014-12-04 12:30:10 104
2014-12-04 12:30:11 105
2014-12-04 12:30:12 105
....
2014-12-04 12:32:17 90
2014-12-04 12:32:18 88 (resample and fill the timestamp and the mean speed value)
2014-12-04 12:32:19 86
2014-12-04 12:32:20 86
2014-12-04 12:32:21 87
Is there a simple function for this? Or do I have to go through it row by row and parse the fields?
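Answer 0 (score: 1)
I'm not sure about the resampling (it is hard to tell from your example what you want to do). The rest is possible with pandas, although this may not be the most elegant way: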
>>> import pandas as pd
>>> from datetime import timedelta
>>> df2 = df.apply(lambda x: pd.Series(x['speeds']),axis=1)
>>> df2['timestamp'] = df['timestamp']
>>> df2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 timestamp
0 104 105 105 106 106 106 99 90 NaN NaN NaN NaN NaN NaN 2014-12-04 12:30:10
1 86 86 87 88 88 89 90 92 93 95 97 100 102 104 2014-12-04 12:32:19
>>>
>>> df2 = df2.set_index('timestamp').stack().reset_index()
>>> df2['timestamp'] = df2.apply(lambda x: x['timestamp'] + timedelta(seconds=x['level_1']), axis=1)
>>> del df2['level_1']
>>> df2
timestamp 0
0 2014-12-04 12:30:10 104
1 2014-12-04 12:30:11 105
2 2014-12-04 12:30:12 105
3 2014-12-04 12:30:13 106
4 2014-12-04 12:30:14 106
5 2014-12-04 12:30:15 106
6 2014-12-04 12:30:16 99
7 2014-12-04 12:30:17 90
8 2014-12-04 12:32:19 86
9 2014-12-04 12:32:20 86
10 2014-12-04 12:32:21 87
11 2014-12-04 12:32:22 88
12 2014-12-04 12:32:23 88
13 2014-12-04 12:32:24 89
14 2014-12-04 12:32:25 90
15 2014-12-04 12:32:26 92
16 2014-12-04 12:32:27 93
17 2014-12-04 12:32:28 95
18 2014-12-04 12:32:29 97
19 2014-12-04 12:32:30 100
20 2014-12-04 12:32:31 102
21 2014-12-04 12:32:32 104
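A vectorized variant of the same idea, as a minimal sketch. It assumes pandas 0.25 or later (for DataFrame.explode), that timestamp is a regular column rather than the index, and that speeds holds comma-separated strings. The sample frame and the names long_df and offset are made up for illustration.

```python
import pandas as pd

# Sample data shaped like the question: one start timestamp per row,
# plus a comma-separated string of per-second speed readings (assumed layout).
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2014-12-04 12:30:10", "2014-12-04 12:32:19"]),
    "speeds": ["104,105,105,106,106,106,99,90",
               "86,86,87,88,88,89,90,92,93,95,97,100,102,104"],
})

# Split each string into a list, give every reading its own row,
# and keep the original row number in a helper column called "row".
long_df = (
    df.assign(speeds=df["speeds"].str.split(","))
      .explode("speeds")
      .rename_axis("row")
      .reset_index()
)

# Position of each reading within its original row: 0, 1, 2, ...
offset = long_df.groupby("row").cumcount()

# Shift the start timestamp by that many seconds and restore a numeric dtype.
long_df["timestamp"] = long_df["timestamp"] + pd.to_timedelta(offset, unit="s")
long_df["speeds"] = long_df["speeds"].astype(int)
long_df = long_df.drop(columns="row")

print(long_df)
```

If speeds already contains lists instead of strings, the str.split step can be dropped and explode applied directly.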
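Answer 1 (score: 0)
I'm not sure about pandas, but you can do this in pure Python. That said, I don't know what you mean by "(resample and fill the timestamp and the mean speed value)". Leaving that part out, you can do it as follows: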
from datetime import datetime, timedelta

in_s = ["2014-12-04 12:30:10 104,105,105,106,106,106,99,90",
        "2014-12-04 12:32:19 86,86,87,88,88,89,90,92,93,95,97,100,102,104",
        "2014-12-04 12:32:58 110,110,110,110,110,110,110,110,110,110,110"]

for row in in_s:
    date_str, time_str, entries_str = row.split()
    # Parse only the time of day; the date string is printed unchanged.
    a_time = datetime.strptime(time_str, "%H:%M:%S")
    for e in entries_str.split(','):
        print(date_str, a_time.strftime("%H:%M:%S"), e)
        # Each reading is one second after the previous one.
        a_time = a_time + timedelta(seconds=1)
This results in:
2014-12-04 12:30:10 104
2014-12-04 12:30:11 105
2014-12-04 12:30:12 105
2014-12-04 12:30:13 106
2014-12-04 12:30:14 106
2014-12-04 12:30:15 106
2014-12-04 12:30:16 99
2014-12-04 12:30:17 90
2014-12-04 12:32:19 86
2014-12-04 12:32:20 86
2014-12-04 12:32:21 87
2014-12-04 12:32:22 88
2014-12-04 12:32:23 88
2014-12-04 12:32:24 89
2014-12-04 12:32:25 90
2014-12-04 12:32:26 92
2014-12-04 12:32:27 93
2014-12-04 12:32:28 95
2014-12-04 12:32:29 97
2014-12-04 12:32:30 100
2014-12-04 12:32:31 102
2014-12-04 12:32:32 104
2014-12-04 12:32:58 110
2014-12-04 12:32:59 110
2014-12-04 12:33:00 110
2014-12-04 12:33:01 110
2014-12-04 12:33:02 110
2014-12-04 12:33:03 110
2014-12-04 12:33:04 110
2014-12-04 12:33:05 110
2014-12-04 12:33:06 110
2014-12-04 12:33:07 110
2014-12-04 12:33:08 110
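As for the "(resample and fill the timestamp and the mean speed value)" part, one possible reading is that seconds with no reading between two rows should get the mean of the readings on either side of the gap. A rough sketch under that assumption follows. The helper names expand and fill_gaps are hypothetical, the sample rows are shortened for illustration, and the input is assumed to be sorted by time.

```python
from datetime import datetime, timedelta

ONE_SECOND = timedelta(seconds=1)

def expand(rows):
    """Yield (timestamp, speed) pairs, one per second, for each input row."""
    for row in rows:
        date_str, time_str, entries_str = row.split()
        start = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
        for i, entry in enumerate(entries_str.split(",")):
            yield start + timedelta(seconds=i), float(entry)

def fill_gaps(samples):
    """Fill missing seconds with the mean of the two neighbouring readings.

    Assumption: this is only one interpretation of the question.
    """
    prev = None
    for ts, speed in samples:
        if prev is not None:
            prev_ts, prev_speed = prev
            filler = (prev_speed + speed) / 2
            missing = prev_ts + ONE_SECOND
            # Emit one filler reading for every second inside the gap.
            while missing < ts:
                yield missing, filler
                missing += ONE_SECOND
        yield ts, speed
        prev = (ts, speed)

# Shortened, made-up rows with a two-second gap between them.
rows = ["2014-12-04 12:30:10 104,105", "2014-12-04 12:30:14 99,90"]
filled = list(fill_gaps(expand(rows)))
for ts, speed in filled:
    print(ts, speed)
```

If linear interpolation across the gap is what was meant instead, only the computation of filler inside fill_gaps would need to change.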
Answer 2 (score: 0)
You may find this link helpful.
An excerpt from the linked article:
# Explode/Split column into multiple rows
new_df = pd.DataFrame(df.City.str.split('|').tolist(), index=df.EmployeeId).stack()
new_df = new_df.reset_index([0, 'EmployeeId'])
new_df.columns = ['EmployeeId', 'City']
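To see what the split-and-stack technique does, here is a small self-contained sketch. The EmployeeId and City values are made up, since the article's data is not shown here, and the reset_index calls are a slightly different spelling from the excerpt, chosen so that the intermediate steps are explicit.

```python
import pandas as pd

# Hypothetical sample data; each City cell holds several '|'-separated values.
df = pd.DataFrame({
    "EmployeeId": ["001", "002"],
    "City": ["Mumbai|Bangalore", "Pune|Mumbai|Delhi"],
})

# One column per split value, indexed by EmployeeId; short rows are padded with None.
wide = pd.DataFrame(df["City"].str.split("|").tolist(), index=df["EmployeeId"])

# stack() turns the columns into an inner index level; dropna() removes the padding.
stacked = wide.stack().dropna()

# Drop the inner (column-number) level, then turn EmployeeId back into a column.
new_df = stacked.reset_index(level=1, drop=True).reset_index(name="City")

print(new_df)
```

The same shape applies to the speeds column in the question: split the string, stack, and then use the inner level as the offset in seconds.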