pandas: how to generate multiple rows from one row

Time: 2014-12-09 02:54:34

Tags: python pandas

timestamp            speeds
2014-12-04 12:30:10  104,105,105,106,106,106,99,90
2014-12-04 12:32:19  86,86,87,88,88,89,90,92,93,95,97,100,102,104,1...
2014-12-04 12:32:58  110,110,110,110,110,110,110,110,110,110,110,10..

DatetimeIndex: 24 entries, 2014-12-04 12:30:10 to 2014-12-04 12:29:13
Data columns (total 1 columns):
speeds    24 non-null object

I want to transform the DataFrame into something like this:

timestamp            speeds
2014-12-04 12:30:10  104
2014-12-04 12:30:11  105
2014-12-04 12:30:12  105
....
2014-12-04 12:32:17  90
2014-12-04 12:32:18  88   (resample and fill the timestamp and the mean speed value)
2014-12-04 12:32:19  86
2014-12-04 12:32:20  86
2014-12-04 12:32:21  87

Is there a simple function for this? Or do I just have to go through the rows one by one and parse the field?

3 answers:

Answer 0 (score: 1)

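Not sure about the resampling (it is hard to tell from your example what you want to do). The rest can be done with pandas (probably not the most elegant way):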

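>>> import pandas as pd
>>> from datetime import timedelta
>>> # assumes df['speeds'] already holds a list of numbers in each row
>>> df2 = df.apply(lambda x: pd.Series(x['speeds']),axis=1)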
>>> df2['timestamp'] = df['timestamp']
>>> df2
     0    1    2    3    4    5   6   7   8   9  10   11   12   13           timestamp
0  104  105  105  106  106  106  99  90 NaN NaN NaN  NaN  NaN  NaN 2014-12-04 12:30:10
1    6   86   87   88   88   89  90  92  93  95  97  100  102  104 2014-12-04 12:32:19
>>>
>>> df2 = df2.set_index('timestamp').stack().reset_index()
>>> df2['timestamp'] = df2.apply(lambda x: x['timestamp'] + timedelta(seconds=int(x['level_1'])), axis=1)
>>> del df2['level_1']
>>> df2
             timestamp    0
0  2014-12-04 12:30:10  104
1  2014-12-04 12:30:11  105
2  2014-12-04 12:30:12  105
3  2014-12-04 12:30:13  106
4  2014-12-04 12:30:14  106
5  2014-12-04 12:30:15  106
6  2014-12-04 12:30:16   99
7  2014-12-04 12:30:17   90
8  2014-12-04 12:32:19    6
9  2014-12-04 12:32:20   86
10 2014-12-04 12:32:21   87
11 2014-12-04 12:32:22   88
12 2014-12-04 12:32:23   88
13 2014-12-04 12:32:24   89
14 2014-12-04 12:32:25   90
15 2014-12-04 12:32:26   92
16 2014-12-04 12:32:27   93
17 2014-12-04 12:32:28   95
18 2014-12-04 12:32:29   97
19 2014-12-04 12:32:30  100
20 2014-12-04 12:32:31  102
21 2014-12-04 12:32:32  104
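
On pandas 0.25 or newer, the same reshaping can probably be written more directly with `DataFrame.explode`. The following is only a sketch under assumptions: the sample frame, the comma-separated string format of `speeds`, and the names `long_df`, `offset` and `regular` are made up for illustration, and linear interpolation is just one possible reading of "fill the mean speed value" from the question.

```python
import pandas as pd

# Hypothetical sample data: one start timestamp per row, readings stored
# as a comma-separated string, with a one-second gap between the two rows.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2014-12-04 12:30:10", "2014-12-04 12:30:14"]),
    "speeds": ["104,105,90", "86,86,87"],
})

# Split each string into a list, then give every list element its own row.
long_df = (
    df.assign(speeds=df["speeds"].str.split(","))
      .explode("speeds")
      .reset_index()  # the original row label becomes a column named "index"
)

# Position of each reading within its original row: 0, 1, 2, ...
offset = long_df.groupby("index").cumcount()

# Shift each row's start time by that many seconds; convert readings to ints.
long_df["timestamp"] = long_df["timestamp"] + pd.to_timedelta(offset, unit="s")
long_df["speeds"] = long_df["speeds"].astype(int)
long_df = long_df.drop(columns="index")

# Optional: move to a regular 1-second grid and fill missing seconds
# by linear interpolation between the neighbouring readings.
regular = (
    long_df.set_index("timestamp")["speeds"]
           .resample("1s").mean()
           .interpolate()
)
```

With this sample data the missing second 12:30:13 is filled with 88.0, the midpoint of the readings on either side. For longer gaps, linear interpolation produces a ramp rather than a single mean value, so a different fill strategy may be more appropriate.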

Answer 1 (score: 0)

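Not sure about pandas, but you can do this in pure Python. It is hard to tell what you mean by "(resample and fill the timestamp and the mean speed value)", but leaving that part aside, you can proceed as follows: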

from datetime import datetime, timedelta

in_s = ["2014-12-04 12:30:10  104,105,105,106,106,106,99,90",
        "2014-12-04 12:32:19  86,86,87,88,88,89,90,92,93,95,97,100,102,104",
        "2014-12-04 12:32:58  110,110,110,110,110,110,110,110,110,110,110"]

for row in in_s:
    date_str,time_str, entries_str = row.split()
    # only the time of day is parsed; the date string is printed unchanged
    a_time = datetime.strptime(time_str, "%H:%M:%S")
    for e in entries_str.split(','):      
        print(date_str, datetime.strftime(a_time, "%H:%M:%S"), e)
        a_time = a_time + timedelta(seconds=1)

This produces:

2014-12-04 12:30:10 104
2014-12-04 12:30:11 105
2014-12-04 12:30:12 105
2014-12-04 12:30:13 106
2014-12-04 12:30:14 106
2014-12-04 12:30:15 106
2014-12-04 12:30:16 99
2014-12-04 12:30:17 90
2014-12-04 12:32:19 86
2014-12-04 12:32:20 86
2014-12-04 12:32:21 87
2014-12-04 12:32:22 88
2014-12-04 12:32:23 88
2014-12-04 12:32:24 89
2014-12-04 12:32:25 90
2014-12-04 12:32:26 92
2014-12-04 12:32:27 93
2014-12-04 12:32:28 95
2014-12-04 12:32:29 97
2014-12-04 12:32:30 100
2014-12-04 12:32:31 102
2014-12-04 12:32:32 104
2014-12-04 12:32:58 110
2014-12-04 12:32:59 110
2014-12-04 12:33:00 110
2014-12-04 12:33:01 110
2014-12-04 12:33:02 110
2014-12-04 12:33:03 110
2014-12-04 12:33:04 110
2014-12-04 12:33:05 110
2014-12-04 12:33:06 110
2014-12-04 12:33:07 110
2014-12-04 12:33:08 110
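
One caveat with the loop above: because only the time of day is incremented, a row that runs past midnight would keep the old date. A possible variant, sketched here with a hypothetical helper named `expand_row`, parses the full date and time so that the date rolls over as well, and yields values instead of printing them:

```python
from datetime import datetime, timedelta

def expand_row(row):
    """Yield (timestamp, speed) pairs, one per second, for a single input line."""
    date_str, time_str, entries_str = row.split()
    # Parse date and time together so that adding seconds can change the date.
    start = datetime.strptime(date_str + " " + time_str, "%Y-%m-%d %H:%M:%S")
    for i, entry in enumerate(entries_str.split(",")):
        yield start + timedelta(seconds=i), int(entry)

# Example input that crosses midnight (made-up values).
rows = list(expand_row("2014-12-04 23:59:59  110,111"))
for ts, speed in rows:
    print(ts, speed)
```

The resulting list of pairs can also be passed to `pd.DataFrame(rows, columns=[...])` if a DataFrame is needed afterwards.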

Answer 2 (score: 0)

You may find this link helpful.

An excerpt from that article:

# Explode/Split column into multiple rows
new_df = pd.DataFrame(df.City.str.split('|').tolist(), index=df.EmployeeId).stack()
new_df = new_df.reset_index([0, 'EmployeeId'])
new_df.columns = ['EmployeeId', 'City']
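
To try the split-and-stack idea from the excerpt end to end, here is a small self-contained sketch. The sample data is invented, and the last step uses a slightly different `reset_index` chain than the excerpt, chosen so that the leftover position level is dropped explicitly; treat it as one possible way to write it rather than the article's exact code.

```python
import pandas as pd

# Hypothetical input: one row per employee, cities separated by "|".
df = pd.DataFrame({
    "EmployeeId": [1, 2],
    "City": ["Mumbai|Bangalore", "Pune"],
})

# Split into one column per city (shorter rows are padded with missing values),
# then stack the columns into a long Series indexed by (EmployeeId, position).
stacked = pd.DataFrame(
    df["City"].str.split("|").tolist(), index=df["EmployeeId"]
).stack().dropna()

# Drop the position level and turn EmployeeId back into a regular column.
new_df = stacked.reset_index(level=1, drop=True).rename("City").reset_index()
print(new_df)
```

For the question's data the same pattern would apply with `speeds` split on `","` and the timestamp used as the index.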