How do I split a column after pivoting a table?

Time: 2019-03-04 11:53:12

Tags: python-3.x pandas pivot

I have a dataset that originally looks like this:

ContextID   VariableID  Timestamp   Timestampms Value
    7304693 516 2018-07-11 10:49:36 153 1.00000001335143e-10
    7304693 516 2018-07-11 10:49:36 291 1.00000001335143e-10
    7304693 516 2018-07-11 10:49:36 455 1.00000001335143e-10
    7304693 517 2018-07-11 10:49:36 153 0.00266113295219839
    7304693 517 2018-07-11 10:49:36 291 0.00266113295219839
    7304693 517 2018-07-11 10:49:36 455 0.00236816401593387
    7304693 517 2018-07-11 10:49:36 483 0.00236816401593387

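I want to pivot the dataset so that each VariableID becomes its own column. To do that I had to combine Timestamp and Timestampms to create unique values, which I did like this: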

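import pandas as pd

# Read the sheet and combine Timestamp and Timestampms into a single column
data = pd.read_excel('Book1.xlsx', header=0, parse_dates=[['Timestamp', 'Timestampms']])
data = data.rename(columns={'Timestamp_Timestampms': 'Time'})
# One row per Time value, one column per VariableID
data = data.pivot(index='Time', columns='VariableID', values='Value')
data = data.reset_index(level=0)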

and got the following DataFrame:

Time                        516                           517    
2018-07-11 10:49:36 153 1.00000001335143e-10    0.00266113295219839
2018-07-11 10:49:36 291 1.00000001335143e-10    0.00266113295219839
2018-07-11 10:49:36 455 1.00000001335143e-10    0.00236816401593387
2018-07-11 10:49:36 483     nan                 0.00236816401593387

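Now I would like some help splitting the Time column into two separate columns: the first containing only the date and the second containing the time, followed by the other columns such as 516 and 517.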

Date          Time_ms
2018-07-11  10:49:36_153
2018-07-11  10:49:36_291
2018-07-11  10:49:36_455
2018-07-11  10:49:36_483
2018-07-11  10:49:36_578

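In addition, I would like to set the ContextID column from the original table as the index of the pivot table. How can I do that?

Thanks in advance.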

1 answer:

Answer 0 (score: 2)

Use Series.str.split together with Series.str.replace:

data = data.rename(columns={'Timestamp_Timestampms': 'Time'})
#added ContextID column
data = data.set_index(['ContextID','Time','VariableID'])['Value'].unstack()
data = data.reset_index()

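data[['Time','Time_ms']] = data.Time.str.split(n=1, expand=True)
#Python's separator for ms is . (alternative solution)
#data['Time_ms'] = data['Time_ms'].str.replace(r'\s+', '.', regex=True)
#regex=True is required because newer pandas versions treat the pattern as a literal string by default
data['Time_ms'] = data['Time_ms'].str.replace(r'\s+', '_', regex=True)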

c = ['ContextID','Time','Time_ms']
data = data[c + data.columns.difference(c).tolist()]
data = data.rename_axis(None, axis=1)
print (data)
   ContextID        Time       Time_ms           516       517
0    7304693  2018-07-11  10:49:36_153  1.000000e-10  0.002661
1    7304693  2018-07-11  10:49:36_291  1.000000e-10  0.002661
2    7304693  2018-07-11  10:49:36_455  1.000000e-10  0.002368
3    7304693  2018-07-11  10:49:36_483           NaN  0.002368
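
For anyone who wants to try this without Book1.xlsx, here is a minimal self-contained sketch of the same steps. It makes a few assumptions that are not in the original post: the sample rows are typed in by hand as `raw`, the combined `Time` key is built by plain string concatenation instead of `parse_dates`, and the names `raw`, `wide`, `front`, `value_cols` and `indexed` are only illustrative. The last step shows one possible way to make ContextID the index, as the question asks.

```python
import pandas as pd

# Sample rows copied from the question (stands in for Book1.xlsx).
raw = pd.DataFrame({
    "ContextID": [7304693] * 7,
    "VariableID": [516, 516, 516, 517, 517, 517, 517],
    "Timestamp": ["2018-07-11 10:49:36"] * 7,
    "Timestampms": [153, 291, 455, 153, 291, 455, 483],
    "Value": [1.00000001335143e-10] * 3
             + [0.00266113295219839, 0.00266113295219839,
                0.00236816401593387, 0.00236816401593387],
})

# Assumption: build the combined key as a plain string rather than via parse_dates.
raw["Time"] = raw["Timestamp"] + " " + raw["Timestampms"].astype(str)

# Pivot while keeping ContextID: VariableID values become columns.
wide = (raw.set_index(["ContextID", "Time", "VariableID"])["Value"]
           .unstack()
           .reset_index())

# Split once on the first whitespace: date on the left, "HH:MM:SS ms" on the right.
parts = wide["Time"].str.split(n=1, expand=True)
wide["Time"] = parts[0]
wide["Time_ms"] = parts[1].str.replace(" ", "_", regex=False)

# Put the key columns first, then the VariableID columns, and drop the axis name.
front = ["ContextID", "Time", "Time_ms"]
value_cols = [c for c in wide.columns if c not in front]
wide = wide[front + value_cols].rename_axis(None, axis=1)

# Optional: use ContextID as the index instead of a regular column.
indexed = wide.set_index("ContextID")
print(indexed)
```

If the date and time should stay part of the index as well, `wide.set_index(["ContextID", "Time", "Time_ms"])` would be the analogous call.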