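I have time series data in a pandas DataFrame where the index is the time at which each measurement started, and each column holds a list of values recorded at a fixed sampling rate (the difference between consecutive index values divided by the number of elements in the list).
Here is what it looks like: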
Time  A                 B                 ....... Z
0     [1, 2, 3, 4]      [1, 2, 3, 4]
2     [5, 6, 7, 8]      [5, 6, 7, 8]
4     [9, 10, 11, 12]   [9, 10, 11, 12]
6     [13, 14, 15, 16]  [13, 14, 15, 16]
...
I would like to expand each row, across all columns, into multiple rows, so that:
Time  A  B  .... Z
0     1  1
0.5   2  2
1     3  3
1.5   4  4
2     5  5
2.5   6  6
.......
So far I have been thinking along these lines (the code does not make sense):
def expand_row(dstruc):
    for i in range(len(dstruc)):
        for j in range(1, len(dstruc[i])):
            dstruc.loc[i+j/len(dstruc[i])] = dstruc[i][j]
        dstruc.loc[i] = dstruc[i][0]
    return dstruc
expanded = testdf.apply(expand_row)
I have also tried using split(',') and stack(), but I could not get the index right.
Answer 0 (score: 4)
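import numpy as np
import pandas as pd
# Toy data: each cell holds a tuple of 4 samples. list() is needed on Python 3,
# where zip returns an iterator.
df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')},
                  index=range(0, 8, 2))
# One (index, row_tuple) pair per sample.
# Note: DataFrame.from_items was removed in pandas 1.0.
result = pd.DataFrame.from_items([(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)], orient='index', columns=df.columns)
result.index.name = 'Time'
# Offset of each sample within its original row, as a fraction of the list length.
grouped = result.groupby(level=0)
increment = (grouped.cumcount()/grouped.size())
result.index = result.index + increment
print(result)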
yields
In [183]: result
Out[183]:
A B C
Time
0.00 1 1 1
0.25 2 2 2
0.50 3 3 3
0.75 4 4 4
2.00 5 5 5
2.25 6 6 6
2.50 7 7 7
2.75 8 8 8
4.00 9 9 9
4.25 10 10 10
4.50 11 11 11
4.75 12 12 12
6.00 13 13 13
6.25 14 14 14
6.50 15 15 15
6.75 16 16 16
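On recent pandas versions the same result can probably be reached without from_items. The following is a minimal sketch, not the answerer's method: it assumes pandas 1.3 or later (for multi-column DataFrame.explode) and that every list in a row has the same length. The names values, lengths, starts and offsets are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the answer: every cell holds a list of 4 samples.
values = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
df = pd.DataFrame({key: values for key in "ABC"},
                  index=pd.Index(range(0, 8, 2), name="Time"))

# Number of samples in each original row (taken from the first column).
lengths = df.iloc[:, 0].map(len).to_numpy()

# Explode all list columns in lockstep (multi-column explode: pandas >= 1.3).
result = df.explode(list(df.columns)).astype(int)

# Start time of each row repeated once per sample, plus the fractional offset k/n.
starts = np.repeat(df.index.to_numpy(dtype=float), lengths)
offsets = np.concatenate([np.arange(n) / n for n in lengths])
result.index = pd.Index(starts + offsets, name="Time")

print(result.head(6))
```

As in the answer, the offsets are fractions of the list length (0.25 steps here), not fractions of the 2-unit interval between rows.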
Explanation:
One way to loop over the contents of the lists is to use a list comprehension:
In [172]: df = pd.DataFrame({key: zip(*[iter(range(1, 17))]*4) for key in list('ABC')}, index=range(0,8,2))
In [173]: [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
Out[173]:
[(0, (1, 1, 1)),
(0, (2, 2, 2)),
...
(6, (15, 15, 15)),
(6, (16, 16, 16))]
Once you have the values in the form above, you can build the desired DataFrame with pd.DataFrame.from_items:
result = pd.DataFrame.from_items([(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)], orient='index', columns=df.columns)
result.index.name = 'Time'
yields
In [175]: result
Out[175]:
A B C
Time
0 1 1 1
0 2 2 2
...
6 15 15 15
6 16 16 16
To compute the increments to add to the index, you can group by the index and take the ratio of each group's cumcount to its size:
In [176]: grouped = result.groupby(level=0)
In [177]: increment = (grouped.cumcount()/grouped.size())
In [179]: result.index = result.index + increment
In [199]: result.index
Out[199]:
Float64Index([ 0.0, 0.25, 0.5, 0.75, 2.0, 2.25, 2.5, 2.75, 4.0, 4.25, 4.5,
4.75, 6.0, 6.25, 6.5, 6.75],
dtype='float64', name=u'Time')
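Note that cumcount/size yields offsets in the range [0, 1), whereas the question asks for steps of 0.5 when rows are 2 time units apart. One way to get that spacing, sketched here under the assumption that the interval between measurement start times is a known constant, is to scale the fraction by that interval. The frame below is a hypothetical stand-in for result, and interval, position and size are names introduced for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `result` before its index is adjusted:
# two original rows (Time 0 and 2) with 4 samples each.
result = pd.DataFrame(
    {"A": range(1, 9), "B": range(1, 9)},
    index=pd.Index(np.repeat([0, 2], 4), name="Time"),
)
interval = 2.0  # assumed constant spacing between measurement start times

# Position of each sample within its group and the size of that group.
by_time = result.groupby(level=0)["A"]
position = by_time.cumcount().to_numpy()
size = by_time.transform("size").to_numpy()

# Scale the fractional offset by the interval so samples fill the whole gap.
result.index = pd.Index(result.index.to_numpy() + interval * position / size,
                        name="Time")
print(result)
```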
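Answer 1 (score: 0)
Probably not ideal, but this can be done with groupby, applying a function that returns the expanded DataFrame for each row (here the time difference is assumed to be fixed at 2.0):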
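def expand(x):
    # x is the single-row group for one 'Time' value; 'Time' is assumed to be a regular column.
    data = {c: x[c].iloc[0] for c in x if c != 'Time'}
    n = len(data['A'])
    step = 2.0 / n  # fixed interval of 2.0 between rows
    data['Time'] = [x['Time'].iloc[0] + i*step for i in range(n)]
    return pd.DataFrame(data)
print(df.groupby('Time').apply(expand).set_index('Time', drop=True))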
Output:
A B
Time
0.0 1 1
0.5 2 2
1.0 3 3
1.5 4 4
2.0 5 5
2.5 6 6
3.0 7 7
3.5 8 8
4.0 9 9
4.5 10 10
5.0 11 11
5.5 12 12
6.0 13 13
6.5 14 14
7.0 15 15
7.5 16 16
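If groupby with apply turns out to be slow on larger frames, the same output can likely be produced with NumPy alone. This is a sketch under the same assumptions as the answer (a regular 'Time' column, a fixed interval of 2.0) plus one more: every list has the same length. The input frame and the names interval, value_cols and times are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input in the layout this answer assumes: 'Time' is a regular column.
samples = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
df = pd.DataFrame({"Time": [0, 2, 4, 6], "A": samples, "B": samples})

interval = 2.0                 # assumed fixed spacing between rows
n = len(df["A"].iloc[0])       # assumed identical list length in every cell
value_cols = [c for c in df.columns if c != "Time"]

# Each start time repeated n times, plus the per-sample offsets 0, step, 2*step, ...
times = (np.repeat(df["Time"].to_numpy(dtype=float), n)
         + np.tile(np.arange(n) * interval / n, len(df)))

# Flatten every list column into one long array.
expanded = pd.DataFrame(
    {c: np.concatenate(df[c].to_list()) for c in value_cols},
    index=pd.Index(times, name="Time"),
)
print(expanded)
```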
Answer 2 (score: 0)
Say the DataFrame to be expanded is named df_to_expand; you can do the following using eval.
Reference: covert a string which is a list into a proper list python
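As an illustration of the idea, here is a minimal sketch that assumes the list cells are stored as strings (for example after reading a CSV). It uses ast.literal_eval from the standard library in place of eval, since literal_eval only parses Python literals and does not execute arbitrary code. The sample data is hypothetical, and the multi-column explode call assumes pandas 1.3 or later.

```python
import ast
import pandas as pd

# Hypothetical frame whose list cells are strings rather than real lists.
df_to_expand = pd.DataFrame(
    {"A": ["[1, 2, 3, 4]", "[5, 6, 7, 8]"],
     "B": ["[1, 2, 3, 4]", "[5, 6, 7, 8]"]},
    index=pd.Index([0, 2], name="Time"),
)

# Parse each string cell into a real Python list.
parsed = df_to_expand.copy()
for col in parsed.columns:
    parsed[col] = parsed[col].map(ast.literal_eval)

# Expand the parsed lists into one row per sample (pandas >= 1.3).
expanded = parsed.explode(list(parsed.columns))
print(expanded)
```

The expanded frame keeps the original start time on every row, so the index still needs a per-sample offset added, as in the first answer.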