Question

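I have time-series data in each row (with the columns as time steps), and I want to pad each row with zeros on the left and right according to a value in that row (the "padding amount"). This is what I have: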

Padding amount     T1     T2     T3
   0               3      2.9    2.8
   1               2.9    2.8    2.7
   1               2.8    2.3    2.0
   2               4.4    3.3    2.3

This is what I want to produce:

Padding amount     T1     T2     T3     T4     T5
   0               3      2.9    2.8    0      0    (--> padding = 0, so no change)
   1               0      2.9    2.8    2.7    0    (--> shifted one to the right)
   1               0      2.8    2.3    2.0    0
   2               0      0      4.4    3.3    2.3  (--> shifted two to the right)

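I see that Keras has sequence padding, but I'm not sure how that would work here, since all rows have the same number of entries. I'm looking at shift and np.roll, but I'm sure a solution for this already exists.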

Answer 1

In numpy, you can construct an array of indices for the positions where you want to place the elements of your array.

Suppose you have

import numpy as np

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])
M, N = data.shape

The output array will be

output = np.zeros((M, N + padding.max()))

You can build an index of where the data goes:

rows = np.arange(M)[:, None]
cols = padding[:, None] + np.arange(N)

Since the index arrays broadcast to the shape of the data, you can assign to the output directly:

output[rows, cols] = data
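To make the indexing step concrete, here is a minimal self-contained sketch that puts the fragments above together and prints the intermediate `cols` array. It only uses the names already introduced in this answer; the printed values in the comments follow from the sample data.

```python
import numpy as np

padding = np.array([0, 1, 1, 2])
data = np.array([[3.0, 2.9, 2.8],
                 [2.9, 2.8, 2.7],
                 [2.8, 2.3, 2.0],
                 [4.4, 3.3, 2.3]])
M, N = data.shape

rows = np.arange(M)[:, None]            # shape (M, 1): one row index per row
cols = padding[:, None] + np.arange(N)  # shape (M, N): destination column of each value

# For each row, cols lists where its N values land in the wider output:
# [[0 1 2]
#  [1 2 3]
#  [1 2 3]
#  [2 3 4]]
print(cols)

# rows (M, 1) and cols (M, N) broadcast together to (M, N), the shape of data
output = np.zeros((M, N + padding.max()))
output[rows, cols] = data
print(output)
```

Every position that is not written to keeps its initial zero, which is what produces the padding on both sides.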

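I'm not sure exactly how this applies to a DataFrame, but you can probably construct a new one after operating on the values of the old one. Alternatively, you can probably implement all of these operations equivalently in pandas directly.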
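One possible way to wrap this around a DataFrame is sketched below. The helper name `pad_rows`, its `pad_col` parameter, and the `T1..Tk` naming of the output columns are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

def pad_rows(df, pad_col="Padding amount"):
    """Shift each row's values right by its padding amount, filling with zeros."""
    padding = df[pad_col].to_numpy()
    data = df.drop(columns=pad_col).to_numpy(dtype=float)
    M, N = data.shape

    # Same index-array assignment as in the numpy version
    output = np.zeros((M, N + padding.max()))
    rows = np.arange(M)[:, None]
    cols = padding[:, None] + np.arange(N)
    output[rows, cols] = data

    # Assumed naming scheme: time columns are called T1, T2, ...
    names = [f"T{i}" for i in range(1, output.shape[1] + 1)]
    result = pd.DataFrame(output, index=df.index, columns=names)
    result.insert(0, pad_col, df[pad_col])
    return result

df = pd.DataFrame({"Padding amount": [0, 1, 1, 2],
                   "T1": [3.0, 2.9, 2.8, 4.4],
                   "T2": [2.9, 2.8, 2.3, 3.3],
                   "T3": [2.8, 2.7, 2.0, 2.3]})
padded = pad_rows(df)
print(padded)
```

The sketch treats every column other than the padding column as a time step, so it would need adjusting if the frame carries other columns.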

Answer 2

Here is one way to do it. I made the process flexible with respect to how many time periods/steps there are:

import pandas as pd

# data (same values as in the question)
d = {'Padding amount': [0, 1, 1, 2],
     'T1': [3, 2.9, 2.8, 4.4],
     'T2': [2.9, 2.8, 2.3, 3.3],
     'T3': [2.8, 2.7, 2.0, 2.3]}
# create DF
df = pd.DataFrame(data=d)
# get max padding amount
maxPadd = df['Padding amount'].max()
# list of time period columns
timePeriodsCols = [c for c in df.columns.tolist() if c.startswith('T')]
# reverse list: move the last column first so no value is overwritten before it is moved
reverseList = timePeriodsCols[::-1]
# number of periods
noOfPeriods = len(timePeriodsCols)

# create the new columns, filled with 0.0 so they stay numeric
for i in range(noOfPeriods + 1, noOfPeriods + 1 + maxPadd):
    df['T' + str(i)] = 0.0

# loop over records
for i in df.index:
    # get padding amount
    padAmount = df.at[i, 'Padding amount']
    # if zero then do nothing
    if padAmount == 0:
        continue
    # else: move each value right by the padding amount and set the old location to zero
    for col in reverseList:
        loc = df.columns.get_loc(col)
        df.at[i, df.columns[loc + padAmount]] = df.at[i, col]
        df.at[i, col] = 0

print(df)

   Padding amount   T1   T2   T3   T4   T5
0               0  3.0  2.9  2.8  0.0  0.0
1               1  0.0  2.9  2.8  2.7  0.0
2               1  0.0  2.8  2.3  2.0  0.0
3               2  0.0  0.0  4.4  3.3  2.3
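For comparison, the same row-by-row idea can be written more compactly by building each padded row as a list. This is only a sketch, not the answer's own method: the names `pad`, `max_pad`, `time_cols`, `padded_rows`, and `result` are illustrative, and it assumes the time columns are exactly those whose names start with `T`.

```python
import pandas as pd

df = pd.DataFrame({"Padding amount": [0, 1, 1, 2],
                   "T1": [3.0, 2.9, 2.8, 4.4],
                   "T2": [2.9, 2.8, 2.3, 3.3],
                   "T3": [2.8, 2.7, 2.0, 2.3]})

pad = df["Padding amount"]
max_pad = int(pad.max())
time_cols = [c for c in df.columns if c.startswith("T")]

# For each row: p zeros on the left, the values, then zeros up to the full width
padded_rows = [
    [0.0] * int(p) + list(values) + [0.0] * (max_pad - int(p))
    for p, values in zip(pad, df[time_cols].to_numpy())
]

new_cols = [f"T{i}" for i in range(1, len(time_cols) + max_pad + 1)]
result = pd.concat(
    [pad, pd.DataFrame(padded_rows, columns=new_cols, index=df.index)],
    axis=1,
)
print(result)
```

Because every row ends up with the same total width, the list of lists can be turned into a DataFrame in a single step.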
