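I want to repeatedly shift selected columns of a DataFrame, using the number of shifts given for each column in the array nShiftsPerCol. How can I generate an output DataFrame DFO that contains only the columns with a non-zero shift count, with each of those columns shifted that many times? Note that the first shift is zero, i.e. no shift. The shift number should be appended to the column name.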
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})
print(df)
nCols = df.shape[0]
nShiftsPerCol = np.zeros(nCols)
nShiftsPerCol[0]=3 # shift column A 3 times
nShiftsPerCol[2]=2 # shift column C 2 times
Original DataFrame:
   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
3  4  5  6
4  5  6  7
Desired output:
   A_0  A_1  A_2  C_0  C_1
0    1    2    3    3    4
1    2    3    4    4    5
2    3    4    5    5    6
3    4    5   NA    6    7
4    5   NA   NA    7   NA
Answer 0 (score: 1)
First create a Series indexed by the column names and filter out the 0 values:
# the number of columns is shape[1]; shape[0] is the number of rows
nCols = df.shape[1]
nShiftsPerCol = np.zeros(nCols)
nShiftsPerCol[0]=3 # shift column A 3 times
nShiftsPerCol[2]=2 # shift column C 2 times
print (nShiftsPerCol)
s = pd.Series(nShiftsPerCol, df.columns).astype(int)
s = s[s!=0]
print (s)
A    3
C    2
dtype: int32
Then loop over the Series and create the new columns:
for i, x in s.items():
    for y in range(x):
        df['{}_{}'.format(i, y)] = df[i].shift(-y)
print (df)
   A  B  C  A_0  A_1  A_2  C_0  C_1
0  1  2  3    1  2.0  3.0    3  4.0
1  2  3  4    2  3.0  4.0    4  5.0
2  3  4  5    3  4.0  5.0    5  6.0
3  4  5  6    4  5.0  NaN    6  7.0
4  5  6  7    5  NaN  NaN    7  NaN
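The loop above adds the shifted columns to df itself, so the original A, B and C columns remain in the result, whereas the question asks for an output containing only the shifted columns. A minimal sketch of one way to build that separately, assuming the same df and nShiftsPerCol as above (the name DFO comes from the question; the list name pieces is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})
nShiftsPerCol = np.zeros(df.shape[1])
nShiftsPerCol[0] = 3  # shift column A 3 times
nShiftsPerCol[2] = 2  # shift column C 2 times

# Map column name -> shift count and keep only the non-zero counts
s = pd.Series(nShiftsPerCol, index=df.columns).astype(int)
s = s[s != 0]

# Build each shifted copy as a named Series, leaving df untouched
pieces = [df[col].shift(-k).rename('{}_{}'.format(col, k))
          for col, n in s.items()
          for k in range(n)]

# Join the pieces side by side; the Series names become the column names
DFO = pd.concat(pieces, axis=1)
print(DFO)
```

Because df is never modified, the snippet can be rerun without accumulating extra columns.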
Another solution is to store the column names and shift counts in a list of tuples:
L = list(zip(df.columns, nShiftsPerCol.astype(int)))
L = [x for x in L if x[1] != 0]
print (L)
[('A', 3), ('C', 2)]
for i, x in L:
    for y in range(x):
        df['{}_{}'.format(i, y)] = df[i].shift(-y)
print (df)
   A  B  C  A_0  A_1  A_2  C_0  C_1
0  1  2  3    1  2.0  3.0    3  4.0
1  2  3  4    2  3.0  4.0    4  5.0
2  3  4  5    3  4.0  5.0    5  6.0
3  4  5  6    4  5.0  NaN    6  7.0
4  5  6  7    5  NaN  NaN    7  NaN
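In the output above, the shifted columns are floats (2.0, 3.0, ...) because NaN cannot be stored in a plain integer column, while the desired output in the question shows integers with NA. One possible way to keep integers is pandas' nullable Int64 dtype. This is only a sketch: the dict shifts and the name out are hypothetical, and it assumes a pandas version that supports the nullable integer dtype.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})
shifts = {'A': 3, 'C': 2}  # hypothetical mapping: column name -> number of shifts

out = pd.DataFrame(index=df.index)
for col, n in shifts.items():
    # Convert once to the nullable integer dtype so missing values become <NA>
    nullable = df[col].astype('Int64')
    for k in range(n):
        out['{}_{}'.format(col, k)] = nullable.shift(-k)
print(out)
```

The trade-off is that downstream code must be able to handle the Int64 extension dtype and pd.NA instead of float NaN.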
Answer 1 (score: 0)
You can also try this:
from itertools import chain
nShiftsPerCol = [3, 0, 2]
# define a function to help generate shifted columns
col_maker = lambda df, x, num: df[x].shift(-num)
# generate (column, shift) pairs from nShiftsPerCol, skipping zero counts
new_cols = chain(*[[(df.columns[idx], i) for i in range(v)]
                   for idx, v in enumerate(nShiftsPerCol) if v != 0])
# new_cols yields:
# ('A', 0), ('A', 1), ('A', 2), ('C', 0), ('C', 1)
df_desired = pd.DataFrame({col + "_" + str(num): col_maker(df, col, num)
                           for col, num in new_cols})