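I want to fill a 12-column data frame with the values of a 1-D array, in order, and pad the missing entries with NaN.
For example, given
A = np.arange(0,30)
I want to turn array A into a 2-D data frame like this (12 columns, missing data filled with NaN):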
cols = ['1M', '2M', '3M','4M','5M','6M', '7M', '8M', '9M', '10M', '11M', '12M']
df = pd.DataFrame(columns=cols)
....
df.head()
    1M  2M  3M  4M  5M  6M   7M   8M   9M  10M  11M  12M
0    0   1   2   3   4   5    6    7    8    9   10   11
1   12  13  14  15  16  17   18   19   20   21   22   23
2   24  25  26  27  28  29  NaN  NaN  NaN  NaN  NaN  NaN
Please help me.
Answer 0 (score: 1)
You can use numpy to reshape the array and then convert it to a data frame.
a = np.arange(30, dtype=float)
b = np.resize(a, (3, 12))
b[2,len(a)%12:].fill(np.nan)
Note that a must have dtype float, because np.nan is a floating-point value and cannot be stored in an integer array.
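The float requirement can be checked directly. The following is a minimal sketch, with illustrative variable names (`ints`, `floats`, `promoted`) that are not part of the original answer: writing NaN into an integer array fails, while an explicit float copy accepts it, and `np.append` promotes the result to float on its own.

```python
import numpy as np

ints = np.arange(3)                 # integer dtype by default

try:
    ints[0] = np.nan                # NaN has no integer representation
    stored = True
except ValueError:
    stored = False                  # NumPy refuses the assignment

floats = ints.astype(float)         # explicit float copy of the data
floats[0] = np.nan                  # accepted: float arrays can hold NaN

# Appending NaN to an integer array yields a new float64 array.
promoted = np.append(ints, np.nan)

print(stored, floats, promoted.dtype)
```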
The array b is:
array([[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.],
[12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.],
[24., 25., 26., 27., 28., 29., nan, nan, nan, nan, nan, nan]])
The array b can then easily be converted to a data frame:
cols = ['1M', '2M', '3M','4M','5M','6M', '7M', '8M', '9M', '10M', '11M', '12M']
df = pd.DataFrame(b, columns=cols)
The resulting df is:
     1M    2M    3M    4M    5M    6M    7M    8M    9M   10M   11M   12M
0   0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0
1  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
2  24.0  25.0  26.0  27.0  28.0  29.0   NaN   NaN   NaN   NaN   NaN   NaN
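The snippet in this answer hard-codes three rows, and when `len(a) % 12` is 0 the slice `b[2, 0:]` would overwrite the whole last row with NaN. One possible generalisation is sketched below; it swaps `np.resize` for `np.pad`, and the helper name `pad_to_frame` is hypothetical, not something the answer defines.

```python
import numpy as np
import pandas as pd

def pad_to_frame(values, cols):
    """Hypothetical helper: lay `values` out row by row under `cols`, padding with NaN."""
    n_cols = len(cols)
    data = np.asarray(values, dtype=float)   # float so that NaN can be stored
    n_extra = -len(data) % n_cols            # 0 when the length is already a multiple
    padded = np.pad(data, (0, n_extra), constant_values=np.nan)
    return pd.DataFrame(padded.reshape(-1, n_cols), columns=cols)

cols = [f"{i}M" for i in range(1, 13)]       # '1M' ... '12M'
df = pad_to_frame(np.arange(30), cols)       # 3 rows, last 6 cells are NaN
full = pad_to_frame(np.arange(24), cols)     # 2 rows, no padding needed
print(df)
```

Computing the padding length with `-len(data) % n_cols` avoids a special case for arrays whose length already divides evenly by the number of columns.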
Answer 1 (score: 1)
This is probably best done by reshaping the data in numpy:
import math
import numpy as np
import pandas as pd
# Get dimensions
n_cols = len(cols)
n_rows = math.ceil(len(A)/n_cols)
n_extra = (n_cols * n_rows)-len(A)
# Add extra values, then reshape
A = np.append(A, np.repeat(np.nan, n_extra))
A = A.reshape(n_rows,n_cols)
df = pd.DataFrame(A, columns=cols)
     1M    2M    3M    4M    5M    6M    7M    8M    9M   10M   11M   12M
0   0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0
1  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
2  24.0  25.0  26.0  27.0  28.0  29.0   NaN   NaN   NaN   NaN   NaN   NaN
Alternatively, you can easily wrap this in a function:
import math
import numpy as np
import pandas as pd

def array_and_cols_into_df(arr, cols, fill=np.nan):
    """
    Reshape `arr` row by row into a DataFrame with columns `cols`,
    padding the last row with `fill`.
    """
    n_cols = len(cols)
    n_rows = math.ceil(len(arr) / n_cols)
    n_extra = (n_cols * n_rows) - len(arr)
    new_arr = np.append(arr, np.repeat(fill, n_extra))
    new_arr = new_arr.reshape(n_rows, n_cols)
    df = pd.DataFrame(new_arr, columns=cols)
    return df

# Now run the function on a longer array:
A_80 = np.arange(0, 80)
cols = ['1M', '2M', '3M','4M','5M','6M', '7M', '8M', '9M', '10M', '11M', '12M']
df = array_and_cols_into_df(A_80, cols)
print(df)
     1M    2M    3M    4M    5M    6M    7M    8M    9M   10M   11M   12M
0   0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0
1  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
2  24.0  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  35.0
3  36.0  37.0  38.0  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0
4  48.0  49.0  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  59.0
5  60.0  61.0  62.0  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0
6  72.0  73.0  74.0  75.0  76.0  77.0  78.0  79.0   NaN   NaN   NaN   NaN
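A variation on the same idea, shown here only as a sketch, is to allocate the padded buffer first and copy the data into it, instead of appending the padding afterwards. The function name `fill_then_copy` is hypothetical, and the buffer is assumed to be float so that the default NaN fill works.

```python
import math
import numpy as np
import pandas as pd

def fill_then_copy(arr, cols, fill=np.nan):
    """Hypothetical variant: pre-fill a buffer with `fill`, then copy `arr` into its start."""
    n_cols = len(cols)
    n_rows = math.ceil(len(arr) / n_cols)
    grid = np.full(n_rows * n_cols, fill, dtype=float)  # every cell starts as `fill`
    grid[:len(arr)] = arr                               # the first len(arr) cells hold the data
    return pd.DataFrame(grid.reshape(n_rows, n_cols), columns=cols)

cols = [f"{i}M" for i in range(1, 13)]
df80 = fill_then_copy(np.arange(80), cols)
print(df80.shape)
```

For 80 values and 12 columns this gives ceil(80 / 12) = 7 rows, so 7 * 12 - 80 = 4 cells keep the fill value.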
Answer 2 (score: 0)
In your case:
B=np.arange(len(A))
df = pd.crosstab(index=B//12,columns=B%12+1,values=A,aggfunc='sum').add_suffix('M')
col_0 1M 2M 3M 4M ... 9M 10M 11M 12M
row_0 ...
0 6670.0 5746.0 4608.0 3388.0 ... 4962.0 6987.0 8051.0 8325.0
1 6585.0 6183.0 4973.0 3541.0 ... NaN NaN NaN NaN
[2 rows x 12 columns]
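To see what the crosstab call is doing, it may help to spell out the row and column labels it derives from each position. The sketch below uses `DataFrame.pivot` in place of `pd.crosstab` and the `np.arange(30)` array from the question; the names `pos`, `long` and `wide` are illustrative only.

```python
import numpy as np
import pandas as pd

A = np.arange(30)
pos = np.arange(len(A))               # position of each value in A

# One record per value: which row and which column (1..12) it belongs to.
long = pd.DataFrame({"row": pos // 12, "col": pos % 12 + 1, "val": A})

# Spread the records into a grid; (row, col) pairs with no record become NaN.
wide = long.pivot(index="row", columns="col", values="val").add_suffix("M")
print(wide)
```

Because every (row, col) pair occurs at most once, no aggregation is needed here, whereas `pd.crosstab` requires an `aggfunc` whenever `values` is passed.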