Question

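I want to fill a 12-column DataFrame with the values of a 1-D array, in order, and pad the missing cells with NaN.

That is, turn array A into a 2-D DataFrame (12 columns, with the missing data filled with NaN).

For example: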

A = np.arange(0,30)

cols = ['1M', '2M', '3M','4M','5M','6M', '7M', '8M', '9M', '10M', '11M', '12M']
df = pd.DataFrame(columns=cols)

....

df.head()

     1M   2M  3M  4M  5M  6M   7M   8M    9M   10M   11M   12M
0     0   1   2   3   4   5     6    7     8     9    10    11
1    12  13  14  15  16  17    18   19    20    21    22    23
2    24  25  26  27  28  29   NaN  NaN   NaN   NaN   NaN   NaN

Please help.

Answer 1

You can use NumPy to reshape the array and then convert it to a DataFrame.

a = np.arange(30, dtype=float)
b = np.resize(a, (3, 12))
b[2,len(a)%12:].fill(np.nan)

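Note that a must have a float dtype, because np.nan is a float. np.resize pads the new array by repeating a from the start, which is why the tail of the last row is then overwritten with NaN. The shape (3, 12) is hard-coded for this 30-element example.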
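To see why the float dtype matters, here is a small check. It is only an illustration: the names ints and floats are not from the answer, and the exact error message may differ between NumPy versions.

```python
import numpy as np

ints = np.arange(30)                 # integer dtype, as in the question
try:
    ints[-1] = np.nan                # an integer array cannot hold NaN
except ValueError as exc:
    print("integer array rejected NaN:", exc)

floats = np.arange(30, dtype=float)
floats[-1] = np.nan                  # fine: NaN is itself a float value
print(floats.dtype, np.isnan(floats[-1]))
```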

The array b is:

array([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.],
       [12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.],
       [24., 25., 26., 27., 28., 29., nan, nan, nan, nan, nan, nan]])

This can easily be converted to a DataFrame.

cols = ['1M', '2M', '3M','4M','5M','6M', '7M', '8M', '9M', '10M', '11M', '12M']
df = pd.DataFrame(b, columns=cols)

df is:

     1M    2M    3M    4M    5M    6M    7M    8M    9M   10M   11M   12M
0   0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0
1  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
2  24.0  25.0  26.0  27.0  28.0  29.0   NaN   NaN   NaN   NaN   NaN   NaN
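The snippet in this answer fixes the shape at (3, 12). A minimal sketch of the same idea for arbitrary lengths might look like the following; the helper name reshape_with_nan is hypothetical, and it swaps np.resize for a NaN-filled buffer so that no repeated values have to be overwritten afterwards.

```python
import numpy as np
import pandas as pd

def reshape_with_nan(values, cols):
    """Lay `values` out row by row under `cols`, padding the tail with NaN."""
    values = np.asarray(values, dtype=float)   # float so NaN can be stored
    n_cols = len(cols)
    n_rows = -(-len(values) // n_cols)         # ceiling division
    padded = np.full(n_rows * n_cols, np.nan)  # start with all NaN
    padded[:len(values)] = values              # copy the real data in order
    return pd.DataFrame(padded.reshape(n_rows, n_cols), columns=cols)

cols = ['1M', '2M', '3M', '4M', '5M', '6M', '7M', '8M', '9M', '10M', '11M', '12M']
df = reshape_with_nan(np.arange(30), cols)
print(df)
```

Because the row count is computed from the input, an array whose length is an exact multiple of 12 gets no padding at all.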

Answer 2

This is probably best done by reshaping the data in NumPy (A and cols are as defined in the question):

import math
import numpy as np
import pandas as pd

# Get dimensions
n_cols = len(cols)
n_rows = math.ceil(len(A)/n_cols)
n_extra = (n_cols * n_rows)-len(A)

# Add extra values, then reshape
A = np.append(A, np.repeat(np.nan, n_extra))
A = A.reshape(n_rows,n_cols)
df = pd.DataFrame(A, columns=cols)

     1M    2M    3M    4M    5M    6M    7M    8M    9M   10M   11M   12M
0   0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0
1  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
2  24.0  25.0  26.0  27.0  28.0  29.0   NaN   NaN   NaN   NaN   NaN   NaN

Alternatively, you can easily wrap this in a function:

import math
import numpy as np
import pandas as pd

def array_and_cols_into_df(arr, cols, fill=np.nan):
  """
  Reshapes array by columns, filling with `fill` into a df
  """
  n_cols = len(cols)
  n_rows = math.ceil(len(arr)/n_cols)
  n_extra = (n_cols * n_rows)-len(arr)

  new_arr = np.append(arr, np.repeat(fill, n_extra))
  new_arr = new_arr.reshape(n_rows,n_cols)

  df = pd.DataFrame(new_arr, columns = cols)
  return df

# Now run the function with higher values:
A_80 = np.arange(0,80)
cols = ['1M', '2M', '3M','4M','5M','6M', '7M', '8M', '9M', '10M', '11M', '12M']

df = array_and_cols_into_df(A_80, cols)
print(df)

     1M    2M    3M    4M    5M    6M    7M    8M    9M   10M   11M   12M
0   0.0   1.0   2.0   3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0
1  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
2  24.0  25.0  26.0  27.0  28.0  29.0  30.0  31.0  32.0  33.0  34.0  35.0
3  36.0  37.0  38.0  39.0  40.0  41.0  42.0  43.0  44.0  45.0  46.0  47.0
4  48.0  49.0  50.0  51.0  52.0  53.0  54.0  55.0  56.0  57.0  58.0  59.0
5  60.0  61.0  62.0  63.0  64.0  65.0  66.0  67.0  68.0  69.0  70.0  71.0
6  72.0  73.0  74.0  75.0  76.0  77.0  78.0  79.0   NaN   NaN   NaN   NaN
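The question's expected output shows integers, while the frames above hold floats. One possible way to get closer, assuming a pandas version that provides the nullable "Int64" extension dtype (this is not part of the original answer), is to cast after padding. Missing cells then display as <NA> rather than NaN.

```python
import numpy as np
import pandas as pd

cols = ['1M', '2M', '3M', '4M', '5M', '6M', '7M', '8M', '9M', '10M', '11M', '12M']

# Pad 30 values up to 36 with NaN, as in the answer, then reshape to 3 x 12.
padded = np.append(np.arange(30), np.repeat(np.nan, 6)).reshape(3, 12)
df_float = pd.DataFrame(padded, columns=cols)

# Assumption: nullable integer dtype; NaN becomes <NA>, other values become ints.
df_int = df_float.astype("Int64")
print(df_int)
```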

Answer 3

In your case:

B=np.arange(len(A))
df = pd.crosstab(index=B//12,columns=B%12+1,values=A,aggfunc='sum').add_suffix('M')
col_0      1M      2M      3M      4M  ...      9M     10M     11M     12M
row_0                                  ...                                
0      6670.0  5746.0  4608.0  3388.0  ...  4962.0  6987.0  8051.0  8325.0
1      6585.0  6183.0  4973.0  3541.0  ...     NaN     NaN     NaN     NaN
[2 rows x 12 columns]
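The numbers printed above do not match np.arange(30), so they presumably come from different data. As a rough sketch of how the same call behaves on the question's array: B // 12 gives each element's row, B % 12 + 1 gives its 1-based column, and since each (row, column) pair occurs once, aggfunc='sum' simply passes the value through. The names row and col are introduced here for readability only.

```python
import numpy as np
import pandas as pd

A = np.arange(30)
B = np.arange(len(A))        # position of each element in A
row = B // 12                # 0-based row index
col = B % 12 + 1             # 1-based column number, 1..12

# Cells with no matching position (the tail of the last row) come out as NaN.
df = pd.crosstab(index=row, columns=col, values=A, aggfunc='sum').add_suffix('M')
print(df)
```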
