Unable to convert a pandas Series to a 2D array?

Time: 2020-04-15 02:02:24

Tags: python-3.x pandas numpy

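I have a pandas Series of size 10240. Each value in the Series is a 2D array of size 143. I reduce every one of these 2D arrays to a 1D array of size 143, and then convert the Series to a NumPy array. So I should get a 2D array of shape (10240, 143), right? But that is not what I get: the result has shape (10240,) and size 10240. I don't know what I am doing wrong. My code is below.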

def get_subjects(x):
  print(type(x)) #2d list
  print(len(x)) # 2, 143
  x = to_categorical(x, num_classes=len(subjects)+1).sum(axis=0)
  print(type(x)) # numpy array
  print(x.size) # 143
  return x

print(type(train_data["subject_id"])) # pandas series
print(train_data["subject_id"].size) # 10240
subject_train = train_data["subject_id"].apply(lambda x: get_subjects(x)).to_numpy()
print(type(subject_train)) # numpy array
print(subject_train.size) # 10240 

1 Answer:

Answer 0 (score: 1)

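Because 'subject_train' is an array of arrays (a 1D array whose elements are themselves arrays), you do not get the expected shape. To avoid this, you can split the 1D arrays returned by 'get_subjects' into multiple columns and then convert the result to a NumPy array, as shown below.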

import pandas as pd
import numpy as np
# df has 5 rows; each cell holds one 3x4 array
df = pd.DataFrame({'data':[np.random.randint(low =1, high =10, size=(3,4)),
                           np.random.randint(low =1, high =10, size=(3,4)),
                           np.random.randint(low =1, high =10, size=(3,4)),
                           np.random.randint(low =1, high =10, size=(3,4)),
                           np.random.randint(low =1, high =10, size=(3,4)),
                          ]})

def get_subjects(x):
  # stand-in for: x = to_categorical(x, num_classes=len(subjects)+1).sum(axis=0)
  x = x.reshape(-1) # flattens the 3x4 array into a 1D array of length 12
  return x

# apply(pd.Series) splits each row's length-12 array into 12 separate columns
print(df["data"].apply(lambda x: get_subjects(x)).apply(pd.Series).to_numpy().shape)

Result

(5, 12)
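
As a side note, the "array of arrays" can be inspected directly, and a different technique from the one in this answer, np.stack, also produces a 2D array. The following is only a minimal sketch: the Series `s` is a hypothetical stand-in for the question's data, and it assumes every cell holds a 1D array of the same length.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data:
# 5 rows, each cell holding a 1D array of length 12.
s = pd.Series([np.arange(12) + i for i in range(5)])

# to_numpy() on array-valued cells gives a 1D object array
# whose elements are the individual arrays.
as_objects = s.to_numpy()
print(as_objects.dtype, as_objects.shape)

# np.stack joins the equal-length 1D arrays along a new first axis,
# giving one regular 2D array.
stacked = np.stack(s.tolist())
print(stacked.shape)
```

Under those assumptions the first print should report an object dtype with shape (5,), and the second a shape of (5, 12). np.stack raises a ValueError if the arrays do not all have the same shape.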