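I have a pandas Series of size 10240. Each value in the Series is a 2D array, which I convert into a 1D array of size 143. After that, I convert the Series to a numpy array, so I expected to get a 2D array of shape (10240, 143). Instead I get an array of shape (10240,) and size 10240. I can't see what I'm doing wrong. My code is below.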
def get_subjects(x):
    print(type(x))  # 2d list
    print(len(x))  # 2, 143
    x = to_categorical(x, num_classes=len(subjects)+1).sum(axis=0)
    print(type(x))  # numpy array
    print(x.size)  # 143
    return x
print(type(train_data["subject_id"])) # pandas series
print(train_data["subject_id"].size) # 10240
subject_train = train_data["subject_id"].apply(lambda x: get_subjects(x)).to_numpy()
print(type(subject_train)) # numpy array
print(subject_train.size) # 10240
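Answer 0 (score: 1)
You don't get the expected shape because 'subject_train' is an array of arrays: a 1D array of length 10240 whose elements are themselves arrays. To avoid this, you can split the 1D array returned by 'get_subjects' into separate columns and then convert to a numpy array, as shown below.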
import pandas as pd
import numpy as np
# df has 5 rows and each cell holds a 3x4 array
df = pd.DataFrame({'data': [np.random.randint(low=1, high=10, size=(3, 4)),
                            np.random.randint(low=1, high=10, size=(3, 4)),
                            np.random.randint(low=1, high=10, size=(3, 4)),
                            np.random.randint(low=1, high=10, size=(3, 4)),
                            np.random.randint(low=1, high=10, size=(3, 4)),
                            ]})
def get_subjects(x):
    # stands in for x = to_categorical(x, num_classes=len(subjects)+1).sum(axis=0)
    x = x.reshape(-1)  # flattens the 3x4 array to a 1D array of length 12
    return x
# apply(pd.Series) splits each row's length-12 array into 12 separate columns
df["data"].apply(lambda x: get_subjects(x)).apply(pd.Series).to_numpy().shape
Result
(5, 12)
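As a possible alternative, here is a minimal sketch that first shows the array-of-arrays problem and then stacks the rows with np.stack instead of going through apply(pd.Series). It assumes every flattened array has the same length. The Series `s` and the names `flat`, `as_objects` and `stacked` are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: 5 rows, each cell a 3x4 array
s = pd.Series([np.arange(12).reshape(3, 4) + i for i in range(5)])

# Flatten each cell to 1D, as get_subjects does in the answer above
flat = s.apply(lambda x: x.reshape(-1))

# to_numpy() gives a 1D object array whose elements are themselves arrays,
# which is why the shape is (n_rows,) rather than (n_rows, n_values)
as_objects = flat.to_numpy()
print(as_objects.dtype, as_objects.shape)

# np.stack joins the equal-length 1D arrays along a new first axis
stacked = np.stack(flat.tolist())
print(stacked.shape)
```

If the arrays had different lengths, np.stack would raise a ValueError, so this only applies when each row produces the same number of values (143 in the question).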