Question

I have a pandas DataFrame that looks like this:

Category  Subcategory  Count
A         1            [20.0 38.5, 3.2 8.5]
A         2            [3.7 8.2, 5.7 5.5]
A         3            [12.4 23.5, 24.4 8.9]
B         1            [3.7 8.2, 5.7 5.5]
B         2            [12.4 23.5, 24.4 8.9]
...      ...           ...
...      ...           ...

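The Count column contains strings that I ultimately want to convert to NumPy ndarrays. In the end, my goal is to create one ndarray for each Subcategory of each Category. What I tried was to group df by Category, iterate over each group, take each Count string, strip the [] and split it on whitespace, and then use np.to_numpy() to create the ndarrays, but it does not seem to work as required.

For example, for Category A and Subcategory 1, I would like an ndarray built from that row's Count string.

Any suggestions?

Thanks!

So far, I have an approach that works but is inefficient.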

Answer 1

Try this; it should strip the brackets and create arrays of doubles:

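import numpy as np

def reshape_array_string(x):
    # Drop the brackets and commas, then split on any run of whitespace.
    temp = x.replace('[', '').replace(']', '').replace(',', '').split()
    # Convert the tokens to floats and arrange them as rows of two values.
    return np.array(temp, dtype=float).reshape(-1, 2)

df['Count'].apply(reshape_array_string)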

Output:

0              [[20.0, 38.5], [3.2, 8.5]]
1    [[3.7, 8.2], [5.7, 5.5], [4.6, 2.2]]
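
To connect this parsing step back to the question's goal of one ndarray per Subcategory within each Category, here is a minimal sketch. The helper name parse_count, the dictionary layout, and the sample frame (copied from the rows shown in the question) are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def parse_count(text):
    """Turn a string like "[20.0 38.5, 3.2 8.5]" into an (n, 2) float array."""
    # Remove the outer brackets and treat commas as plain separators.
    cleaned = text.strip("[]").replace(",", " ")
    return np.array(cleaned.split(), dtype=float).reshape(-1, 2)

# Sample frame mirroring the first rows shown in the question.
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B"],
    "Subcategory": [1, 2, 3, 1, 2],
    "Count": [
        "[20.0 38.5, 3.2 8.5]",
        "[3.7 8.2, 5.7 5.5]",
        "[12.4 23.5, 24.4 8.9]",
        "[3.7 8.2, 5.7 5.5]",
        "[12.4 23.5, 24.4 8.9]",
    ],
})

# One ndarray per (Category, Subcategory) pair, keyed by a tuple.
arrays = {
    (cat, sub): parse_count(count)
    for cat, sub, count in zip(df["Category"], df["Subcategory"], df["Count"])
}

print(arrays[("A", 1)])
```

Keying the dictionary by a (Category, Subcategory) tuple avoids an explicit groupby loop; if a nested per-Category structure is preferred, the same helper can be called inside a loop over df.groupby("Category").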

Answer 2

You can also use a list comprehension with str.findall:

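import numpy as np
import pandas as pd

df = pd.DataFrame({"Count": ["[20.0 38.5, 3.2 8.5]",
                             "[3.7 8.2, 5.7 5.5]"]})

# Raw string so the regex escapes are not treated as string escapes.
result = [np.reshape([float(x) for x in i], [2, 2])
          for i in df["Count"].str.findall(r"\d+\.\d+")]

print(result)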

#
[array([[20. , 38.5],[ 3.2,  8.5]]),
 array([[3.7, 8.2], [5.7, 5.5]])]
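
The snippet above hard-codes a 2x2 shape and only matches unsigned numbers that contain a decimal point. If the rows can hold a different number of pairs, or integers and negative values, a variation along these lines may be more forgiving. The pattern NUMBER, the helper to_pairs, and the sample strings are illustrative assumptions, and scientific notation is deliberately not handled.

```python
import re

import numpy as np
import pandas as pd

# Assumed pattern: optional minus sign, digits, optional fractional part.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def to_pairs(text):
    """Extract every number in the string and arrange them as rows of two."""
    values = [float(tok) for tok in NUMBER.findall(text)]
    # -1 lets NumPy infer the row count; an odd number of values raises ValueError.
    return np.array(values).reshape(-1, 2)

counts = pd.Series([
    "[20.0 38.5, 3.2 8.5]",
    "[3.7 8.2, 5.7 5.5, 4.6 2.2]",
    "[-1 2, 3 4.5]",
])

result = [to_pairs(s) for s in counts]

for arr in result:
    print(arr.shape)
```

Letting reshape infer the row count means a malformed row with an odd number of values fails loudly instead of being silently truncated.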

Convert a pandas nested Series to a NumPy ndarray

2 answers: