Question

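I want to create an array that holds mixed types: strings and integers.

The following code does not work as intended: all elements end up typed as strings.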

>>> a=numpy.array(["Str",1,2,3,4])
>>> print a
['Str' '1' '2' '3' '4']
>>> print type(a[0]),type(a[1])
<type 'numpy.string_'> <type 'numpy.string_'>

All elements of the array are typed as 'numpy.string_'.

Oddly, though, if I pass one of the elements as None, the types come out as desired:

>>> a=numpy.array(["Str",None,2,3,4])
>>> print a
['Str' None 2 3 4]
>>> print type(a[0]),type(a[1]),type(a[2])
<type 'str'> <type 'NoneType'> <type 'int'>

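So including a None element gives me a workaround, but I would like to know why this happens. Even if I don't pass one of the elements as None, shouldn't the elements keep the types they were passed with?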

Answer 1

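Mixed types are strongly discouraged in NumPy. You lose the benefits of vectorized computation. In this case:

For your first array, NumPy decides to convert your array to a uniform array of strings of 3 or fewer characters.
For your second array, None is not permitted as a "stringable" variable in NumPy, so NumPy falls back to the standard object dtype. The object dtype represents a collection of pointers to arbitrary types.

You can see this when you print the dtype attribute of each array: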

import numpy as np

print(np.array(["Str",1,2,3,4]).dtype)     # <U3
print(np.array(["Str",None,2,3,4]).dtype)  # object

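This is entirely expected. NumPy strongly prefers homogeneous types, and you really should have them for any meaningful computation. Otherwise, a Python list may be the more appropriate data structure.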
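As a rough illustration of what is lost with mixed types, the sketch below compares a homogeneous integer array with the same kind of mixed data forced to object dtype. The variable names (nums, mixed, add_failed) are made up for this example and do not come from the question.

```python
import numpy as np

# Homogeneous integer array: arithmetic is applied to the whole array at once.
nums = np.array([1, 2, 3, 4])
doubled = nums * 2

# Mixed data forced to object dtype: each element keeps its own Python type.
mixed = np.array(["Str", 1, 2, 3, 4], dtype=object)
element_types = [type(x).__name__ for x in mixed]

# On an object array NumPy calls the Python operator on each element,
# so an operation that is invalid for one element fails for the whole array.
try:
    mixed + 1
    add_failed = False
except TypeError:
    add_failed = True

print(doubled)
print(element_types)
print(add_failed)
```

The object array still works as a container, but arithmetic on it is only as valid as the least compatible element it holds.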

For a detailed explanation of how NumPy prioritizes dtype selection, see:

Answer 2

An alternative to adding None is to make the dtype explicit:

In [80]: np.array(["str",1,2,3,4])
Out[80]: array(['str', '1', '2', '3', '4'], dtype='<U3')
In [81]: np.array(["str",1,2,3,4], dtype=object)
Out[81]: array(['str', 1, 2, 3, 4], dtype=object)

Creating an object dtype array and filling it from a list is another option:

In [85]: res = np.empty(5, object)
In [86]: res
Out[86]: array([None, None, None, None, None], dtype=object)
In [87]: res[:] = ['str', 1, 2, 3, 4]
In [88]: res
Out[88]: array(['str', 1, 2, 3, 4], dtype=object)

It isn't necessary here, but it matters when you want an array of lists.
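A minimal sketch of the array-of-lists case, assuming two equal-length lists as input. The names lists, direct and res are illustrative only, and the fill here is done element by element rather than with a slice assignment.

```python
import numpy as np

lists = [[1, 2], [3, 4]]

# Passing equal-length lists directly gives a 2-D array of their items,
# even with dtype=object, rather than a 1-D array of list objects.
direct = np.array(lists, dtype=object)

# Preallocating an object array and assigning one element at a time
# keeps each list as a single object in a 1-D array.
res = np.empty(len(lists), dtype=object)
for i, item in enumerate(lists):
    res[i] = item

print(direct.shape)
print(res.shape)
print(type(res[0]))
```

With the preallocated array, each slot holds a reference to the original list, so res[0] is the same object as lists[0].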

Python ndarray with elements of different types
