How can I set (dtype=object) numpy array values to Python lists, without numpy interpreting the lists as lists of values?

Posted: 2015-05-24 21:54:45

Tags: python numpy

I have an array of dtype=object, where the values are either Python lists, or np.nan.

I'd like to replace the values that are np.nan with [None] (not None).

For a pure Python list, I can already do this with [ x if (x is not np.nan) else [None] for x in s ], and converting the array to a list is fine for my purpose, but out of curiosity, I wonder how this can be done with a numpy array. The difficulty is that, when using indexing, numpy tries to interpret any list as a list of values, rather than as the actual value I want to assign.
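A small sketch of the pure-Python route, assuming the missing values are float NaNs. It shows that the `x is not np.nan` check relies on object identity: it catches the np.nan object itself but not a NaN created some other way. `is_missing` is a hypothetical helper written for this illustration, not part of numpy or pandas.

```python
import math

import numpy as np


def is_missing(x):
    # Hypothetical helper: True only for float NaNs (np.nan, float('nan'),
    # np.float64('nan')); lists and other objects are never "missing".
    return isinstance(x, float) and math.isnan(x)


# Plain list standing in for the array; iterating an object array yields the
# same element objects, so the checks behave the same either way.
s = [['asdf', 'asdf'], ['asdf'], np.nan, ['asdf', 'asdf', 'asdf'], float('nan')]

# Identity check from the question: matches the np.nan object itself only.
by_identity = [x if (x is not np.nan) else [None] for x in s]

# Value check: matches any float NaN, whichever object it is.
by_value = [[None] if is_missing(x) else x for x in s]

print(by_identity)  # the float('nan') at the end is left untouched
print(by_value)     # both NaNs become [None]
```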

If I wanted to replace the values with 2, for example, that would be easy (with the usual np and pd imports). As an aside, np.isnan does not work here because it does not accept an object array; this is a weakness of pandas' choice of float NaN as the generic missing-value marker, so I use pd.isnull instead (this is for an issue with pandas internals anyway):

In [53]: s
Out[53]:
array([['asdf', 'asdf'], ['asdf'], nan, ['asdf', 'asdf', 'asdf'],
       ['asdf', 'asdf', 'asdf']], dtype=object)

In [55]: s[pd.isnull(s)] = 2

In [56]: s
Out[56]:
array([['asdf', 'asdf'], ['asdf'], 2, ['asdf', 'asdf', 'asdf'],
       ['asdf', 'asdf', 'asdf']], dtype=object)

Yet trying to replace them with [None] instead replaces them with None:

In [58]: s
Out[58]:
array([['asdf', 'asdf'], ['asdf'], nan, ['asdf', 'asdf', 'asdf'],
       ['asdf', 'asdf', 'asdf']], dtype=object)

In [59]: s[pd.isnull(s)] = [None]

In [60]: s
Out[60]:
array([['asdf', 'asdf'], ['asdf'], None, ['asdf', 'asdf', 'asdf'],
       ['asdf', 'asdf', 'asdf']], dtype=object)
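A self-contained sketch reproducing the two transcripts above under Python 3. The helper `make_example` is hypothetical; it fills the object array slot by slot so numpy never tries to read the nested lists as extra dimensions. The key point is that numpy converts the right-hand side to an array before assigning, so [None] becomes a length-1 object array and its single element, None, is what gets broadcast.

```python
import numpy as np
import pandas as pd


def make_example():
    # Hypothetical helper: build the 1-D object array from the question,
    # storing each list as one opaque object via a scalar index.
    values = [['asdf', 'asdf'], ['asdf'], np.nan,
              ['asdf', 'asdf', 'asdf'], ['asdf', 'asdf', 'asdf']]
    s = np.empty(len(values), dtype=object)
    for i, v in enumerate(values):
        s[i] = v
    return s


s_two = make_example()
mask = pd.isnull(s_two)   # elementwise: True only for the NaN, never for a list
print(mask)

s_two[mask] = 2           # scalar right-hand side, broadcast to every True slot
print(s_two[2])           # 2

rhs = np.asarray([None], dtype=object)
print(rhs.shape)          # (1,): numpy reads [None] as one value, None

s_none = make_example()
s_none[pd.isnull(s_none)] = [None]   # so None, not [None], is stored
print(s_none[2])          # None
```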

This is, obviously, the behavior that one wants 99% of the time. It just so happens that this time, I want to assign the list as an object. Is there any way to do so?

2 answers:

Answer 0 (score: 3)

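The first problem is that s[…] = [None] tries to replace the array slice with a sequence of one value, None. What you actually want is to replace the slice with a sequence of one value, [None], which you would write as [[None]].

However, that doesn't actually solve your problem; it just gets you to the problem you really meant to ask about.

What you need is an explicit 1-element array of dtype object whose single element happens to be the list [None]. For example: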

>>> n = np.array([[None], 0], dtype=object)[:1]
>>> s[pd.isnull(s)] = n

Or, of course:

>>> n = np.empty((1,), dtype=object)
>>> n[0] = [None]
>>> s[pd.isnull(s)] = n
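Both constructions above can be checked side by side in a runnable sketch. It assumes a Python 3 / recent numpy setup and reuses a hypothetical `make_example` helper to rebuild the question's array. The dummy 0 in the first construction makes the input ragged, which is what stops numpy from building a 2-D array out of [[None]].

```python
import numpy as np
import pandas as pd


def make_example():
    # Hypothetical helper: fill a 1-D object array slot by slot so the
    # nested lists stay opaque objects.
    values = [['asdf', 'asdf'], ['asdf'], np.nan,
              ['asdf', 'asdf', 'asdf'], ['asdf', 'asdf', 'asdf']]
    s = np.empty(len(values), dtype=object)
    for i, v in enumerate(values):
        s[i] = v
    return s


# First construction: [[None], 0] is ragged, so numpy builds a 1-D object
# array of two elements; slicing keeps only the list [None].
n1 = np.array([[None], 0], dtype=object)[:1]

# Second construction: allocate one object slot, then store the list in it
# with a scalar index, which numpy treats as a single object.
n2 = np.empty((1,), dtype=object)
n2[0] = [None]

s1 = make_example()
s1[pd.isnull(s1)] = n1   # the length-1 array broadcasts over the True slots

s2 = make_example()
s2[pd.isnull(s2)] = n2

print(s1)
print(s2)
```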

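I'm 90% sure there's a more concise and readable way to create a single-element array guaranteed to hold the value [None], and 80% sure there's a simpler way to do the whole thing in the first place, so hopefully someone will come up with a better answer... but if not, this will work.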
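One side effect of the broadcasting approach is worth knowing, shown here with a made-up four-element array that has two NaNs. Broadcasting copies the reference, not the list, so every filled slot points at the same list object and mutating one changes them all. If the lists will be modified in place later, assigning a fresh [None] to each index separately avoids the sharing.

```python
import numpy as np
import pandas as pd

# Made-up example with two missing values, filled slot by slot.
values = [['asdf', 'asdf'], np.nan, ['asdf'], np.nan]
s = np.empty(len(values), dtype=object)
for i, v in enumerate(values):
    s[i] = v

n = np.empty((1,), dtype=object)
n[0] = [None]
s[pd.isnull(s)] = n

# Both former NaN slots now hold the very same list object as n[0].
print(s[1] is s[3])   # True

s[1].append('x')      # mutating through one slot...
print(s[3])           # ...shows up in the other: [None, 'x']
```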

Answer 1 (score: 0)

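I'd suggest using numpy.argmin() to locate each nan (under Python 2's comparison rules any number sorts before any list, so the minimum of this array is always a nan) and then replacing it with [None]: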

import numpy as np
import pandas as pd

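def to_none(array_):
    # One pass per missing value; pd.isnull counts the NaNs.
    for _ in range(array_[pd.isnull(array_)].size):
        # Relies on Python 2 ordering, where numbers sort before lists, so the
        # minimum is always a remaining NaN. Under Python 3, comparing a float
        # with a list raises TypeError.
        array_[np.argmin(array_)] = [None]
    return array_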


a = np.array([['asdf', 'asdf'], ['asdf'], np.nan, ['asdf', 'asdf', 'asdf'],np.nan,
       ['asdf', 'asdf', 'asdf']], dtype=object)
a = to_none(a)

print(a)

>>
[['asdf', 'asdf'] ['asdf'] [None] ['asdf', 'asdf', 'asdf'] [None]
 ['asdf', 'asdf', 'asdf']]

print(a.dtype)

>>
object
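Because to_none depends on Python 2's mixed-type ordering, here is a hedged Python 3 sketch of the same per-element idea. `to_none_py3` is a hypothetical name; it locates the NaNs with pd.isnull and np.flatnonzero instead of np.argmin, then stores a fresh [None] at each position through a scalar index, so each slot also gets its own list.

```python
import numpy as np
import pandas as pd


def to_none_py3(array_):
    # Hypothetical variant of to_none: positions of the missing values,
    # as plain Python ints.
    positions = np.flatnonzero(pd.isnull(array_)).tolist()
    for i in positions:
        # A scalar index stores the list as one object, not as a sequence.
        array_[i] = [None]
    return array_


# Same contents as the answer's example, built slot by slot.
values = [['asdf', 'asdf'], ['asdf'], np.nan, ['asdf', 'asdf', 'asdf'],
          np.nan, ['asdf', 'asdf', 'asdf']]
a = np.empty(len(values), dtype=object)
for i, v in enumerate(values):
    a[i] = v

a = to_none_py3(a)
print(a)
print(a.dtype)   # object
```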