I have an array of dtype=object, where the values are either Python lists or np.nan.
I'd like to replace the values that are np.nan with [None] (not None).
For a pure Python list, I can already do this with [ x if (x is not np.nan) else [None] for x in s ], and converting the array to a list is fine for my purpose, but out of curiosity, I wonder how this can be done with a numpy array. The difficulty is that, when using indexing, numpy tries to interpret any list as a list of values, rather than as the actual value I want to assign.
If I wanted to replace the values with 2, for example, that is easy (with the usual np and pd imports; as an aside, np.isnan does not work on this object array, a weakness of pandas' choice of float NaN for generic missing values, so I use pd.isnull, since this is for an issue with pandas internals anyway):
In [53]: s
Out[53]:
array([['asdf', 'asdf'], ['asdf'], nan, ['asdf', 'asdf', 'asdf'],
['asdf', 'asdf', 'asdf']], dtype=object)
In [55]: s[pd.isnull(s)] = 2
In [56]: s
Out[56]:
array([['asdf', 'asdf'], ['asdf'], 2, ['asdf', 'asdf', 'asdf'],
['asdf', 'asdf', 'asdf']], dtype=object)
Yet trying to replace them with [None] instead replaces them with None:
In [58]: s
Out[58]:
array([['asdf', 'asdf'], ['asdf'], nan, ['asdf', 'asdf', 'asdf'],
['asdf', 'asdf', 'asdf']], dtype=object)
In [59]: s[pd.isnull(s)] = [None]
In [60]: s
Out[60]:
array([['asdf', 'asdf'], ['asdf'], None, ['asdf', 'asdf', 'asdf'],
['asdf', 'asdf', 'asdf']], dtype=object)
This is, obviously, the behavior that one wants 99% of the time. It just so happens that this time, I want to assign the list as an object. Is there any way to do so?
Answer 0 (score: 3)
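The first problem is that s[…] = [None] tries to replace the array slice with a sequence of one value, None. What you really want is to replace the slice with a sequence of one value, [None], which you would write as [[None]].
However, that doesn't actually solve your problem; it just gets you to the problem you were trying to ask about.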
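To make the point concrete, here is a minimal sketch of how the two right-hand sides look once they are turned into arrays. It assumes that assignment converts the value roughly the way np.asarray does; the names flat and nested are only for illustration.

```python
import numpy as np

# [None] becomes a 1-element array whose single element is None.
flat = np.asarray([None], dtype=object)

# [[None]] is still an array of None, just with an extra axis;
# it is not a 1-element array holding the list [None].
nested = np.asarray([[None]], dtype=object)

print(flat.shape, flat[0] is None)         # (1,) True
print(nested.shape, nested[0, 0] is None)  # (1, 1) True
```

In neither case does the array contain a list object, which is why nesting the brackets alone is not enough.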
What you need is an array that is explicitly one object element, which happens to be the list [None]. For example:
>>> n = np.array([[None], 0], dtype=object)[:1]
>>> s[pd.isnull(s)] = n
Or, of course:
>>> n = np.empty((1,), dtype=object)
>>> n[0] = [None]
>>> s[pd.isnull(s)] = n
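The np.empty variant can be wrapped in a small helper. This is only a sketch: fill_with_object and holder are hypothetical names, and the mask is built with the identity check from the question (x is np.nan) so that the example needs nothing beyond NumPy; pd.isnull(s) could be passed instead.

```python
import numpy as np

def fill_with_object(arr, mask, value):
    """Store `value` itself (not its items) wherever `mask` is True."""
    holder = np.empty((1,), dtype=object)  # one-element object array
    holder[0] = value                      # scalar index: the list is stored as one object
    arr[mask] = holder                     # the single element is broadcast over the mask
    return arr

s = np.array([['asdf', 'asdf'], ['asdf'], np.nan, ['asdf', 'asdf', 'asdf']],
             dtype=object)

# Identity check as in the question; assumes the missing entries are np.nan itself.
mask = np.array([x is np.nan for x in s])

fill_with_object(s, mask, [None])
print(s[2])  # [None]
```

One thing to keep in mind: because a single holder element is broadcast, every filled slot most likely refers to the same list object, so mutating one of them would show up in the others.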
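I'm 90% sure there's a more concise and readable way to create a single-element array that is guaranteed to have the value [None], and 80% sure there's an even simpler way to do the whole thing in the first place, so hopefully someone will come along with a better answer... but if not, this will work.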
Answer 1 (score: 0)
I suggest using numpy.argmin(), since it returns the position of nan, and then replacing the value at that position with [None]:
import numpy as np
import pandas as pd

def to_none(array_):
    # Run once per missing value; each pass overwrites the entry argmin points at.
    for i in range(array_[pd.isnull(array_)].size):
        array_[np.argmin(array_)] = [None]
    return array_

a = np.array([['asdf', 'asdf'], ['asdf'], np.nan, ['asdf', 'asdf', 'asdf'], np.nan,
              ['asdf', 'asdf', 'asdf']], dtype=object)
a = to_none(a)
print a  # Python 2 print statement
>>
[['asdf', 'asdf'] ['asdf'] [None] ['asdf', 'asdf', 'asdf'] [None]
 ['asdf', 'asdf', 'asdf']]
print a.dtype
>>
object
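The loop above relies on argmin landing on a nan entry, which depends on how lists and floats compare and may not hold on Python 3. As a hedged alternative (not the answerer's method), the positions can be taken directly from a boolean mask and assigned one by one; to_none_by_index is a hypothetical name, and the mask uses the identity check from the question rather than pd.isnull so the sketch needs only NumPy.

```python
import numpy as np

def to_none_by_index(array_, mask):
    """Assign a fresh [None] at each True position of `mask`."""
    for i in np.flatnonzero(mask):
        # A scalar integer index stores the list as a single object.
        array_[i] = [None]
    return array_

a = np.array([['asdf', 'asdf'], ['asdf'], np.nan, ['asdf', 'asdf', 'asdf'], np.nan,
              ['asdf', 'asdf', 'asdf']], dtype=object)

# Assumes the missing entries are the np.nan object itself.
mask = np.array([x is np.nan for x in a])

a = to_none_by_index(a, mask)
print(a[2], a[4])  # [None] [None]
```

Since a new list is created on every iteration, the filled slots do not share one list object.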