I want to create an array of NaN values, except at certain indices where I place some strings.
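import numpy as np
import pandas as pd

def to_padded_array(vals, idxs, length, fill_na):
    # Build a length-sized array of the fill value, then write vals at idxs.
    arr = np.array([fill_na] * length)
    np.put(arr, idxs, vals)
    ser = pd.Series(arr)
    return tuple(ser.tolist())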
I have some sample vals and idxs like this:
idxs = np.array([0,4,5]) # this was made to be a numpy array
vals = pd.Series(['a', 'b', np.nan], name='city') # this actually would come from a pd.agg function
Note that the initial input vals contains a NaN. If I try to set fill_na=np.nan, I get this error:
could not convert string to float: 'a'
If I use fill_na=None, I get both None and NaN in the result, which is not good:
>>> to_padded_array(vals, idxs, length=6,fill_na=None)
('a', None, None, None, 'b', nan)
I was thinking of using pandas to get around this, but I haven't found a pandas equivalent of numpy.put. What should I do?
Answer 0 (score: 2)
You can use Series.reindex here:
Example
def to_padded_array(vals, idxs, length):
    # Note that `vals` is a pd.Series object.
    ser = pd.Series(vals.values, index=idxs).reindex(np.arange(length))
    # if vals is an array, then vals can be used instead of vals.values
    return tuple(ser.tolist())
to_padded_array(vals, idxs, 6)
[Out]
('a', nan, nan, nan, 'b', nan)
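For context on the error in the question: np.array([np.nan] * length) produces a float64 array, and np.put casts the values to the target array's dtype, so the string 'a' cannot be stored. With fill_na=None the array is of object dtype, which is why the strings fit but None and NaN end up mixed. If you would rather stay in NumPy, the following is a minimal sketch that assumes an object-dtype array is acceptable for your use case. The name to_padded_array_np is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

def to_padded_array_np(vals, idxs, length):
    # Hypothetical NumPy-only variant: dtype=object lets one array hold
    # both Python strings and float NaN, so no string-to-float cast occurs.
    arr = np.full(length, np.nan, dtype=object)
    # Write the values at the requested positions; a NaN in vals stays NaN.
    arr[idxs] = np.asarray(vals, dtype=object)
    return tuple(arr.tolist())

idxs = np.array([0, 4, 5])
vals = pd.Series(['a', 'b', np.nan], name='city')
result = to_padded_array_np(vals, idxs, 6)
```

Here every unfilled position holds NaN rather than None, matching the reindex output above. The reindex approach is probably still the more natural choice if the data already lives in pandas.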