Question

I'm struggling with a strange bug that I can't make sense of. Maybe it's something very basic that I'm overlooking. The code is as follows:

df = pd.DataFrame(
    some_numpy_array, 
    columns=[i for i in range(N)])

df.shape
(57058, 20)

some_pd_series.shape
(57058,)

df["Text"] = some_pd_series

sum(some_pd_series.isnull())
0

sum(df["Text"].isnull())
21137

df["Text"] should be exactly the same as some_pd_series, right? So where do these NaNs suddenly come from?

Answer 1

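Thanks to @EdChum's comment, I found out that the problem was caused by the indices not matching. Assigning a Series to a DataFrame column aligns on index labels rather than on position, so any row label in df that is missing from the Series' index gets NaN. This happened because I had previously dropped duplicates from some_pd_series, which left "holes" in its index.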
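To make the mechanism concrete, here is a minimal sketch that reproduces the effect on toy data. The values in raw and the 4x2 shape of df are made up for illustration; only the names some_pd_series and df come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the objects in the question.
raw = pd.Series(["a", "b", "a", "c", "b", "d"])

# drop_duplicates() keeps the first occurrence of each value and
# preserves the original labels, so the index becomes 0, 1, 3, 5.
some_pd_series = raw.drop_duplicates()

# The DataFrame gets a default index with labels 0, 1, 2, 3.
df = pd.DataFrame(np.zeros((4, 2)), columns=range(2))

# Assignment aligns on index labels, not on position:
# label 2 has no match in the Series, so that row becomes NaN,
# and the Series entry labelled 5 is silently discarded.
df["Text"] = some_pd_series

print(list(some_pd_series.index))
print(df["Text"].isnull().sum())
```

Both objects have four entries, yet one row ends up as NaN, which is the same symptom as in the question on a smaller scale.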

Possible ways of solving this issue include either of the following (the first requires both objects to have the same length, as they do here):

some_pd_series.index = df.index
some_pd_series.reset_index(drop=True, inplace=True)
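As a rough check that such fixes behave as expected, the sketch below applies reset_index to the same kind of toy data. It also shows one further option that is not part of the original answer: assigning the underlying array via to_numpy(), which skips index alignment entirely. The data and the column name "Text2" are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a de-duplicated Series with index labels 0, 1, 3, 5.
raw = pd.Series(["a", "b", "a", "c", "b", "d"])
some_pd_series = raw.drop_duplicates()
df = pd.DataFrame(np.zeros((4, 2)), columns=range(2))

# Option from the answer: renumber the Series 0..n-1 so that its labels
# match the DataFrame's default index, then assign.
fixed = some_pd_series.reset_index(drop=True)
df["Text"] = fixed

# Additional option (an assumption, not from the answer): assign the raw
# values, which are placed by position; lengths must match exactly.
df["Text2"] = some_pd_series.to_numpy()

print(df["Text"].isnull().sum())
print(df["Text2"].isnull().sum())
```

The non-inplace form of reset_index is used here so that the original Series stays untouched; the inplace=True variant from the answer has the same effect on the index.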

Wild NaNs appear when adding a Pandas Series as a column to a DataFrame