Wild NaNs appear when adding a Pandas Series as a column to a DataFrame

Time: 2016-10-20 12:14:14

Tags: python pandas dataframe nan

I'm struggling with a strange bug that I can't make sense of. Maybe I'm overlooking something very basic. The code is as follows:

df = pd.DataFrame(
    some_numpy_array,
    columns=list(range(N)))

df.shape
(57058, 20)

some_pd_series.shape
(57058,)

df["Text"] = some_pd_series

sum(some_pd_series.isnull())
0

sum(df["Text"].isnull())
21137

df["Text"] should be exactly the same as some_pd_series, right? So where do these NaNs suddenly come from?

1 Answer:

Answer 0: (score: 1)

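Thanks to @EdChum's comment, I found out that the problem was caused by mismatched indices. When a Series is assigned to a DataFrame column, pandas aligns it on index labels rather than on position, so every row of df whose label is missing from the Series gets NaN. This happened because I had previously dropped duplicates from some_pd_series, which left "holes" in its index.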
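For anyone who wants to see the effect in isolation, here is a minimal sketch that reproduces it. The tiny df, raw and text objects are made-up stand-ins for the data in the question, not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: default index 0..3
df = pd.DataFrame(np.zeros((4, 2)), columns=range(2))

# Hypothetical stand-in for some_pd_series: dropping the duplicate "b"
# removes label 2, so the remaining labels are 0, 1, 3, 4
raw = pd.Series(["a", "b", "b", "c", "d"])
text = raw.drop_duplicates()

# Same length as df, and no missing values in the Series itself
same_length = len(text) == len(df)
nan_in_series = int(text.isnull().sum())

# Assignment aligns on index labels: row 2 of df has no match in text,
# and label 4 of text has no row in df
df["Text"] = text
nan_in_column = int(df["Text"].isnull().sum())

print(list(text.index))
print(same_length, nan_in_series, nan_in_column)
```

The Series and the DataFrame have the same length, yet the new column contains a NaN at every label the Series no longer has.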

Possible ways of solving this issue include:

  1. some_pd_series.index = df.index
  2. some_pd_series.reset_index(drop=True, inplace=True)
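
Both options can be sketched on toy data as follows. The df and text objects are hypothetical stand-ins for the question's data. Note that option 2 only lines up when df has the default 0..n-1 index, which is the case for a DataFrame built from a NumPy array as in the question. The third variant, assigning the underlying array, is not part of the original answer; it is a common alternative that skips index alignment entirely.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's df and some_pd_series
df = pd.DataFrame(np.zeros((4, 2)), columns=range(2))
text = pd.Series(["a", "b", "b", "c", "d"]).drop_duplicates()  # labels 0, 1, 3, 4

# Option 1: overwrite the Series index with the DataFrame's index
fixed_1 = df.copy()
relabeled = text.copy()
relabeled.index = fixed_1.index
fixed_1["Text"] = relabeled

# Option 2: renumber the Series 0..n-1 (assumes df has a default index)
fixed_2 = df.copy()
fixed_2["Text"] = text.reset_index(drop=True)

# Alternative (not in the original answer): assign the raw values,
# which is purely positional and ignores the Series index
fixed_3 = df.copy()
fixed_3["Text"] = text.to_numpy()

for fixed in (fixed_1, fixed_2, fixed_3):
    print(fixed["Text"].tolist(), int(fixed["Text"].isnull().sum()))
```

All three variants require the Series and the DataFrame to have the same length, which holds in the question (57058 each).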