I'm struggling with a strange bug that I can't understand. Maybe it's something very basic that I'm overlooking. The code is as follows:
df = pd.DataFrame(
    some_numpy_array,
    columns=list(range(N)))
df.shape
(57058, 20)
some_pd_series.shape
(57058,)
df["Text"] = some_pd_series
sum(some_pd_series.isnull())
0
sum(df["Text"].isnull())
21137
df["Text"] should be exactly identical to some_pd_series, right? So where do these NaNs suddenly come from?
Answer 0 (score: 1)
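Thanks to @EdChum's comment, I found out the problem was caused by the indices not matching. I had previously dropped duplicates from some_pd_series, which left "holes" in its index. When a Series is assigned as a DataFrame column, pandas aligns the values by index label rather than by position, so every row of df whose label is missing from the Series ends up as NaN.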
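A quick way to confirm this kind of mismatch before assigning is to compare the two indices directly. This is only a minimal sketch: the tiny df and series below are hypothetical stand-ins for the real data in the question.

```python
import pandas as pd

# Hypothetical stand-ins: a default-indexed frame and a de-duplicated Series.
df = pd.DataFrame({"x": [0.0, 0.0, 0.0, 0.0]})
series = pd.Series(["a", "b", "b", "c", "c", "d"]).drop_duplicates()

# False means column assignment will realign rows rather than copy them in order.
same_index = series.index.equals(df.index)

# Labels in df with no counterpart in the Series; these rows would become NaN.
unmatched = df.index.difference(series.index)

print(same_index)
print(list(unmatched))
```

If same_index is False even though the shapes agree, the number of unmatched labels should correspond to the number of NaNs that appear after the assignment.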
Possible ways of solving this issue (apply either one before assigning the column):
some_pd_series.index = df.index
some_pd_series.reset_index(drop=True, inplace=True)
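To see the whole thing end to end, here is a small reproduction with made-up data (the values and sizes are assumptions, not the original arrays). It shows the NaN appearing and then two ways around it: resetting the index as above, and assigning the underlying values with to_numpy(), which skips index alignment altogether.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for some_numpy_array and some_pd_series.
df = pd.DataFrame(np.zeros((4, 2)), columns=range(2))
raw = pd.Series(["a", "b", "b", "c", "c", "d"])

# drop_duplicates keeps the original labels, so the index becomes [0, 1, 3, 5].
series = raw.drop_duplicates()

# Assignment aligns on labels: label 2 has no match (NaN), label 5 is discarded.
broken = df.copy()
broken["Text"] = series
n_missing_before = int(broken["Text"].isnull().sum())

# Fix 1: renumber the Series 0..n-1 so its labels match df's default index.
fixed = df.copy()
fixed["Text"] = series.reset_index(drop=True)
n_missing_after = int(fixed["Text"].isnull().sum())

# Fix 2: assign the raw values, which bypasses index alignment entirely.
positional = df.copy()
positional["Text"] = series.to_numpy()

print(n_missing_before, n_missing_after)
```

Both fixes assume the Series and the DataFrame have the same number of rows and that the rows are already in the intended order; if the index labels actually carry meaning, a merge or join on a proper key is probably the safer route.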