Question

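I am creating a pandas DataFrame with the DataFrame constructor. My data is a dict of lists and categorical Series objects. When I pass an index to the constructor, my categorical Series comes back filled with NaN values. What is going on here? Thanks in advance!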

Example:

import pandas as pd
import numpy as np
a = pd.Series(['a','b','c'],dtype="category")
b = pd.Series(['a','b','c'],dtype="object")
c = pd.Series(['a','b','cc'],dtype="object")

A = pd.DataFrame({'A':a,'B':[1,2,3]},index=["0","1","2"])
AA = pd.DataFrame({'A':a,'B':[1,2,3]})
B = pd.DataFrame({'A':b,'C':[4,5,6]})    

print("DF A:")
print(A)
print("\nDF A, without specifying an index in the constructor:")
print(AA)
print("\nDF B:")
print(B)

Answer 1

This has nothing to do with category vs. object dtype; it has to do with index alignment.

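You get NaN in A because you told the constructor you want an index of three strings. But a has an index of its own, consisting of the integers [0, 1, 2]. Since that does not match the index you asked for, the data cannot be aligned, so you get a DataFrame with the index you requested, and the NaNs show that the data is missing for those labels. By contrast, column 'B' is just a plain list, so it has no index to align on, and the constructor assumes its values are already given in the order of the requested index.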
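As a rough sketch of the alignment step described above: for a Series value, the constructor's behavior is comparable to calling `reindex` with the requested labels. The variable names `requested` and `aligned` are only illustrative.

```python
import pandas as pd

a = pd.Series(["a", "b", "c"], dtype="category")
requested = ["0", "1", "2"]  # the string labels passed as index=

# The Series carries its own default integer labels.
print(a.index.tolist())  # [0, 1, 2]

# Looking up the string labels in the integer index finds nothing,
# so every position in the result is filled with NaN.
aligned = a.reindex(requested)
print(aligned)
```

The dtype stays categorical after the reindex; only the values are lost, because none of the requested labels exist in the Series' own index.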

This may be easier to see than to explain. Regardless of dtype, if the indices don't match, you get NaN:

In [147]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="category"),'B':[1,2,3]},
          index=["0","1","2"])
Out[147]: 
     A  B
0  NaN  1
1  NaN  2
2  NaN  3

In [148]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
          index=["0","1","2"])
Out[148]: 
     A  B
0  NaN  1
1  NaN  2
2  NaN  3

If you use a fully matching index, it works:

In [149]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
          index=[0,1,2])
Out[149]: 
   A  B
0  a  1
1  b  2
2  c  3

If you use a partially matching index, you get values where the indices align and NaN where they don't:

In [150]: pd.DataFrame({'A':pd.Series(list("abc"), dtype="object"),'B':[1,2,3]},
          index=[0,1,10])
Out[150]: 
      A  B
0     a  1
1     b  2
10  NaN  3
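Given the alignment behavior shown above, here are a few possible ways to end up with string labels and the original values. This is a sketch rather than the only approach; the names `labels`, `df1`, `df2`, `df3`, and `a_labeled` are made up for illustration.

```python
import pandas as pd

a = pd.Series(["a", "b", "c"], dtype="category")
labels = ["0", "1", "2"]

# Option 1: build the frame on the default index, then relabel the rows.
df1 = pd.DataFrame({"A": a, "B": [1, 2, 3]})
df1.index = labels

# Option 2: give the Series the desired labels first, so alignment succeeds.
a_labeled = a.copy()
a_labeled.index = labels
df2 = pd.DataFrame({"A": a_labeled, "B": [1, 2, 3]}, index=labels)

# Option 3: pass the underlying values, which carry no index to align on,
# so they are taken in positional order just like the list in column 'B'.
df3 = pd.DataFrame({"A": a.values, "B": [1, 2, 3]}, index=labels)

print(df1)
print(df2)
print(df3)
```

Option 3 relies on positional order, so it is only appropriate when the values are already in the order of the requested index.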

Pandas DataFrame constructor introduces NaN when the index argument is included
