我正在关注一个pandas教程,该教程通过将字典传递给series.map方法来显示替换列中的值。这是教程中的一个片段:
然而,当我尝试这个时:
cols = star_wars.columns[3:9]
# Booleans for column values
answers = {
"Star Wars: Episode I The Phantom Menace":True,
"Star Wars: Episode II Attack of the Clones":True,
"Star Wars: Episode III Revenge of the Sith":True,
"Star Wars: Episode IV A New Hope":True,
"Star Wars: Episode V The Empire Strikes Back":True,
"Star Wars: Episode VI Return of the Jedi":True,
NaN:False
}
for c in cols:
star_wars[c] = star_wars[c].map(answers)
我得到NameError: name 'NaN' is not defined
那么我做错了什么?
我正在尝试用False替换NaNs,用True替换非NaNs。
编辑2:以下是将NaN
更改为np.NaN
后我仍然面临的问题的图片:
然后,如果我重新运行映射单元并再次显示输出,那么所有False和NaN值都会触发。
答案 0 :(得分:3)
很简单,Python没有内置的NaN
名称。但是,NumPy会这样做,因此你可以使你的映射不会抛出带有np.nan
的错误。 Jon指出,math.nan
等于float('nan')
。
answers = {
"Star Wars: Episode I The Phantom Menace":True,
"Star Wars: Episode II Attack of the Clones":True,
"Star Wars: Episode III Revenge of the Sith":True,
"Star Wars: Episode IV A New Hope":True,
"Star Wars: Episode V The Empire Strikes Back":True,
"Star Wars: Episode VI Return of the Jedi":True,
np.nan:False
}
不要停在这里,因为那不会起作用。
另一个棘手的问题是nan
在技术上并不等于任何,所以在这样的映射中使用它将无效。
>>> np.nan == np.nan
False
因此,无论如何,np.nan
都不会将您的DataFrame中的NaN值作为关键字选取,并保持NaN。有关此问题的进一步说明,请参阅NaNs as key in dictionaries。此外,我打赌您的nan
值实际上是字符串nan
。
最小化演示
>>> df
0 1
0 Star Wars: Episode I The Phantom Menace nan
1 Star Wars: Episode IV A New Hope nan
2 Star Wars: Episode IV A New Hope Star Wars: Episode IV A New Hope
>>> for c in df.columns:
df[c] = df[c].map(answers)
>>> df
0 1
0 True NaN
1 True NaN
2 True True
# notice we're still stuck with NaN, as our nan strings weren't picked up
话虽如此,这似乎不适合用于字典或地图 - 您可以在一组中定义星球大战字符串,然后在您感兴趣的整列列上使用isin
answers = {
"Star Wars: Episode I The Phantom Menace",
"Star Wars: Episode II Attack of the Clones"
"Star Wars: Episode III Revenge of the Sith",
"Star Wars: Episode IV A New Hope",
"Star Wars: Episode V The Empire Strikes Back",
"Star Wars: Episode VI Return of the Jedi",
}
starwars.iloc[:, 3:9].isin(answers)
最小化演示
>>> answers = {
"Star Wars: Episode I The Phantom Menace",
"Star Wars: Episode II Attack of the Clones"
"Star Wars: Episode III Revenge of the Sith",
"Star Wars: Episode IV A New Hope",
"Star Wars: Episode V The Empire Strikes Back",
"Star Wars: Episode VI Return of the Jedi",
}
>>> df
0 1
0 Star Wars: Episode I The Phantom Menace nan
1 Star Wars: Episode IV A New Hope nan
2 Star Wars: Episode IV A New Hope Star Wars: Episode IV A New Hope
>>> df.isin(answers)
0 1
0 True False
1 True False
2 True True
答案 1 :(得分:-1)
所以我对其他解决方案的问题是,由于它的工作方式,代码在第一次运行后不会以相同的方式运行。我在Jupyter笔记本上工作,所以我想要一些可以多次运行的东西。我只是一个Python初学者,但以下代码似乎能够运行多次,并且只在第一次运行时更改值:
cols = star_wars.columns[3:9]
# Booleans for column values
answers = {
"Star Wars: Episode I The Phantom Menace":True,
"Star Wars: Episode II Attack of the Clones":True,
"Star Wars: Episode III Revenge of the Sith":True,
"Star Wars: Episode IV A New Hope":True,
"Star Wars: Episode V The Empire Strikes Back":True,
"Star Wars: Episode VI Return of the Jedi":True,
True:True,
False:False,
np.nan:False
}
for c in cols:
star_wars[c] = star_wars[c].map(answers)