用熊猫系列替换NaN.map(dict)

时间:2017-03-10 16:49:09

标签: python pandas dictionary dataframe nan

我正在关注一个pandas教程,该教程通过将字典传递给series.map方法来显示替换列中的值。这是教程中的一个片段: enter image description here

然而,当我尝试这个时:

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        NaN:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers) 

我得到NameError: name 'NaN' is not defined

那么我做错了什么?

修改:为了更好地解释我的目标,我的列看起来像这样:enter image description here

我正在尝试用False替换NaNs,用True替换非NaNs。

编辑2:以下是将NaN更改为np.NaN后我仍然面临的问题的图片:

enter image description here

然后,如果我重新运行映射单元并再次显示输出,那么所有False和NaN值都会触发。

2 个答案:

答案 0 :(得分:3)

很简单,Python没有内置的NaN名称。但是,NumPy会这样做,因此你可以使你的映射不会抛出带有np.nan错误。 Jon指出,math.nan等于float('nan')

answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        np.nan:False
        }

不要停在这里,因为那不会起作用。 另一个棘手的问题是nan在技术上并不等于任何,所以在这样的映射中使用它将无效。

>>> np.nan == np.nan 
False

因此,无论如何,np.nan都不会将您的DataFrame中的NaN值作为关键字选取,并保持NaN。有关此问题的进一步说明,请参阅NaNs as key in dictionaries。此外,我打赌您的nan值实际上是字符串nan

最小化演示

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> for c in df.columns:
        df[c] = df[c].map(answers)


>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True

# notice we're still stuck with NaN, as our nan strings weren't picked up

更好的解决方案

话虽如此,这似乎不适合用于字典或地图 - 您可以在一组中定义星球大战字符串,然后在您感兴趣的整列列上使用isin

answers = {
        "Star Wars: Episode I  The Phantom Menace",
        "Star Wars: Episode II  Attack of the Clones" 
        "Star Wars: Episode III  Revenge of the Sith",
        "Star Wars: Episode IV  A New Hope",
        "Star Wars: Episode V  The Empire Strikes Back",
        "Star Wars: Episode VI  Return of the Jedi",
        }

starwars.iloc[:, 3:9].isin(answers) 

最小化演示

>>> answers = {
            "Star Wars: Episode I  The Phantom Menace",
            "Star Wars: Episode II  Attack of the Clones" 
            "Star Wars: Episode III  Revenge of the Sith",
            "Star Wars: Episode IV  A New Hope",
            "Star Wars: Episode V  The Empire Strikes Back",
            "Star Wars: Episode VI  Return of the Jedi",
            }

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> df.isin(answers)

      0      1
0  True  False
1  True  False
2  True   True

答案 1 :(得分:-1)

所以我对其他解决方案的问题是,由于它的工作方式,代码在第一次运行后不会以相同的方式运行。我在Jupyter笔记本上工作,所以我想要一些可以多次运行的东西。我只是一个Python初学者,但以下代码似乎能够运行多次,并且只在第一次运行时更改值:

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True,
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V The Empire Strikes Back":True,
        "Star Wars: Episode VI Return of the Jedi":True,
        True:True,
        False:False,
        np.nan:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers)