Question

If I have a DataFrame:

from pandas import DataFrame
myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])

which gives the following DataFrame (I'm new to Stack Overflow and don't have enough reputation to post an image of the DataFrame):

   | A  | B  |
0  | 11 | 11 |
1  | 22 | 2A |
2  | 33 | 33 |

If I want to convert column B to int values and drop the values that can't be converted, I have to do this:

def convertToInt(cell):
    try:
        return int(cell)
    except (ValueError, TypeError):
        # The value can't be converted to int
        return None

myDF['B'] = myDF['B'].apply(convertToInt)

If I only do this:

myDF['B'].apply(int)

the error is obviously:

C:\WinPython-32bit-2.7.5.3\python-2.7.5\lib\site-packages\pandas\lib.pyd in pandas.lib.map_infer (pandas\lib.c:42840)()

ValueError: invalid literal for int() with base 10: '2A'

Is there a way to add exception handling to myDF['B'].apply()?

Thanks in advance!

Answer 1

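I had the same question, but for a more general case where it is hard to tell in advance whether the function will raise an exception (i.e. you can't explicitly check the condition with something as simple as isdigit).

After thinking about it for a while, I came up with the solution of putting the try/except into a separate function. I'm posting a toy example in case it helps anyone.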

import pandas as pd
import numpy as np

x=pd.DataFrame(np.array([['a','a'], [1,2]]))

def augment(x):
    try:
        return int(x) + 1
    except (ValueError, TypeError):
        # Fall back to a marker string when the value can't be converted
        return 'error:' + str(x)

x[0].apply(augment)
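If you need this for more than one function, the same idea can be wrapped up once and reused. This is only a rough sketch: the helper name `safe` and its `default` and `exceptions` parameters are made up for illustration and are not part of pandas.

```python
import functools

import pandas as pd


def safe(func, default=None, exceptions=(ValueError, TypeError)):
    """Return a version of func that yields `default` instead of raising.

    Hypothetical helper for illustration, not a pandas API.
    """
    @functools.wraps(func)
    def wrapper(value):
        try:
            return func(value)
        except exceptions:
            return default
    return wrapper


myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Cells that int() rejects become None instead of aborting the whole apply()
converted = myDF['B'].apply(safe(int))
```

Listing the exception types explicitly keeps unrelated bugs inside `func` from being silently swallowed.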

Answer 2

A better/faster way to do this:

In [1]: myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])

In [2]: myDF.convert_objects(convert_numeric=True)
Out[2]: 
    A   B
0  11  11
1  22 NaN
2  33  33

[3 rows x 2 columns]

In [3]: myDF.convert_objects(convert_numeric=True).dtypes
Out[3]: 
A      int64
B    float64
dtype: object

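This is a vectorized way of doing it. The coerce flag means that anything that can't be converted to numeric is marked as NaN.

You can of course apply this to a single column if you'd like.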
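Note that `convert_objects` was later deprecated and is gone from current pandas releases. As far as I know, the closest replacement is `pd.to_numeric` with `errors='coerce'`, which works on a single column. A short sketch on the question's data (the `cleaned` name is just for illustration):

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Vectorized conversion: anything that cannot be parsed becomes NaN
myDF['B'] = pd.to_numeric(myDF['B'], errors='coerce')

# Drop the rows that could not be converted, then cast B back to int,
# which is what the question asks for
cleaned = myDF.dropna(subset=['B']).astype({'B': int})
```

The cast to int only works after the NaN rows are dropped, because a plain integer column cannot hold NaN.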

Answer 3

One way to achieve this is with a lambda:

myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)

Your input:

>>> myDF
    A   B
0  11  11
1  22  2A
2  33  33

[3 rows x 2 columns]

>>> myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)
0    11
1   NaN
2    33
Name: B, dtype: float64
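One caveat with this check: `str.isdigit()` and `int()` don't accept exactly the same strings, so values such as negative numbers or padded strings would be turned into None even though `int()` can parse them. A small plain-Python illustration (the sample values and the `int_or_none` helper are made up for the demonstration):

```python
def int_or_none(text):
    """Return int(text), or None when int() rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None


samples = ['33', '-5', ' 7 ', '2A']

# Map each sample to (what isdigit() says, what int() actually returns)
comparison = {s: (s.isdigit(), int_or_none(s)) for s in samples}

for text, (is_digit, parsed) in comparison.items():
    print(repr(text), is_digit, parsed)
```

If such values can occur in your column, the try/except approach is the safer of the two.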

Exception handling in the Pandas .apply() function
