Question

I have a DataFrame like this:

    a   b
0   1   26190
1   5   python
2   5   580

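I want column b to hold only integers, but as you can see, python cannot be converted to int, so I want to drop the row at index 1. My expected output should look like this: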

    a   b
0   1   26190
1   5   580

How can I filter out and remove such rows using pandas in Python?

Answer 1

You can use to_numeric together with notnull and filter by boolean indexing:

print (pd.to_numeric(df.b, errors='coerce'))
0    26190.0
1        NaN
2      580.0
Name: b, dtype: float64

print (pd.to_numeric(df.b, errors='coerce').notnull())
0     True
1    False
2     True
Name: b, dtype: bool

df = df[pd.to_numeric(df.b, errors='coerce').notnull()]
print (df)

   a      b
0  1  26190
2  5    580
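
Note that the filtered frame keeps the original index labels (0 and 2), while the expected output in the question is numbered 0 and 1. A minimal sketch of the same mask followed by reset_index, assuming you do not need the old labels (the names mask and result are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 5], "b": [26190, "python", 580]})

# True where b can be parsed as a number, False otherwise
mask = pd.to_numeric(df["b"], errors="coerce").notna()

# Keep the parseable rows and renumber the index from 0
result = df[mask].reset_index(drop=True)
print(result)
```

Column b still has object dtype at this point; the rows are filtered but the values are not converted.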

Another solution, from Boud's comment: use to_numeric with dropna, and finally cast to int with astype:

df.b = pd.to_numeric(df.b, errors='coerce')
df = df.dropna(subset=['b'])
df.b = df.b.astype(int)
print (df)
   a      b
0  1  26190
2  5    580
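
One caveat: to_numeric also accepts values such as '3.5', and astype(int) would then silently truncate them to 3. If the column should only hold whole numbers, something like the following sketch might help. The helper name keep_integer_rows and the extra '3.5' row are assumptions added for illustration, not part of the original question:

```python
import pandas as pd

def keep_integer_rows(df, column):
    """Hypothetical helper: keep rows whose `column` holds a whole number."""
    numeric = pd.to_numeric(df[column], errors="coerce")
    # NaN % 1 is NaN, and NaN == 0 is False, so unparseable values drop out too
    is_whole = (numeric % 1) == 0
    out = df.loc[is_whole].copy()
    # Assigning the whole column replaces it, so the dtype becomes int64
    out[column] = numeric[is_whole].astype("int64")
    return out.reset_index(drop=True)

# Sample data extended with a non-integer numeric string (assumption)
df = pd.DataFrame({"a": [1, 5, 5, 7], "b": [26190, "python", 580, "3.5"]})
clean = keep_integer_rows(df, "b")
print(clean)
print(clean.dtypes)
```

Working on a copy also avoids assigning into a filtered slice of the original frame.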

If you need to inspect all rows that contain bad data, use isnull to select every row that becomes NaN after applying to_numeric:

print (pd.to_numeric(df.b, errors='coerce').isnull())
0    False
1     True
2    False
Name: b, dtype: bool

print (df[pd.to_numeric(df.b, errors='coerce').isnull()])
   a       b
1  5  python

Answer 2

This should work:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a' : [1, 5, 5],
                   'b' : [26190, 'python', 580]})
df
   a       b
0  1   26190
1  5  python
2  5     580

df['b'] = np.where(df.b.str.contains('[a-z]') == True, np.nan, df.b)
df
   a      b
0  1  26190
1  5    NaN
2  5    580

df = df.dropna()
df
   a      b
0  1  26190
2  5    580

This identifies the strings with a regular expression, replaces them with NaN using np.where, and then removes those rows from df with df.dropna().
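
The pattern '[a-z]' only catches values containing lowercase letters, so something like 'PYTHON' or '#' would slip through, and dropna() with no arguments also drops rows that have NaN in other columns. A possible variation, sketched under the assumption that "integer" means an optional sign followed by digits, matches the values you want to keep instead (the 'PYTHON' row is added here for illustration):

```python
import pandas as pd

# Sample data extended with an uppercase string (assumption)
df = pd.DataFrame({"a": [1, 5, 5, 7], "b": [26190, "python", 580, "PYTHON"]})

# Compare the text form of every value against an optional sign plus digits
looks_like_int = df["b"].astype(str).str.fullmatch(r"[+-]?\d+")

# Keep only the matching rows, then convert the column to int
clean = df[looks_like_int].copy()
clean["b"] = clean["b"].astype(int)
print(clean)
```

Series.str.fullmatch requires pandas 1.1 or later.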
