How to drop rows from a Pandas DataFrame that contain any string in a particular column

Time: 2017-09-16 18:23:07

Tags: python python-3.x pandas numpy machine-learning

I have CSV data in the following format:

+-------------+-------------+-------+
|  Location   | Num of Reps | Sales |
+-------------+-------------+-------+
| 75894       |           3 |    12 |
| Burbank     |           2 |    19 |
| 75286       |           7 |    24 |
| Carson City |           4 |    13 |
| 27659       |           3 |    17 |
+-------------+-------------+-------+

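The Location column has the object dtype. What I want to do is drop every row whose Location label is non-numeric. So, for the table above, my desired output is: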

+----------+-------------+-------+
| Location | Num of Reps | Sales |
+----------+-------------+-------+
|    75894 |           3 |    12 |
|    75286 |           7 |    24 |
|    27659 |           3 |    17 |
+----------+-------------+-------+

Right now, I can hard-code a solution as follows:

list1 = ['Carson City', 'Burbank']
df = df[~df['Location'].isin(list1)]

This was inspired by the following post:

How to drop rows from pandas data frame that contains a particular string in a particular column?

However, what I am looking for is a general solution that works for any table of the type shown above.

4 Answers:

Answer 0 (score: 5)

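Alternatively, you can use str.isnumeric(), which keeps only the rows whose Location consists entirely of numeric characters: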

df[df['Location'].str.isnumeric()]

  Location  Num of Reps  Sales
0    75894            3     12
2    75286            7     24
4    27659            3     17
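
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question (an assumption; the real data comes from a CSV), and the names `mask` and `numeric_rows` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question (assumed values).
df = pd.DataFrame({
    "Location": ["75894", "Burbank", "75286", "Carson City", "27659"],
    "Num of Reps": [3, 2, 7, 4, 3],
    "Sales": [12, 19, 24, 13, 17],
})

# str.isnumeric() is True only when every character of the string is numeric.
mask = df["Location"].str.isnumeric()

# Boolean indexing keeps the rows where the mask is True.
numeric_rows = df[mask]
print(numeric_rows)
```

Note that the original index labels (0, 2, 4) are kept; calling reset_index(drop=True) on the result would renumber them. If the column may contain missing values, the mask may contain NaN as well, and that would likely need handling before it can be used for indexing.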

Answer 1 (score: 3)

You can use pd.to_numeric to coerce non-numeric values to NaN, and then filter out the rows where Location became NaN:

df[pd.to_numeric(df.Location, errors='coerce').notnull()]

#Location  Num of Reps  Sales
#0  75894            3     12
#2  75286            7     24
#4  27659            3     17
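
A small sketch of how this differs from the string-method answers. The value "-12.5" is a hypothetical entry added here for illustration and is not part of the question's data; the variable names are also just illustrative.

```python
import pandas as pd

# Assumed sample data; "-12.5" is a hypothetical value added for illustration.
df = pd.DataFrame({
    "Location": ["75894", "Burbank", "-12.5", "Carson City", "27659"],
    "Sales": [12, 19, 24, 13, 17],
})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
converted = pd.to_numeric(df["Location"], errors="coerce")

# Keep only the rows where the conversion succeeded.
kept = df[converted.notna()]

# For comparison: str.isnumeric() rejects the sign and the decimal point.
isnumeric_mask = df["Location"].str.isnumeric()
print(kept)
print(isnumeric_mask.tolist())
```

So pd.to_numeric keeps anything that parses as a number (including negatives and decimals), while str.isnumeric() only accepts strings made up purely of numeric characters. Which one is appropriate depends on what counts as a valid Location.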

Answer 2 (score: 1)

In [139]: df[~df.Location.str.contains(r'\D')]
Out[139]:
  Location  Num of Reps  Sales
0    75894            3     12
2    75286            7     24
4    27659            3     17
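
One edge case of the regex approach, shown as a rough sketch: an empty string contains no non-digit character, so it passes the `\D` test. The sample values (including the empty string and "27 659") are made up for illustration, and str.fullmatch is assumed to be available (it was added in pandas 1.1).

```python
import pandas as pd

# Made-up values, including an empty string and a string with a space.
locations = pd.Series(["75894", "Burbank", "", "Carson City", "27 659"])

# "Contains no non-digit character": this is also True for the empty string.
no_nondigit = ~locations.str.contains(r"\D")

# "Consists of one or more digits": the empty string is rejected.
all_digits = locations.str.fullmatch(r"\d+")

print(no_nondigit.tolist())
print(all_digits.tolist())
```

If empty strings cannot occur in the column, the two masks select the same rows.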

Answer 3 (score: 0)

df[df['Location'].str.isdigit()]


  Location  Num of Reps  Sales
0    75894            3     12
2    75286            7     24
4    27659            3     17
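
For plain ASCII digits, str.isdigit() and str.isnumeric() give the same result, but they differ on some Unicode characters. The sketch below compares them (plus str.isdecimal()) on a few made-up strings; the object dtype is set explicitly so that the methods follow Python's own str semantics.

```python
import pandas as pd

# Made-up strings: ASCII digits, a vulgar fraction, a superscript, a decimal.
samples = pd.Series(["27659", "½", "²", "12.5"], dtype=object)

comparison = pd.DataFrame({
    "value": samples,
    "isdecimal": samples.str.isdecimal(),  # strictest: decimal digits only
    "isdigit": samples.str.isdigit(),      # also accepts superscript digits
    "isnumeric": samples.str.isnumeric(),  # also accepts fractions such as ½
})
print(comparison)
```

For ZIP-code-like data such as the Location column in the question, all three should select the same rows.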