Filter data based on the values in a column of a pandas DataFrame

Time: 2018-09-22 23:17:20

Tags: python pandas data-analysis

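I have been working with some data recently. While filtering it, I found a problem with one of the columns. I only want to keep the rows whose "branch" column ends with ')'.

I have tried a few options, but I would like to find the fastest way to do this.

This is a part of the data I have been working on.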

1 answer:

Answer 0 (score: 0)

Since you did not provide the data as text, I created a sample DataFrame:

Input:

import pandas as pd

# Sample data: every second row has a branch that ends with '(4 Years)'
d = {'college_name': ['College {}'.format(i+1) for i in range(8)], 'branch': ['Civil Enigineering '+ '(4 Years)'*(i%2) for i in range(8)]}
df = pd.DataFrame(data=d, columns=['college_name','branch'])
df

Output:

    college_name    branch
0   College 1   Civil Enigineering
1   College 2   Civil Enigineering (4 Years)
2   College 3   Civil Enigineering
3   College 4   Civil Enigineering (4 Years)
4   College 5   Civil Enigineering
5   College 6   Civil Enigineering (4 Years)
6   College 7   Civil Enigineering
7   College 8   Civil Enigineering (4 Years)

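A pandas Series has built-in string-handling methods, available through the .str accessor. You can filter the data with str.endswith(')'). Note that df['branch'].str.endswith(')') returns a boolean mask: a Series of True/False values, one per row, which you can then use to index the DataFrame.

Input: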
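To make the idea of a boolean mask concrete, here is a minimal sketch on a small three-row Series. The `branch` Series and its values are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical three-row Series, used only to show what the mask looks like
branch = pd.Series(['Civil (4 Years)', 'Civil', 'Mechanical (4 Years)'])

# str.endswith returns a boolean Series aligned with the original index
mask = branch.str.endswith(')')
print(mask.tolist())

# Indexing with the mask keeps only the rows where the mask is True
print(branch[mask].tolist())
```

The same indexing works on a whole DataFrame, as long as the mask has the same index as the frame.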

df[df['branch'].str.endswith(')')]

Output:

    college_name    branch
1   College 2   Civil Enigineering (4 Years)
3   College 4   Civil Enigineering (4 Years)
5   College 6   Civil Enigineering (4 Years)
7   College 8   Civil Enigineering (4 Years)
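If the real column contains missing values or stray trailing whitespace, the plain mask may not behave as expected: missing entries produce missing values in the mask, which can make the boolean indexing fail, and a value such as 'Civil (4 Years)  ' does not end with ')'. The sketch below is one way to handle both cases, assuming that is what the real data looks like. The `df2` frame and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical data with a missing value and trailing whitespace
df2 = pd.DataFrame({
    'college_name': ['College A', 'College B', 'College C', 'College D'],
    'branch': ['Civil (4 Years)', None, 'Civil (4 Years)  ', 'Civil'],
})

# Strip surrounding whitespace first, then treat missing values as non-matches
mask = df2['branch'].str.strip().str.endswith(')', na=False)
filtered = df2[mask]
print(filtered)
```

Passing na=False makes missing entries count as False, so the mask stays purely boolean. Whether stripping whitespace is appropriate depends on the data, so treat it as an optional step.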