Python: drop rows if the cell value in a dataframe contains fewer than 5 characters

Asked: 2019-03-20 12:47:57

Tags: python python-3.x

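I have a dataframe like the one below, and I want to keep only the rows whose value has more than 5 characters. Here is what I tried, but it removes "of", "U.", "and", "Arts", etc. from inside the values. I only need to drop the rows whose length is less than 5.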

id schools
1  University of Hawaii
2  Dept in Colorado U.
3  Dept
4  College of Arts and Science
5  Dept
6  Bldg

My code produces the wrong output:

0    University Hawaii
1             Colorado
2                     
3      College Science
4                     
5   

I am looking for output like this:

id schools
1  University of Hawaii
2  Dept in Colorado U.
4  College of Arts and Science

Code:

import pandas as pd

l = [1, 2, 3, 4, 5, 6]
s = ['University of Hawaii', 'Dept in Colorado U.', 'Dept', 'College of Arts and Science', 'Dept', 'Bldg']
df1 = pd.DataFrame({'id': l, 'schools': s})
df1 = df1['schools'].str.findall(r'\w{5,}').str.join(' ')  # not working: strips short words instead of dropping short rows
df1

3 answers:

Answer 0: (score: 2)

Using a regular expression for this task is massive (and slow) overkill. You can use simple pandas boolean indexing:

filtered_df = df1[df1['schools'].str.len() > 5]  # or >= depending on the required logic
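A minimal runnable sketch of this approach on the sample data from the question. The names `lengths` and `kept` are illustrative, and `>= 5` is an assumption based on the question's wording ("fewer than 5 characters"); with this data, `> 5` selects the same rows.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'schools': ['University of Hawaii', 'Dept in Colorado U.', 'Dept',
                'College of Arts and Science', 'Dept', 'Bldg'],
})

# .str.len() measures the whole cell value, so short words
# inside a long value ("of", "U.", "and") are left untouched
lengths = df1['schools'].str.len()

# Assumption: "fewer than 5 characters" means drop rows where len < 5
kept = df1[lengths >= 5]
print(kept)
```

The filter selects whole rows, so the original index labels are preserved; call `reset_index(drop=True)` on the result if a fresh 0-based index is wanted.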

Answer 1: (score: 0)

Build a simpler filter (a boolean mask) for your data.

mask = df1['schools'].str.len() > 5

Then create a new dataframe from the filter:

df2 = df1[mask].copy()
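A short sketch of why the `.copy()` can be useful, again on the question's sample data. The `n_chars` column is hypothetical and only added for illustration: because `df2` is an independent copy, adding a column to it leaves `df1` unchanged.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'schools': ['University of Hawaii', 'Dept in Colorado U.', 'Dept',
                'College of Arts and Science', 'Dept', 'Bldg'],
})

# Boolean mask: True for rows whose value is longer than 5 characters
mask = df1['schools'].str.len() > 5

# Independent copy of the selected rows
df2 = df1[mask].copy()

# Hypothetical extra column, added to the copy only
df2['n_chars'] = df2['schools'].str.len()
print(df2)
```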

Answer 2: (score: -1)

import pandas as pd

name = ['University of Hawaii', 'Dept in Colorado U.', 'Dept', 'College of Arts and Science', 'Dept', 'Bldg']
labels = ['schools']
df = pd.DataFrame.from_records([[i] for i in name], columns=labels)
df[df['schools'].str.len() > 5]