Remove Substrings pandas, python

Time: 2017-06-06 01:40:09

Tags: python regex pandas

I have a df of high schools. I am trying to remove the generic endings from the school names.

in[1]:df
out[2]:
     time    school
1    09:00   Brown Academy
2    10:00   Covfefe High School
3    11:00   Bradley High
4    12:00   Johnson Prep

school_endings = ['Academy', 'Prep', 'High', 'High School']

Desired:

out[3]:
     time    school
1    09:00   Brown
2    10:00   Covfefe
3    11:00   Bradley
4    12:00   Johnson

4 answers:

Answer 0 (score: 4)

Use split and keep only the first word:

df.school = df.school.str.split(' ').str[0]

    school  time
0   Brown   09:00
1   Covfefe 10:00
2   Bradley 11:00
3   Johnson 12:00
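A self-contained sketch of the split approach, rebuilding the question's frame by hand. The first four rows are copied from the question; "New Trier High School" is a hypothetical extra name added to show that keeping only the first word also drops the later words of a multi-word school name.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical multi-word name
df = pd.DataFrame({
    "time": ["09:00", "10:00", "11:00", "12:00", "13:00"],
    "school": ["Brown Academy", "Covfefe High School", "Bradley High",
               "Johnson Prep", "New Trier High School"],
})

# Keep only the text before the first space
df["school"] = df["school"].str.split(" ").str[0]
print(df["school"].tolist())
# ['Brown', 'Covfefe', 'Bradley', 'Johnson', 'New']
```

This is fine for the sample data, but any name longer than one word before the ending gets truncated.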

Answer 1 (score: 2)

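endings = ['Academy', 'Prep', 'High', 'High School']

# Sort longest first so 'High School' is removed before 'High' can match part of it
endings = sorted(endings, key=len, reverse=True)

# Each ending is treated as a regex and replaced with ''; strip() drops the leftover space.
# assign() returns a new DataFrame; df itself is not modified.
df.assign(school=df.school.replace(endings, '', regex=True).str.strip())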

    time   school
1  09:00    Brown
2  10:00  Covfefe
3  11:00  Bradley
4  12:00  Johnson

Answer 2 (score: 0)

Use the rstrip() method to remove the unwanted string from the end of the original string, e.g.:

mystring = "Brown Academy"

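mystring.rstrip("Academy") -> gives you o/p: 'Brown ' (with a trailing space, since rstrip removes any trailing characters found in its argument, not the argument as a suffix)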
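A short sketch of why `rstrip` is risky for this task, and an exact-suffix alternative. "Raleigh High School" is a hypothetical name, and `strip_ending` is a hypothetical helper (not part of Python, pandas, or the answer) that only removes an ending when it follows a space at the very end of the name.

```python
# rstrip treats its argument as a *set of characters*, not a suffix
assert "Brown Academy".rstrip("Academy") == "Brown "
# ' ', 'h', 'g', 'i' are all in the set, so this eats into the name itself
assert "Raleigh High School".rstrip("High School") == "Rale"


def strip_ending(name, endings):
    """Hypothetical helper: remove one whole-word ending from the end of name."""
    # Check longer endings first so "High School" is tried before "High"
    for ending in sorted(endings, key=len, reverse=True):
        if name.endswith(" " + ending):
            # Slice off the exact suffix, then drop the separating space
            return name[: -len(ending)].rstrip()
    return name


endings = ["Academy", "Prep", "High", "High School"]
print([strip_ending(s, endings) for s in ["Brown Academy", "Raleigh High School", "Johnson Prep"]])
# ['Brown', 'Raleigh', 'Johnson']
```

On Python 3.9+, `str.removesuffix` can replace the slicing step. With pandas, the helper could be applied as `df.school.map(lambda s: strip_ending(s, endings))`.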

Answer 3 (score: 0)

I would probably use a regex replacement:
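A minimal sketch of such a regex replacement in pandas, assuming an ending should only be removed when it forms the last whole word(s) of the name. The anchored pattern and the variable names `alternation` and `pattern` are illustrative choices; `re.escape` guards against endings that contain regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "time": ["09:00", "10:00", "11:00", "12:00"],
    "school": ["Brown Academy", "Covfefe High School", "Bradley High", "Johnson Prep"],
})
school_endings = ["Academy", "Prep", "High", "High School"]

# One alternation anchored at the end: \s+ requires whitespace before the ending,
# and $ means only a trailing ending is removed (a name like "Highland ..." keeps "High")
alternation = "|".join(re.escape(e) for e in sorted(school_endings, key=len, reverse=True))
pattern = rf"\s+(?:{alternation})$"

df["school"] = df["school"].str.replace(pattern, "", regex=True)
print(df)
```

Because of the `$` anchor, the order of the alternatives does not affect correctness here; sorting longest first just keeps the intent explicit.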
