Remove everything after the first non-letter character in a string in Python

Date: 2017-01-06 01:10:16

Tags: python regex string

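I'm having some trouble using regular expressions to remove non-alphanumeric characters from a string. What I want to do is remove every character, letters included, that comes after the first character that is not a letter or a single space. Digits and double spaces both count as such a character.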

For example:

My string is #not very beautiful 

should become

My string is

Are you 9 years old?

should become

Are you

this is the last  example

should become

this is the last

How can I do this?

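6 Answers:

Answer 0: (score: 5)

How about using re.split on "[^A-Za-z ]|  " (the second alternative is two spaces) and taking the first element? You can strip any leftover whitespace afterwards: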

import re
re.split("[^A-Za-z ]|  ", "My string is #not very beautiful")[0].strip()
# 'My string is'

re.split("[^A-Za-z ]|  ", "this is the last  example")[0].strip()
# 'this is the last'

re.split("[^A-Za-z ]|  ", "Are you 9 years old?")[0].strip()
# 'Are you'

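The pattern "[^A-Za-z ]|  " contains two alternatives: the first matches a single character that is neither a letter nor a space; the second matches a double space. Split on either of them, and the first element of the result is the part you are looking for.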
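The same idea can be wrapped in a small helper. This is only a sketch: the name truncate_at_first_invalid is made up for illustration, and maxsplit=1 is an addition that stops splitting after the first hit, since only the first piece is needed.

```python
import re

# Either one character that is neither an ASCII letter nor a space,
# or two consecutive spaces.
_CUT = re.compile(r"[^A-Za-z ]|  ")


def truncate_at_first_invalid(text):
    """Return the part of text before the first disallowed character or double space."""
    # maxsplit=1: only the piece before the first match is needed.
    return _CUT.split(text, maxsplit=1)[0].strip()


print(truncate_at_first_invalid("Are you 9 years old?"))
# Are you
```

Note that strip() also removes leading whitespace; use rstrip() instead if leading spaces should be preserved.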

Answer 1: (score: 1)

Create a whitelist and stop as soon as you see something that is not in it:

import itertools
import string

def rstrip(s, whitelist=None):
    if whitelist is None:
        whitelist = set(string.ascii_letters + ' ')  # default whitelist: the letters A-Z and a-z plus the space
    # split on double whitespace and take the first piece (this works even if there is no double whitespace in the string)
    # use `itertools.takewhile` to keep characters for as long as they are in the whitelist
    # use `join` to join them into one single string

    return ''.join(itertools.takewhile(whitelist.__contains__, s.split('  ', 1)[0]))

Answer 2: (score: 1)

import re
str1 = "this is the last  example"
regex = re.compile(r"(([a-zA-Z]|(\s[a-zA-Z]))+)")
capture = re.match(regex, str1)
res = capture.group(1)

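I also tested it with your other examples, and it seems to give the correct results. Note that this does not keep the trailing space, which matches what your examples show, even if that is not what you actually want.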
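One caveat with the snippet above: re.match returns None when the string does not start with a letter (for example "#abc"), so capture.group(1) would raise AttributeError. A minimal sketch of a variant that avoids this follows. The function name leading_words is hypothetical, and using a literal space instead of \s is an assumption based on the question's "single space" wording (it means tabs and newlines also end the match).

```python
import re

# Assumption: a literal space rather than \s, so only a single space
# followed by a letter is allowed between letters.
_LEADING = re.compile(r"(?:[A-Za-z]| [A-Za-z])*")


def leading_words(text):
    """Return the longest prefix made of letters and single spaces between them."""
    # With `*` the pattern can match the empty string, so match() never
    # returns None here and group(0) is always safe to call.
    return _LEADING.match(text).group(0)


for sample in ("this is the last  example", "Are you 9 years old?", "#abc"):
    print(repr(leading_words(sample)))
```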

Answer 3: (score: 0)

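Obligatory generator expression:

def truncate_nonalpha_space(s):
    head = s.split("  ", 1)[0]  # everything before the first double space
    return head[:next((i for i, c in enumerate(head) if not c.isalpha() and c != " "), len(head))].rstrip()

Steps:

  1. Build a generator expression that yields the indices of the characters that are neither letters (according to .isalpha()) nor equal to " ".

  2. Only the left-hand part of s split on "  " (two spaces), called head here, is scanned, which disposes of any double-space occurrence before the expression runs.

  3. Enumerate head to pair each character with its index.

  4. The first of these indices is used to slice head; if there is none, the default len(head) is used and all of head is returned (head[:len(head)]). Finally, .rstrip() removes trailing whitespace.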
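The steps above can be unrolled into an explicit loop, which may be easier to follow than the one-liner. This is a sketch, not the answer's own code: the name truncate_nonalpha_space_steps is made up, and it assumes the slice should default to the length of the part before the first double space when no disallowed character is found.

```python
def truncate_nonalpha_space_steps(s):
    """Loop-based version of the generator-expression approach."""
    # Step 2: keep only the part before the first double space.
    head = s.split("  ", 1)[0]

    # Steps 1 and 3: walk the characters together with their indices and
    # remember the index of the first one that is neither a letter nor a space.
    cut = len(head)  # default: every character in head is allowed
    for index, char in enumerate(head):
        if not char.isalpha() and char != " ":
            cut = index
            break

    # Step 4: slice at that index and drop trailing whitespace.
    return head[:cut].rstrip()


for sample in ("My string is #not very beautiful",
               "Are you 9 years old?",
               "this is the last  example"):
    print(repr(truncate_nonalpha_space_steps(sample)))
```

Keep in mind that str.isalpha() also accepts non-ASCII letters such as "é", unlike the [A-Za-z] character class used in the regex answers.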

Answer 4: (score: 0)

^.+?(?=[^A-Za-z ]|$|\s{2})

You can use this pattern to get the output; apply it with re.findall.

See the demo.

https://regex101.com/r/INzotJ/1
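For reference, here is one way the pattern might be applied with re.findall. This is a sketch: the helper name first_clean_part is hypothetical, and the rstrip() call is an addition, because the lookahead stops just before the offending character and so leaves any single trailing space in the match.

```python
import re

# Lazily match from the start until the next character is disallowed,
# the string ends, or two whitespace characters follow.
PATTERN = re.compile(r"^.+?(?=[^A-Za-z ]|$|\s{2})")


def first_clean_part(text):
    """Return the matched prefix without trailing whitespace, or '' if nothing matches."""
    matches = PATTERN.findall(text)  # a list with at most one element because of ^
    return matches[0].rstrip() if matches else ""


for sample in ("My string is #not very beautiful",
               "Are you 9 years old?",
               "this is the last  example"):
    print(repr(first_clean_part(sample)))
```

Because .+? needs at least one character, a string whose very first character is disallowed is not handled cleanly by this pattern; the examples in the question do not hit that case.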

Answer 5: (score: -1)

Hacky, but it uses yield:

import string

li_test = [
    ("My string is #not very beautiful","My string is"),
    ("Are you 9 years old?","Are you "),
    ("this is the last  example","this is the last "),
]

tolerated = string.ascii_letters

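def rstrip_(s_in):
    last = None
    for char in s_in:
        if char in tolerated:
            last = char
            yield char
        elif char == ' ':
            if last == ' ':
                return  # second consecutive space: stop (raising StopIteration here is a RuntimeError since Python 3.7, PEP 479)
            last = char
            yield char
        else:
            return  # any other character ends the output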

for input_, exp in li_test:
    got = "".join(rstrip_(input_))
    msg = ":%s:<>:%s:" % (exp, got)
    print (":%s:=>:%s:" % (input_, got))
    #cheating a bit because I dunno if the last space is wanted.
    assert exp.rstrip() == got.rstrip(), msg

Output:

 :My string is #not very beautiful:=>:My string is :
 :Are you 9 years old?:=>:Are you :
 :this is the last  example:=>:this is the last :

And yes, I should wrap the whole thing in a second function and join the characters there...
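A minimal sketch of that wrapping, for illustration only: the names _allowed_prefix and truncate are made up, the generator ends with a plain return, and the final rstrip() is an assumption made so that the result matches the question's expected output.

```python
import string

_LETTERS = set(string.ascii_letters)


def _allowed_prefix(text):
    """Yield characters until a disallowed one or a second consecutive space."""
    previous = None
    for char in text:
        if char in _LETTERS:
            yield char
        elif char == " " and previous != " ":
            yield char
        else:
            return  # ends the generator cleanly
        previous = char


def truncate(text):
    """Join the yielded characters and drop the trailing space, if any."""
    return "".join(_allowed_prefix(text)).rstrip()


for sample in ("My string is #not very beautiful",
               "Are you 9 years old?",
               "this is the last  example"):
    print(":%s:=>:%s:" % (sample, truncate(sample)))
```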