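I'm having trouble using a regular expression to remove non-alphanumeric characters from a string. I want to find the first character that is not a letter or a single space (digits and double spaces count as such) and remove it together with every character after it, letters included.
For example:
My string is #not very beautiful
should become
My string is
or
Are you 9 years old?
should become
Are you
and
this is the last  example
should become
this is the last
(there are two spaces between "last" and "example")
How can I do this?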
Answer 0 (score: 5)
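How about splitting on [^A-Za-z ]| {2} and taking the first element? You can strip any leftover space afterwards:
import re
re.split(r"[^A-Za-z ]| {2}", "My string is #not very beautiful")[0].strip()
# 'My string is'
re.split(r"[^A-Za-z ]| {2}", "this is the last  example")[0].strip()
# 'this is the last'
re.split(r"[^A-Za-z ]| {2}", "Are you 9 years old?")[0].strip()
# 'Are you'
[^A-Za-z ]| {2} has two alternatives. The first matches a single character that is neither a letter nor a space. The second, " {2}", matches two consecutive spaces. Split on either one, and the first element of the result is what you are looking for.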
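Only the first split point matters here. A slightly leaner variant is the sketch below. It assumes the same pattern and precompiles it, then passes maxsplit=1 so that re.split stops after the first match. The names STOP and truncate are hypothetical.

```python
import re

# The answer's pattern: one character that is neither an ASCII letter nor a space,
# or two consecutive spaces (written as " {2}" so the double space stays visible).
STOP = re.compile(r"[^A-Za-z ]| {2}")

def truncate(s):
    # maxsplit=1: we only need the text before the first match, so stop splitting there
    return STOP.split(s, maxsplit=1)[0].strip()

print(STOP.split("Are you 9 years old?"))  # ['Are you ', ' years old', '']
print(truncate("Are you 9 years old?"))    # Are you
```

Without maxsplit, the split also cuts at the "?", which produces the empty last element above. With maxsplit=1, the rest of the string is never searched.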
Answer 1 (score: 1)
Create a whitelist and stop as soon as you see something that is not in it:
import itertools
import string

def rstrip(s, whitelist=None):
    if whitelist is None:
        # default whitelist: all ASCII letters A-Z and a-z, plus a space
        whitelist = set(string.ascii_letters + ' ')
    # split on a double space and take the first part (this works even if the string has no double space)
    # use itertools.takewhile to keep the leading characters that are in the whitelist
    # use join to combine them into one single string
    return ''.join(itertools.takewhile(whitelist.__contains__, s.split('  ', 1)[0]))
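This answer depends on itertools.takewhile stopping at the first non-whitelisted character instead of skipping it. The sketch below uses an illustrative lowercase-only whitelist, which is an assumption made to keep the example short, and shows how that behavior differs from filter():

```python
import string
from itertools import takewhile

# Illustrative whitelist (assumption): lowercase ASCII letters plus a space
allowed = set(string.ascii_lowercase + " ")
text = "are you 9 years old?"

# takewhile stops for good at the first character not in the whitelist ...
kept = "".join(takewhile(allowed.__contains__, text))
# ... whereas filter only drops the offending characters and keeps going
filtered = "".join(filter(allowed.__contains__, text))

print(repr(kept))      # 'are you '
print(repr(filtered))  # 'are you  years old'
```

The filter() result keeps the text after the "9", which is not what the question asks for. takewhile gives the "cut at the first bad character" behavior directly.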
Answer 2 (score: 1)
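import re

str1 = "this is the last  example"
# one or more units, each either a letter or a single whitespace character followed by a letter
regex = re.compile(r"(([a-zA-Z]|(\s[a-zA-Z]))+)")
capture = regex.match(str1)
# match() returns None when nothing at the start of the string fits the pattern
res = capture.group(1) if capture else ""
I also tested it with your other examples, and it seems to give the correct results. Note that this does not keep the trailing space. That matches what your examples show, even if it may not be exactly what you described.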
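Here is a possibly easier-to-read variant of the same idea, offered as a sketch rather than the answerer's code. It matches a run of ASCII letters followed by any number of "single space + letters" groups. Unlike \s in the answer's pattern, a literal space does not accept tabs or newlines. One assumption differs from the answer's regex: this version requires the string to start with a letter. The helper name leading_words is hypothetical.

```python
import re

# One word of ASCII letters, then zero or more "single space + word" groups.
# A double space, a digit or any other symbol ends the match.
WORDS = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)*")

def leading_words(s):
    m = WORDS.match(s)              # anchored at the start of the string
    return m.group(0) if m else ""  # no leading letter -> empty result

for s in ["My string is #not very beautiful",
          "Are you 9 years old?",
          "this is the last  example"]:
    print(repr(leading_words(s)))
```

Every group must end in a letter, so the trailing space is never included and no .strip() is needed.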
Answer 3 (score: 0)
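The obligatory expression-based version:
def truncate_nonalpha_space(s):
    head = s.split("  ")[0]  # everything before the first double space
    return head[:next((x for x, a in enumerate(head) if not a.isalpha() and a != " "), len(head))].rstrip()
Steps:
Split s on "  " (a double space) and keep only the left-hand part. Any double whitespace, and everything after it, is discarded before the expression runs.
Enumerate that remaining part to get the index of each character.
Use a generator expression to yield the index of every character that is not a letter (checked with the .isalpha() method) and not equal to " ".
next() takes the first of these indices, which is used to slice the left-hand part. If there is no such character, the default len(head) makes the slice return the whole left-hand part.
Strip trailing whitespace with .rstrip().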
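The step that uses next() relies on its optional default argument. The sketch below isolates that step in a hypothetical helper, first_bad_index. It returns the index of the first character that is neither a letter nor a space, or the length of the text when the generator finds no such character:

```python
def first_bad_index(text):
    # next(generator, default) returns the first index the generator yields,
    # or len(text) if the generator is exhausted without yielding anything
    return next((i for i, c in enumerate(text) if not c.isalpha() and c != " "), len(text))

print(first_bad_index("this is the last"))      # 16 (no bad character, default used)
print(first_bad_index("Are you 9 years old?"))  # 8  (position of the "9")
```

Without the default, next() would raise StopIteration for a string that contains only letters and single spaces.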
Answer 4 (score: 0)
Answer 5 (score: -1)
Hacky, but using yield:
import string

li_test = [
    ("My string is #not very beautiful", "My string is"),
    ("Are you 9 years old?", "Are you "),
    ("this is the last  example", "this is the last "),
]

tolerated = string.ascii_letters

def rstrip_(s_in):
    last = None
    for char in s_in:
        if char in tolerated:
            last = char
            yield char
        elif char == ' ':
            if last == ' ':
                # second consecutive space: stop. Use `return`, not `raise StopIteration()`,
                # which becomes a RuntimeError inside generators since Python 3.7 (PEP 479).
                return
            last = char
            yield char
        else:
            return

for input_, exp in li_test:
    got = "".join(rstrip_(input_))
    msg = ":%s:<>:%s:" % (exp, got)
    print(":%s:=>:%s:" % (input_, got))
    # cheating a bit because I don't know whether the last space is wanted
    assert exp.rstrip() == got.rstrip(), msg
Output:
:My string is #not very beautiful:=>:My string is :
:Are you 9 years old?:=>:Are you :
:this is the last  example:=>:this is the last :
And yes, I should have wrapped the whole thing in a second function and joined the characters there...
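The answer's closing remark suggests wrapping the generator in a second function that does the joining. The sketch below shows one way to do that. It assumes the same rules (ASCII letters, and a space only if the previous character was not a space) and condenses the two "keep" branches into one condition. Both function names are hypothetical.

```python
import string

TOLERATED = set(string.ascii_letters)

def _keep_chars(s_in):
    """Yield characters until a disallowed one or a second consecutive space."""
    last = None
    for char in s_in:
        if char in TOLERATED or (char == " " and last != " "):
            last = char
            yield char
        else:
            return  # ends the generator cleanly (see PEP 479)

def truncate_chars(s_in):
    """Join the yielded characters into a single string."""
    return "".join(_keep_chars(s_in))

print(repr(truncate_chars("this is the last  example")))  # 'this is the last '
```

The trailing space is kept, as in the answer's output. Call .rstrip() on the result if it is unwanted.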