I have a Python string that looks like this:
"5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk"
I need to add 1 to every number that appears directly before the keyword cup.
The result must be:
"5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk"
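Here is what I have so far:
import re
p = re.compile('([0-9]+) cup')
for i in p.finditer(s):
    # do something with int(i.group(1)) + 1
I can't figure out how to replace only the number found in each iteration.
There is also an edge case: I may need to replace 9 with 10, so I can't simply take the index of the number and overwrite that digit, because the new number may be longer.
Solutions that don't involve regular expressions are welcome too.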
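Answer 0 (score: 3)
You can pass a function as the replacement argument to the sub function. This function receives the match object as its argument.
Process the received argument to build the replacement string for each match.
Thanks to @ctwheels' answer, I improved my initial regex.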
import re
mystring = """
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
"""
p = r'\d+(?= +cups?\b)'
newstring = re.sub(p, lambda x: str(int(x.group(0))+1), mystring)
print(newstring)
# outputs:
5 pounds cauliflower,
cut into 1-inch florets (about 20 cups)
2 large leeks,
1 teaspoons salt
5 cups of milk
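The edge case from the question (9 becoming 10) is handled by the same callback, because re.sub splices in whatever string the function returns regardless of its length. A minimal sketch, assuming the same lookahead pattern as above; the function name add_one_before_cups and the sample sentence are made up for illustration:

```python
import re

def add_one_before_cups(text):
    # The callback receives each match object; the string it returns
    # replaces the matched digits, however long that string is.
    return re.sub(r'\d+(?= +cups?\b)',
                  lambda m: str(int(m.group(0)) + 1),
                  text)

print(add_one_before_cups("9 cups of flour and 99 cups of water"))
# prints: 10 cups of flour and 100 cups of water
```

No index bookkeeping is needed, since re.sub rebuilds the string around each replacement.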
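To handle pluralization of the word (as pointed out by @CasimiretHippolyte), we can use a broader pattern, at the cost of a more involved replacement function:
def repl(x):
    d = int(x.group(0).split()[0]) + 1
    return str(d) + ' cup' if d == 1 else str(d) + ' cups'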
p = r'\d+ cups?'
mystring = """
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
1 cup of butter
0 cups of sugar"""
newstring = re.sub(p, repl, mystring)
print(newstring)
# outputs
5 pounds cauliflower,
cut into 1-inch florets (about 20 cups)
2 large leeks,
1 teaspoons salt
5 cups of milk
2 cups of butter
1 cup of sugar
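One caveat with the broader pattern: without a word boundary, r'\d+ cups?' also matches the start of a phrase like "2 cupcakes". A hedged variant that adds \b and captures the number in a group is sketched below; the name bump_and_pluralize and the sample text are assumptions for illustration, not part of the answer above:

```python
import re

def bump_and_pluralize(text):
    def repl(m):
        n = int(m.group(1)) + 1
        # Pick the singular form only when the new count is exactly 1.
        return f"{n} {'cup' if n == 1 else 'cups'}"
    # \b after cups? keeps longer words such as "cupcakes" from matching.
    return re.sub(r'(\d+) +cups?\b', repl, text)

print(bump_and_pluralize("0 cups sugar, 1 cup butter, 2 cupcakes"))
# prints: 1 cup sugar, 2 cups butter, 2 cupcakes
```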
Answer 1 (score: 2)
A non-regex approach as well:
def tryParseInt(i):
    try:
        num = int(i)
    except ValueError:
        return (False,i)
    return (True,num)
txt = '''5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk'''
txt2 = txt.replace("\n"," \n ").split(" ") # put spaces around each newline so that splitting
                                           # at spaces keeps newlines as separate tokens
txt3 = "" # result
for n in range(len(txt2)-1):
    prev, current = txt2[n:n+2]
    if current in ("cup", "cups", "cups)"):
        isint, num = tryParseInt(prev)
        if isint:
            prev = str(num+1)
    txt3 += " " + prev # every token except the last is appended here
txt3 += " " + current # append the final token
print(txt3.lstrip().replace(" \n ","\n"))
Also not a regex (this was my first attempt):
txt = '''5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk'''
def intOrNot(a):
    """splits a at spaces and returns a list of strings and ints where possible"""
    rv = []
    for n in a.split():
        try:
            rv.append(int(n))
        except ValueError:
            rv.append(n)
    return rv
p = txt.split("\n") # split into lines
t = [intOrNot(a) for a in p] # sublists per line
for q in t:
    for idx in range(len(q)-1):
        num,cup = q[idx:idx+2]
        # note: the substring test also matches words such as "buttercup"
        if isinstance(num,int) and isinstance(cup,str) and "cup" in cup:
            q[idx]+=1 # add 1 to the number
text = ""
for o in t: # puzzle output together again
    text += " ".join(str(i) for i in o) + "\n"
print(txt+"\n\n"+text)
Output:
5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
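Both non-regex versions special-case tokens like "cups)". A minimal alternative sketch, assuming it is acceptable to strip surrounding punctuation before comparing the unit word; the helper name bump_cups is hypothetical:

```python
import string

def bump_cups(text):
    out_lines = []
    for line in text.split("\n"):
        words = line.split(" ")
        for i in range(1, len(words)):
            # "cups)" and "Cups," both normalize to "cups"
            unit = words[i].strip(string.punctuation).lower()
            if unit in ("cup", "cups") and words[i - 1].isdecimal():
                words[i - 1] = str(int(words[i - 1]) + 1)
        out_lines.append(" ".join(words))
    return "\n".join(out_lines)

recipe = """5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk"""
print(bump_cups(recipe))
```

Comparing the whole normalized word instead of testing for a substring means "buttercup" is left alone, and the isdecimal() check skips tokens such as "(about" without needing a try/except.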
Answer 2 (score: 2)
You can try something like this:
import re
pattern=r'cups?'
string_1="""5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk"""
jk=string_1.splitlines()
for i in jk:
    wow=i.split()
    for l,k in enumerate(wow):
        if re.search(pattern,k) is not None:
            wow[l-1]=int(wow[l-1])+1
    print(" ".join([str(i) for i in wow]))
Output:
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
Answer 3 (score: 1)
\d+(?= +cups?\b)
import re
a = [
"5 pounds cauliflower,",
"cut into 1-inch florets (about 18 cups)",
"2 large leeks,",
"1 teaspoons salt",
"3 cups of milk"
]
r = r"\d+(?= +cups?\b)"
def repl(m):
    return str(int(m.group(0)) + 1)
for s in a:
    print(re.sub(r, repl, s))
This code is in response to @CasimiretHippolyte's comment below the question:
import re
a = [
"5 pounds cauliflower,",
"cut into 1-inch florets (about 18 cups)",
"2 large leeks,",
"1 teaspoons salt",
"3 cups of milk",
"0 cups of milk",
"1 cup of milk"
]
r = r"(\d+) +(cups?)\b"
def repl(m):
    x = int(m.group(1)) + 1
    return str(x) + " " + ("cup", "cups")[x > 1]
for s in a:
    print(re.sub(r, repl, s))
Input:
5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk
Output:
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
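The lookahead pattern from this answer can also be written with re.VERBOSE, so that each piece carries its own comment. This is only a sketch of the same pattern in a different layout; the names CUP_NUMBER and increment are made up for illustration:

```python
import re

CUP_NUMBER = re.compile(r"""
    \d+          # one or more digits: the number to increment
    (?=          # positive lookahead: checked but not consumed
        [ ]+     # one or more spaces (bracketed so VERBOSE keeps them)
        cups?    # "cup" or "cups"
        \b       # word boundary, so "cupcakes" is rejected
    )
""", re.VERBOSE)

def increment(text):
    return CUP_NUMBER.sub(lambda m: str(int(m.group(0)) + 1), text)

print(increment("cut into 1-inch florets (about 18 cups)"))
# prints: cut into 1-inch florets (about 19 cups)
```

Because the unit sits inside a lookahead, only the digits are part of the match, so the callback does not have to re-emit the word.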
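\d+ matches any digit one or more times
(?= +cups?\b) is a positive lookahead ensuring that what follows matches:
    " +" matches one or more space characters
    cups? matches cup or cups (s? makes the s optional)
    \b asserts the position at a word boundary
Answer 4 (score: 1)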
You can try this one-line solution:
import re
s = """
5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk
"""
new_s = re.sub(r'\d+(?=\s[a-zA-Z])', '{}', s).format(*[int(re.findall(r'^\d+', i)[0])+1 if re.findall(r'[a-zA-Z]+$', i)[0] == 'cups' else int(re.findall(r'^\d+', i)[0]) for i in re.findall(r'\d+\s[a-zA-Z]+', s)])
print(new_s)
Output:
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk