I am new to regular expression matching. My string looks like this:
"karthika has symptoms cold,cough her gender is female and his age is 45"
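As a first step, I look for the keyword "symptoms" and pick the word that follows it, like this:
regexp = re.compile(r"symptoms\s(\w+)")
symptoms = regexp.search(textoutput).group(1)
This sets symptoms to "cold". However, the text contains several symptoms, so as a second step I need to check whether a comma (,) follows "cold" and, if it does, print the value right after the comma, "cough".
Please help me achieve this.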
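Answer 0 (score: 2)
You can use a regex that finds the first word after 'symptoms' and optionally matches more items, each starting with a comma, maybe some whitespace, and then more word characters: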
import re
pattern = r"symptoms\s+(\w+)(?:,\s*(\w+))*"
regex = re.compile(pattern)
t = "kathy has symptoms cold,cough her gender is female. john's symptoms hunger, thirst."
symptoms = regex.findall(t)
print(symptoms)
Output:
[('cold', 'cough'), ('hunger', 'thirst')]
Explanation:
r"symptoms\s+(\w+)(?:,\s*(\w+))*"
# symptoms\s+      literal "symptoms" followed by 1+ whitespace characters
# (\w+)            1+ word characters (the first symptom) as group 1
# (?:,\s* ... )*   non-capturing group, repeated 0+ times: a comma plus optional whitespace
# (\w+)            1+ word characters as group 2; Python's re keeps only the last repetition here
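One caveat is worth checking before relying on this pattern: Python's re module keeps only the last repetition of a repeated capture group, so with three or more symptoms the middle ones do not show up in the tuple. A minimal sketch of that behaviour (the sample sentence is made up for illustration):

```python
import re

regex = re.compile(r"symptoms\s+(\w+)(?:,\s*(\w+))*")

# Hypothetical sample text with three comma-separated symptoms.
text = "kathy has symptoms cold,cough,fever her gender is female."

pairs = regex.findall(text)
# Group 2 holds only the last repetition ("fever"); "cough" is dropped.
print(pairs)
```

This is why capturing the whole list in one group and splitting it afterwards tends to be the more robust route.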
Alternative approach:
import re
pattern = r"symptoms\s+(\w+(?:,\s*\w+)*(?:\s+and\s+\w+)?)"
regex = re.compile(pattern)
t1 = "kathy has symptoms cold,cough,fever and noseitch her gender is female. "
t2 = "john's symptoms hunger, thirst."
symptoms = regex.findall(t1+t2)
print(symptoms)
Output:
['cold,cough,fever and noseitch', 'hunger, thirst']
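This only works for lists written without a comma before "and" (the usual "British" English style). The text
"kathy has symptoms cold,cough,fever, and noseitch"
would yield only cold,cough,fever, and
as the match.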
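If lists with a comma before "and" also need to be handled, one possible variation is sketched below. It is not part of the original answer: the negative lookahead `(?!and\b)` and the optional comma in front of "and" are my own assumptions about how to extend the pattern, and the sample sentences are illustrative.

```python
import re

# Assumed extension of the pattern above:
#   (?!and\b)  stops the comma-separated part from swallowing the word "and"
#   ,?         allows an optional comma right before the final "and <word>"
pattern = r"symptoms\s+(\w+(?:,\s*(?!and\b)\w+)*(?:,?\s+and\s+\w+)?)"
regex = re.compile(pattern)

with_comma = "kathy has symptoms cold,cough,fever, and noseitch her gender is female. "
without_comma = "kathy has symptoms cold,cough,fever and noseitch her gender is female. "
plain_list = "john's symptoms hunger, thirst."

print(regex.findall(with_comma))
print(regex.findall(without_comma))
print(regex.findall(plain_list))
```

Like the original pattern, this still assumes each symptom is a single word.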
You can split each match at ',' and " and " to get the individual symptoms:
sym = [ inner.split(",") for inner in (x.replace(" and ",",") for x in symptoms)]
print(sym)
Output:
[['cold', 'cough', 'fever', 'noseitch'], ['hunger', ' thirst']]
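Note the leading space in ' thirst' in the output above. If that matters, a possible alternative is to split with `re.split` and let the separator swallow the surrounding whitespace. This is a sketch rather than part of the original answer; the `splitter` name is chosen here for illustration.

```python
import re

# The matches produced by the previous step.
symptoms = ['cold,cough,fever and noseitch', 'hunger, thirst']

# Split on a comma or on the word "and", including any whitespace around them.
splitter = re.compile(r"\s*,\s*|\s+and\s+")
sym = [splitter.split(match) for match in symptoms]
print(sym)
```

Calling `str.strip()` on each item of the original list comprehension would work just as well.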
Answer 1 (score: 1)
You can use a regex capture group. For example,
# the following pattern looks for
# symptoms<many spaces><many word chars><comma><many word chars>
s_re = re.compile(r"symptoms\s+\w+,(\w+)")
The full code is
import re
from typing import Optional

s_re = re.compile(r"symptoms\s+\w+,(\w+)")

def get_symptom(text: str) -> Optional[str]:
    found = s_re.search(text)
    if found:
        return found.group(1)
    return None
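A short usage sketch, with the function restated so the snippet runs on its own. The first input is the string from the question; the other two are made-up variations. Note that this pattern expects the second symptom to follow the comma directly, so a space after the comma yields no match.

```python
import re
from typing import Optional

s_re = re.compile(r"symptoms\s+\w+,(\w+)")

def get_symptom(text: str) -> Optional[str]:
    """Return the word right after the first comma following 'symptoms <word>'."""
    found = s_re.search(text)
    return found.group(1) if found else None

# String from the question: the second symptom is found.
first = get_symptom("karthika has symptoms cold,cough her gender is female and his age is 45")
# Illustrative inputs: no comma at all, and a comma followed by a space.
second = get_symptom("karthika has symptoms cold her gender is female")
third = get_symptom("john's symptoms hunger, thirst.")

print(first, second, third)
```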