I'm trying to do a find-and-replace on a string in Python 2.7. Here is my string (shown in raw form):
\n\n\nTOSS UP\n\n\n\n1. MATH Short Answer Pablo walks 4 miles north, 6 miles east, and then 2 miles north again. In simplest form, how many miles is he from his starting point?\n\n\n\nANSWER: 6\n\n\n\nBONUS\n\n\n\n1. MATH Short Answer Evaluate the limit as x approaches infinity of x times the quantity negative 1 plus e to the 1 over x.\n\n\n\nANSWER: 1\n\n\n\nTOSS UP\n\n\n\n2. CHEMISTRY Multiple Choice Which of the following is NOT a characteristic of amines?\n\n\n\nW) A fully protonated amine is called an ammonium ion\n\nX) Amines can function as Br\xc3\xb8nsted bases\n\nY) The VSEPR geometry of the nitrogen atom is trigonal planar\n\nZ) Amines can be a hydrogen bond acceptor\n\n\n\nANSWER: Y) The VSEPR geometry of the nitrogen atom is trigonal planar\n\n\n\nBONUS\n\n\n\n2. CHEMISTRY Multiple Choice Of the following elements in their monatomic gaseous states, which has the lowest electron affinity?\n\n\n\nW) BoronX) CarbonY) NitrogenZ) OxygenANSWER: Y) NITROGEN\n\n\n
I'm searching with this regular expression and then doing some replacements:
searchString = (
r"(TOSS\-UP|TOSSUP|TOSS\s*UP)\s*"
r"(?P<questionNum>\d{1,2})[\.\)]\s*(?P<category>[A-Z ]+)\s*"
r"(?i)(Short Answer|Multiple Choice)\s*(?P<tossupQ>[\S\s]*)"
r"ANSWER\:\s*(?P<tossupA>[\S\s]*)"
r"\s*BONUS\s*"
r"(?P<questionNumBonus>\d{1,2})[\.\)]\s*(?P<categoryBonus>[A-Z ]+)\s*"
r"(?i)(Short Answer|Multiple Choice)\s*(?P<bonusQ>[\S\s]*)"
r"ANSWER\:(?P<bonusA>[\S\s]*)"
)
The result I get is:
{
"category": 4,
"questionNum": 1,
"tossupQ": "Pablo walks 4 miles north, 6 miles east, and then 2 miles north again. In simplest form, how many miles is he from his starting point?\n\n\n\nANSWER: 6\n\n\n\nBONUS\n\n\n\n1. MATH Short Answer Evaluate the limit as x approaches infinity of x times the quantity negative 1 plus e to the 1 over x.\n\n\n\nANSWER: 1\n\n\n\nTOSS UP\n\n\n\n2. CHEMISTRY Multiple Choice Which of the following is NOT a characteristic of amines?\n\n\n\nW) A fully protonated amine is called an ammonium ion\n\nX) Amines can function as Br\xc3\xb8nsted bases\n\nY) The VSEPR geometry of the nitrogen atom is trigonal planar\n\nZ) Amines can be a hydrogen bond acceptor",
"tossupA": "Y) The VSEPR geometry of the nitrogen atom is trigonal planar",
"bonusQ": "Of the following elements in their monatomic gaseous states, which has the lowest electron affinity?\n\n\n\nW) BoronX) CarbonY) NitrogenZ) Oxygen",
"bonusA": "Y) NITROGEN"
},
However, when I change the line r"ANSWER\:\s*(?P<tossupA>[\S\s]*)" to r"ANSWER\:\s*(?P<tossupA>[\d]*)", I get this:
{
"category": 4,
"questionNum": 1,
"tossupQ": "Pablo walks 4 miles north, 6 miles east, and then 2 miles north again. In simplest form, how many miles is he from his starting point?",
"tossupA": "6",
"bonusQ": "Evaluate the limit as x approaches infinity of x times the quantity negative 1 plus e to the 1 over x.\n\n\n\nANSWER: 1\n\n\n\nTOSS UP\n\n\n\n2. CHEMISTRY Multiple Choice Which of the following is NOT a characteristic of amines?\n\n\n\nW) A fully protonated amine is called an ammonium ion\n\nX) Amines can function as Br\xc3\xb8nsted bases\n\nY) The VSEPR geometry of the nitrogen atom is trigonal planar\n\nZ) Amines can be a hydrogen bond acceptor\n\n\n\nANSWER: Y) The VSEPR geometry of the nitrogen atom is trigonal planar\n\n\n\nBONUS\n\n\n\n2. CHEMISTRY Multiple Choice Of the following elements in their monatomic gaseous states, which has the lowest electron affinity?\n\n\n\nW) BoronX) CarbonY) NitrogenZ) Oxygen",
"bonusA": "Y) NITROGEN"
},
Why does tossupA capture the wrong answer with [\S\s]* but the right one with \d*? Any help would be greatly appreciated!
Answer 0 (score: 1)
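The reason is that you are using greedy quantifiers. If you don't require ANSWER: to be followed by digits, tossupQ is allowed to match a longer string. As a result, tossupQ swallows all the questions and answers up to the last ANSWER: after which the rest of the pattern can still match.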
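When you require ANSWER: to be followed by digits, tossupA can only match the first answer (the only numeric answer directly followed by BONUS), so tossupQ has to stop early to allow this match.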
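To make the backtracking concrete, here is a minimal sketch on a shortened, made-up input (the `text` string and the group names `q` and `a` are illustrative stand-ins, not the original data). It compares the greedy `[\S\s]*` answer group with the digits-only one:

```python
import re

# Hypothetical, shortened stand-in for the real question text.
text = (
    "Q1 text ANSWER: 6 BONUS Q2 text ANSWER: 1 "
    "TOSS UP Q3 text ANSWER: Y BONUS Q4 text ANSWER: Z"
)

# Greedy question group, greedy "anything" answer group.
greedy = re.search(r"(?P<q>[\S\s]*)ANSWER:\s*(?P<a>[\S\s]*)\s*BONUS", text)

# Greedy question group, digits-only answer group.
digits = re.search(r"(?P<q>[\S\s]*)ANSWER:\s*(?P<a>\d*)\s*BONUS", text)

# The greedy question group backs off only as far as it must, so it ends
# at the LAST "ANSWER:" that can still be followed by "BONUS".
greedy_q, greedy_a = greedy.group("q"), greedy.group("a")

# With \d*, only "ANSWER: 6 BONUS" satisfies the rest of the pattern,
# which forces the question group to stop after the first question.
digits_q, digits_a = digits.group("q"), digits.group("a")
```

Here `greedy_q` runs through the third question and `greedy_a` is `"Y "`, while `digits_q` is just `"Q1 text "` and `digits_a` is `"6"`, mirroring the two results in the question.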
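You can fix this by switching to non-greedy quantifiers: *?. These match the shortest string that is consistent with the rest of the pattern, rather than the longest.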
searchString = (
r"(TOSS\-UP|TOSSUP|TOSS\s*UP)\s*"
r"(?P<questionNum>\d{1,2})[\.\)]\s*(?P<category>[A-Z ]+)\s*"
r"(?i)(Short Answer|Multiple Choice)\s*(?P<tossupQ>[\S\s]*?)"
r"ANSWER\:\s*(?P<tossupA>[\S\s]*?)"
r"\s*BONUS\s*"
r"(?P<questionNumBonus>\d{1,2})[\.\)]\s*(?P<categoryBonus>[A-Z ]+)\s*"
r"(?i)(Short Answer|Multiple Choice)\s*(?P<bonusQ>[\S\s]*?)"
r"ANSWER\:(?P<bonusA>[\S\s]*)"
)
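One thing to watch for: the final bonusA group in the pattern above is still greedy, so it runs to the end of the input, and simply making it lazy would let it match the empty string. One possible way to bound it, sketched here under assumptions, is a lookahead for the next TOSS UP or the end of the text. The simplified pattern `PAIR` and the `sample` string are hypothetical and omit the question-number and category groups:

```python
import re

# Simplified, hypothetical pattern: lazy groups plus a lookahead that ends
# the last answer at the next "TOSS UP" or at the end of the input.
PAIR = re.compile(
    r"TOSS\s*UP\s*"
    r"(?P<tossupQ>[\S\s]*?)\s*ANSWER:\s*(?P<tossupA>[\S\s]*?)"
    r"\s*BONUS\s*"
    r"(?P<bonusQ>[\S\s]*?)\s*ANSWER:\s*(?P<bonusA>[\S\s]*?)"
    r"\s*(?=TOSS\s*UP|$)"
)

# Made-up sample in the same layout as the original text.
sample = (
    "\n\nTOSS UP\n\n1. MATH Short Answer What is 2 + 2?\n\nANSWER: 4\n\n"
    "BONUS\n\n1. MATH Short Answer What is 3 * 3?\n\nANSWER: 9\n\n"
    "TOSS UP\n\n2. CHEMISTRY Short Answer Symbol for gold?\n\nANSWER: AU\n\n"
    "BONUS\n\n2. CHEMISTRY Short Answer Symbol for iron?\n\nANSWER: FE\n\n"
)

# finditer yields one match per toss-up/bonus pair.
pairs = [m.groupdict() for m in PAIR.finditer(sample)]
```

With this sample, `pairs` holds two dictionaries, one per toss-up/bonus pair, and each answer group contains only its own answer.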
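BTW, [\S\s] is the same as . when the re.DOTALL flag is set. By default . does not match newlines, so if you want . to match across multiple lines, use re.DOTALL to make it match newline characters as well.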
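A small sketch of the difference, using a made-up two-line string:

```python
import re

text = "line one\nline two"

# By default "." stops at the newline.
dot_default = re.match(r".*", text).group()

# [\S\s] matches any character, newline included.
any_char = re.match(r"[\S\s]*", text).group()

# With re.DOTALL, "." behaves like [\S\s].
dot_all = re.match(r".*", text, re.DOTALL).group()
```

`dot_default` is only `"line one"`, while `any_char` and `dot_all` both cover the whole string.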