Suppose I have text like this:
Our favorite numbers are 5, 6, and 7, but his favorite number is 0. Also, this text contains 2 sentences.
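Say I only want to extract the favorite numbers from this text. Unless the phrase favorite number (or favorite numbers) is present, I have no way of knowing whether the text contains any favorite numbers at all. So I am basically trying to parse the numbers that appear around that phrase. The expected result should look like this: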
['5', '6', '7', '0']
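I tried using regular expressions, but so far I have failed. What is the most logical way to do this?
Edit: after reading @LouiseDavies's question, I am adding another example below: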
Alice has 2 favorite numbers: 11 and 12. Bob has 10 favorite numbers: 0, 100, 1264, 598, 78496, 33546, 1028896, 23, 48, 6.
So in this example, my output should look like this (the order does not matter):
['11', '12', '0', '100', '1264', '598', '78496', '33546', '1028896', '23', '48', '6']
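Answer 0 (score: 1)
You have not shown any code, so I will not write out a complete solution.
You can split the text on '.', keep only the sentences that contain "favourite number" (or "favorite number"), and extract the numbers from those sentences. You should not try to write a single regex for the whole sentence.
Here is a start: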
text = "Our favorite numbers are 5, 6, and 7, but his favorite number is 0. Also, this text contains 2 sentences."
import re
pattern = re.compile("favou?rite numbers?", re.I)
print([sentence for sentence in text.split('.') if pattern.search(sentence)])
# ['Our favorite numbers are 5, 6, and 7, but his favorite number is 0']
Now that you have the list of interesting sentences, you are one list comprehension and one re.findall(r'\d+', sentence) away from a complete solution.
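One way the remaining step could look is sketched below. It is not the answerer's code: the names `PHRASE` and `favorite_numbers` are made up for illustration, and the sketch assumes that sentences end with a plain '.' and that every number in a matching sentence counts as a favorite.

```python
import re

# Hypothetical helper names; only re.compile, re.findall and str.split are real APIs.
PHRASE = re.compile(r"favou?rite numbers?", re.I)

def favorite_numbers(text):
    """Return every number found in a sentence that mentions the phrase."""
    # Step 1: keep only the sentences that contain the phrase.
    sentences = [s for s in text.split('.') if PHRASE.search(s)]
    # Step 2: collect all runs of digits from those sentences, in order.
    return [n for s in sentences for n in re.findall(r"\d+", s)]

text = ("Our favorite numbers are 5, 6, and 7, but his favorite number is 0. "
        "Also, this text contains 2 sentences.")
print(favorite_numbers(text))  # ['5', '6', '7', '0']
```

Because this treats every number in a matching sentence as a favorite, the second example from the question would also return the counts 2 and 10 ("Alice has 2 favorite numbers", "Bob has 10 favorite numbers"), so it would need extra filtering there.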
Answer 1 (score: 0)
You can use a regular expression:
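import re
import itertools
s = 'Our favorite numbers are 5, 6, and 7, but his favorite number is 0. Also, this text contains 2 sentences.'
# Grab the run of digits, commas and whitespace (plus the letters of "and") that follows each phrase.
numbers = re.findall(r'(?<= favorite number is)[,\s\d]+|(?<=favorite numbers are)[,\s\dand]+', s)
# Pull the individual numbers out of each captured run and flatten the result.
final_numbers = list(itertools.chain.from_iterable(re.findall(r'\d+', i) for i in numbers))
print(final_numbers)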
Output:
['5', '6', '7', '0']
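The lookbehinds above are tied to the exact wording "is" and "are", so they would not match the "favorite numbers: 11 and 12" form from the question's second example. A possible generalization is sketched below. The pattern and the name `extract_favorites` are my own assumptions, not part of the answer, and the sketch only handles lists that directly follow "is", "are" or ":".

```python
import re

# Assumed pattern: the phrase, a connector ("is", "are" or ":"), then a run
# made only of whitespace, commas, the word "and", and digits.
LIST_AFTER_PHRASE = re.compile(
    r"favou?rite numbers?\s*(?:is|are|:)((?:\s|,|and\b|\d)+)", re.I
)

def extract_favorites(text):
    """Return the numbers listed right after each occurrence of the phrase."""
    result = []
    for run in LIST_AFTER_PHRASE.findall(text):
        # Each captured run looks like " 5, 6, and 7, "; split it into numbers.
        result.extend(re.findall(r"\d+", run))
    return result

example = ("Alice has 2 favorite numbers: 11 and 12. Bob has 10 favorite numbers: "
           "0, 100, 1264, 598, 78496, 33546, 1028896, 23, 48, 6.")
print(extract_favorites(example))
```

Since the counts ("2", "10") come before the phrase rather than after it, they are left out, which matches the output the question asks for.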
Answer 2 (score: 0)
I am on my phone, so I cannot test my code right now; I will check it when I get home.
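text = "Our favorite numbers are 5, 6, and 7, but his favorite number is 0. Also, this text contains 2 sentences."
sentences = text.split('.')
numbers = set()
for sentence in sentences:
    if "favorite number" in sentence:
        # Collect every distinct character of the matching sentence.
        numbers = numbers.union(set(sentence))
# Drop the characters with code points 32-47 and 58-167 (space, punctuation, letters),
# which leaves only the digit characters.
numbers = list(numbers.difference(set([*[chr(n) for n in range(32,48)],*[chr(n) for n in range(58,168)]])))
# Note: this works character by character, so it yields single digits in no particular order.
numbers = [int(x) for x in numbers]
print(numbers)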
Another way could be:
text = "Our favorite numbers are 5, 6, and 7, but his favorite number is 0. Also, this text contains 2 sentences."
sentences = text.split('.')
numbers = []
for sentence in sentences:
    if "favorite number" in sentence:
        for character in sentence:
            # Keep the characters that convert to an int; skip everything else.
            try:
                numbers.append(int(character))
            except ValueError:
                pass
print(numbers)
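Using timeit.timeit to run each function 100000 times (without the print()), the first approach took 3.614777436797135 seconds and the second took 12.934136042429973 seconds. So the first one does not preserve the order, but it is about 3.57 times faster.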
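For anyone who wants to repeat the comparison, a rough sketch of such a measurement follows. The function names and the `REPEATS` value are assumptions of mine, and the set-based function is a simplified stand-in for the first snippet (it intersects with `string.digits` instead of subtracting character ranges), so the timings will not match the figures quoted above.

```python
import string
import timeit

TEXT = ("Our favorite numbers are 5, 6, and 7, but his favorite number is 0. "
        "Also, this text contains 2 sentences.")

def digits_via_set(text):
    """Set-based variant: unordered, and repeated digits collapse into one."""
    found = set()
    for sentence in text.split('.'):
        if "favorite number" in sentence:
            found |= set(sentence) & set(string.digits)
    return [int(ch) for ch in found]

def digits_via_loop(text):
    """Loop-based variant: keeps order, tries int() on every character."""
    found = []
    for sentence in text.split('.'):
        if "favorite number" in sentence:
            for ch in sentence:
                try:
                    found.append(int(ch))
                except ValueError:
                    pass
    return found

REPEATS = 1000  # assumption: far fewer runs than the 100000 used in the answer
set_seconds = timeit.timeit(lambda: digits_via_set(TEXT), number=REPEATS)
loop_seconds = timeit.timeit(lambda: digits_via_loop(TEXT), number=REPEATS)
print(f"set: {set_seconds:.4f}s  loop: {loop_seconds:.4f}s")
```

Both variants work on single characters, so a number such as 11 would come back as two separate digits (or as one, in the set-based variant).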
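Answer 3 (score: 0)
This is a bit longer than the other answers, but a state-machine approach may turn out to be more maintainable if your requirements change.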
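import re
text = """
Our favorite numbers are 5, 6, and 7, but his favorite number is 0. Also, this text contains 2 sentences.
"""
# A token is either a run of digits or a run of letters and spaces.
r = re.compile(r"(\d+|[a-zA-Z ]+)")
faves = False  # True while we are right after a "favorite number(s)" phrase
lst = []
while True:
    s = r.search(text)
    if s is None:
        break
    x = s.group(1).strip()
    if x:
        if x == 'and':
            # "and" between numbers does not change the state.
            pass
        elif re.search(r'favou?rite numbers?', x):
            faves = True
        elif re.match(r"^\d+$", x) and faves:
            lst.append(x)
        else:
            # Any other words (or a number outside a list) end the current list.
            faves = False
    # Continue scanning after the token we just handled.
    text = text[s.end():]
print(lst)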
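To try the same idea on the second example from the question, the loop can be wrapped in a function. The sketch below is a variation rather than the code above: it uses `finditer` instead of slicing the text, and the names `TOKEN`, `PHRASE` and `favorite_numbers_fsm` are hypothetical.

```python
import re

TOKEN = re.compile(r"\d+|[a-zA-Z ]+")          # digits, or letters and spaces
PHRASE = re.compile(r"favou?rite numbers?", re.I)

def favorite_numbers_fsm(text):
    """Two-state scan: collect numbers only while 'inside' a favorites list."""
    collecting = False
    found = []
    for match in TOKEN.finditer(text):
        token = match.group().strip()
        if not token or token == "and":
            continue                  # separators keep the current state
        if PHRASE.search(token):
            collecting = True         # the phrase opens a list
        elif token.isdigit():
            if collecting:
                found.append(token)   # a number inside a list
        else:
            collecting = False        # any other words close the list
    return found

example = ("Alice has 2 favorite numbers: 11 and 12. Bob has 10 favorite numbers: "
           "0, 100, 1264, 598, 78496, 33546, 1028896, 23, 48, 6.")
print(favorite_numbers_fsm(example))
```

Because "2" and "10" appear before the phrase, the scan sees them while it is not collecting and skips them.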