Question

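I want to use a regular expression to extract words of the form wor'word from Igbo text (I really don't know much about regular expressions). For example,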

line = "jir’ọbara ya"

If I do this

found = re.match("\w+’\w+", line)
print found.group()

I get 'NoneType' object has no attribute 'group' instead of jir’ọbara.

Then, if I use found = re.match("\w+’|\w+", line), it only gives me jir’.

Any suggestions on how to fix this, or on a better approach? Thanks.

Answer 1

If the line has a consistent format, then:

wor, word = line.split()[0].split("’")

Or

>>> found = re.match(r"(\w+)’(\w+)", line)
>>> found.group(1)
'jir'
>>> found.group(2)
'ọbara'
>>>
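
If the wor'word form can appear anywhere in the line, or more than once, re.match is not enough, because it only tries to match at the start of the string and returns None otherwise. Below is a minimal sketch, assuming Python 3 (where \w matches Unicode letters in str patterns; under Python 2 the pattern would likely need unicode strings and the re.UNICODE flag). The name find_elided_words, the acceptance of both ’ and ' as the apostrophe, and the handling of combining marks are my assumptions, not something stated in the question.

```python
import re
import unicodedata

# Assumption: both the typographic (U+2019) and ASCII (U+0027) apostrophes occur.
# A "letter" here is any \w character or a combining mark in U+0300-U+036F, so
# decomposed tone marks and under-dots do not split a word. Note that \w also
# accepts digits and underscores.
ELIDED_WORD = re.compile(r"([\w\u0300-\u036f]+)['’]([\w\u0300-\u036f]+)")


def find_elided_words(text):
    """Return a (left, right) pair for every wor'word form found in text."""
    # NFC composes sequences such as o + combining dot below into a single ọ.
    text = unicodedata.normalize("NFC", text)
    return ELIDED_WORD.findall(text)


line = "jir’ọbara ya"
print(find_elided_words(line))  # [('jir', 'ọbara')]
```

findall returns an empty list when nothing matches, so there is no None to check for before using the result.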