Question

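Suppose my text looks like this:

a = "I am inclin- ed to ask simple questions"

I want to extract the hyphenated words. The first step is to determine whether the text contains a hyphen at all, which is easy: I use re.match("\s*-\s*", a) to check whether the sentence has a hyphen.

1) Next, I want to extract the partial words before and after the hyphen (in this case, "inclin" and "ed").

2) Then I want to merge them into "inclined" and print all such words.

I'm stuck at step 1. Please help.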

Answer 1

>>> import re
>>> a = "I am inclin- ed to ask simple questions"
>>> result = re.findall(r'([a-zA-Z]+-)\s+(\w+)', a)
>>> result
[('inclin-', 'ed')]

>>> [first.rstrip('-') + second for first, second in result]
['inclined']

Alternatively, you can have the first group capture the word without the trailing -:

>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> result
[('inclin', 'ed')]
>>> [''.join(item) for item in result]
['inclined']

This also works for multiple matches in a string:

>>> a = "I am inclin- ed to ask simp- le quest- ions"
>>> result = re.findall(r'([a-zA-Z]+)-\s+(\w+)', a)
>>> [''.join(item) for item in result]
['inclined', 'simple', 'questions']

Answer 2

Try this regex; it should work for you:

import re

a = "I am inclin- ed to ask simple questions"

# This matches the whole split word, i.e. "inclin- ed".
# re.search returns None (it does not raise) when nothing is found.
m = re.search(r'\S*\-(.|\s)\S*', a)

if m is None:
    print("not found in a")
else:
    print(m.group())

Then you strip the matched string and grab the pieces as an array.
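
That last step might look something like the sketch below. The helper name split_match is hypothetical, and splitting on '-' and stripping whitespace is one reading of "strip and grab as an array", not necessarily what the answer had in mind.

```python
import re

def split_match(text):
    """Return the halves of the first hyphen-split word, or [] if none."""
    m = re.search(r'\S*\-(.|\s)\S*', text)
    if m is None:
        return []
    # m.group() is the whole match, e.g. "inclin- ed"; split it on the
    # hyphen and strip the leftover whitespace from each piece.
    return [part.strip() for part in m.group().split('-')]

a = "I am inclin- ed to ask simple questions"
parts = split_match(a)
print(parts)
print(''.join(parts))
```

Unlike re.findall, re.search stops at the first match, so this only handles one split word per string.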

Removing the relevant hyphens in text
