Find the words before and after a delimiter

Time: 2017-06-27 21:28:17

Tags: python string delimiter

string = "The is a better :: sentence as :: compared to that" 

Expected output:

  1. better sentence
  2. as compared

I tried the following:

    string.split(" :: ")
    re.sub("[\<].*?[\>]", "", string)

but neither gives me those specific words.
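For reference, here is a small sketch of what the two attempts return on the sample string. It assumes Python 3, and the names `segments` and `unchanged` are introduced only for illustration. The backslashes in the original pattern are dropped because `<` and `>` need no escaping inside a character class.

```python
import re

string = "The is a better :: sentence as :: compared to that"

# split() cuts the sentence at each delimiter and returns whole segments,
# not the individual words that sit next to the delimiter.
segments = string.split(" :: ")
print(segments)  # ['The is a better', 'sentence as', 'compared to that']

# This pattern only matches text enclosed in < and >. The sentence contains
# neither character, so re.sub() returns the string unchanged.
unchanged = re.sub(r"[<].*?[>]", "", string)
print(unchanged == string)  # True
```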

3 answers:

Answer 0 (score: 3)

>>> string = "The is a better :: sentence as :: compared to that" 
>>> x = [' '.join(x) for x in map(lambda x: (x[0].split()[-1], x[1].split()[0]), zip(string.split('::')[:-1], string.split('::')[1:]))]
>>> x

Output:

['better sentence', 'as compared']

Dissection:

First, split on :: and zip each piece with the one that follows it:

pairs = zip(string.split('::')[:-1], string.split('::')[1:])

If you call list() on that expression, you get:

[('The is a better ', ' sentence as '), (' sentence as ', ' compared to that')]

Next, apply a function that takes the last word of the first element and the first word of the second element of each tuple:

new_pairs = map(lambda x: (x[0].split()[-1], x[1].split()[0]), pairs)

If you call list() on that expression, you get:

[('better', 'sentence'), ('as', 'compared')]

Finally, join each tuple inside a list comprehension:

result = [' '.join(x) for x in new_pairs]

Output:

['better sentence', 'as compared']

timeit results:

The slowest run took 4.92 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.74 µs per loop
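The three steps in the dissection above can also be collected into one function. This is only a sketch: the name `words_around` and the check that skips a delimiter with no word on one side are my additions, not part of the original one-liner.

```python
def words_around(text, delimiter="::"):
    """Return 'before after' word pairs for each delimiter in text."""
    parts = text.split(delimiter)
    results = []
    # Pair each segment with the segment that follows it.
    for left, right in zip(parts[:-1], parts[1:]):
        left_words = left.split()
        right_words = right.split()
        # Skip a delimiter that has no word on one side
        # (for example, a delimiter at the very start or end).
        if left_words and right_words:
            results.append(left_words[-1] + " " + right_words[0])
    return results


print(words_around("The is a better :: sentence as :: compared to that"))
# ['better sentence', 'as compared']
```

Without that check, a delimiter at the start or end of the string would make `split()[-1]` or `split()[0]` raise an IndexError on an empty list.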

Here is another way, using re.

import re
string = "The is a better :: sentence as :: compared to that"
result = [' '.join(x) for x in re.findall(r'(\w+) :: (\w+)', string)]

Output:

['better sentence', 'as compared']

timeit results:

The slowest run took 4.60 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.49 µs per loop
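The timings quoted in this answer look like IPython %timeit output, though that is an assumption on my part. A rough way to repeat the comparison with the standard timeit module is sketched below. The wrapper names `split_zip` and `regex_pairs` are hypothetical, and the numbers will differ by machine and Python version.

```python
import re
import timeit

string = "The is a better :: sentence as :: compared to that"


def split_zip(text):
    # Split on the delimiter, then pair each segment with the next one.
    parts = text.split('::')
    return [' '.join((a.split()[-1], b.split()[0]))
            for a, b in zip(parts[:-1], parts[1:])]


def regex_pairs(text):
    # Capture one word on each side of " :: ".
    return [' '.join(m) for m in re.findall(r'(\w+) :: (\w+)', text)]


# Both approaches should agree before their timings are compared.
assert split_zip(string) == regex_pairs(string)

for func in (split_zip, regex_pairs):
    # Total seconds for 100000 calls of each function.
    seconds = timeit.timeit(lambda: func(string), number=100000)
    print(func.__name__, seconds)
```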

Answer 1 (score: 3)

A solution using the re.findall() function:

import re

s = "The is a better :: sentence as :: compared to that"
result = [' '.join(i) for i in re.findall(r'(\w+) ?:: ?(\w+)', s)]

print(result)

Output:

['better sentence', 'as compared']
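The ` ?` on each side of `::` makes the surrounding space optional, so the pattern should also cope with delimiters written without spaces. The sketch below checks that on a made-up variant of the sentence (`tight` is my own test input, not from the question). It also shows one limitation: re.findall() returns non-overlapping matches, so a single word sitting between two delimiters is used up by the first match.

```python
import re

pattern = r'(\w+) ?:: ?(\w+)'

spaced = "The is a better :: sentence as :: compared to that"
tight = "a better::sentence as ::compared to that"  # hypothetical input

spaced_result = [' '.join(m) for m in re.findall(pattern, spaced)]
tight_result = [' '.join(m) for m in re.findall(pattern, tight)]
print(spaced_result)  # ['better sentence', 'as compared']
print(tight_result)   # ['better sentence', 'as compared']

# "b" is consumed by the first match, so the pair "b c" is not reported.
shared_result = [' '.join(m) for m in re.findall(pattern, "a :: b :: c")]
print(shared_result)  # ['a b']
```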

Answer 2 (score: 1)

Here is another way:

1) Get the indices of the delimiters

indices = [idx for idx, elem in enumerate(string.split(' ')) if elem == '::']

2) Join the words around each delimiter

for idx in indices:
    print(' '.join(string.split(' ')[idx-1:idx+2:2]))

Output:

better sentence
as compared
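The two steps can be folded into a single pass over the tokens. This is a sketch under a few assumptions of mine: the function name `words_around_tokens` is hypothetical, the text is split once with split() (any run of whitespace) rather than split(' '), and a delimiter at the very start or end is skipped because it has no neighbour on one side.

```python
def words_around_tokens(text, delimiter="::"):
    """Pair the tokens immediately before and after each standalone delimiter."""
    tokens = text.split()
    pairs = []
    for idx, token in enumerate(tokens):
        # Require a neighbour on both sides so idx - 1 never wraps around
        # to the end of the list and idx + 1 never runs past it.
        if token == delimiter and 0 < idx < len(tokens) - 1:
            pairs.append(tokens[idx - 1] + " " + tokens[idx + 1])
    return pairs


print(words_around_tokens("The is a better :: sentence as :: compared to that"))
# ['better sentence', 'as compared']
```

Like the original, this only recognises `::` when whitespace separates it from the neighbouring words.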