Question

What is the best way to split a string like this

text = "hello there how are you"

in Python?

So that I end up with an array like this:

['hello there', 'there how', 'how are', 'are you']

I have tried this:

import re

liste = re.findall(r'((\S+\W*){' + str(2) + '})', text)
for a in liste:
    print(a[0])

But I get:

hello there 
how are 
you

How can I make the findall function advance by only one token at a time while searching?

Answer 1

Here is a solution using re.findall:

>>> import re
>>> text = "hello there how are you"
>>> re.findall(r"(?=(?:(?:^|\W)(\S+\W\S+)(?:$|\W)))", text)
['hello there', 'there how', 'how are', 'are you']

Take a look at the Python documentation for re: https://docs.python.org/3/library/re.html

(?=...) is a lookahead assertion
(?:...) is a non-capturing version of regular parentheses
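
To see why the lookahead matters, here is a minimal sketch comparing a consuming pattern with a lookahead-wrapped one. The simplified patterns below (using \w and \b instead of the pattern in the answer) are assumptions chosen for illustration; they give the same result on this input but may behave differently on text containing punctuation.

```python
import re

text = "hello there how are you"

# Without a lookahead, each match consumes both words, so the next
# search resumes after the second word and every other pair is skipped.
consuming = re.findall(r"\w+\s+\w+", text)

# Wrapping the pair in (?=...) makes the match zero-width: nothing is
# consumed, so the scan can try again at the next word boundary.
overlapping = re.findall(r"\b(?=(\w+\s+\w+))", text)

print(consuming)    # ['hello there', 'how are']
print(overlapping)  # ['hello there', 'there how', 'how are', 'are you']
```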

Answer 2

If a regular expression is not required, you could do something like this:

words = text.split(' ')
out = []
for i in range(len(words)):
    try:
        out.append(words[i] + ' ' + words[i+1])
    except IndexError:
        # the last word has no successor, so skip it
        continue

Explanation:

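First, split the string on the space character. The result is a list in which each element is one word of the sentence. Instantiate an empty list to hold the result. Loop over the list of words, appending each two-word combination, separated by a space, to the output list. This raises an IndexError when it reaches the last word in the list; just catch it and continue, since you don't seem to want that lone word in your result anyway.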
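
The same loop can also be written without relying on IndexError, by stopping one index early. A minimal sketch; the function name adjacent_pairs is hypothetical and not part of the answer above.

```python
def adjacent_pairs(text):
    """Return each word joined with the word that follows it."""
    words = text.split(' ')
    out = []
    # Stop at the second-to-last word so words[i + 1] always exists.
    for i in range(len(words) - 1):
        out.append(words[i] + ' ' + words[i + 1])
    return out


print(adjacent_pairs("hello there how are you"))
# ['hello there', 'there how', 'how are', 'are you']
```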

Answer 3

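I don't think you actually need a regular expression for this.
I understand that you want a list in which each element contains two words, the latter of which is also the former of the following element. We can do that easily like this: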

string = "Hello there how are you"
liste = string.split(" ")
# join each word with the word that follows it
for i in range(len(liste) - 1):
    liste[i] = liste[i] + " " + liste[i + 1]
# remove the last element, as otherwise it would contain only one word
liste.pop(-1)

Answer 4

I don't know whether you are required to use a regular expression, but this is how I would do it.

First, you can get the list of words with the str.split() method.

Then, you can make pairs.

>>> sentence = "hello there how are you"
>>> splited_sentence = sentence.split(" ")
>>> splited_sentence
['hello', 'there', 'how', 'are', 'you']
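
The pairing step itself is not shown above. One possible way to do it, as a sketch that continues from splited_sentence (the name pairs is an assumption for illustration):

```python
sentence = "hello there how are you"
splited_sentence = sentence.split(" ")

# Pair every word except the last with its right-hand neighbour.
pairs = [
    splited_sentence[i] + " " + splited_sentence[i + 1]
    for i in range(len(splited_sentence) - 1)
]
print(pairs)  # ['hello there', 'there how', 'how are', 'are you']
```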

Answer 5

Another option is to split, zip, and then join, like so...

sentence = "Hello there how are you"
words = sentence.split()
[' '.join(i) for i in zip(words, words[1:])]
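
The zip approach also extends to groups of more than two words, which may be useful given that the question builds its pattern from a count. A sketch under that assumption; word_windows and its parameter n are hypothetical names.

```python
def word_windows(sentence, n=2):
    """Return every run of n consecutive words, joined by single spaces."""
    words = sentence.split()
    # words[0:], words[1:], ..., words[n-1:] are the same list shifted by
    # one word each; zip stops at the shortest, so only full runs remain.
    shifted = [words[i:] for i in range(n)]
    return [' '.join(group) for group in zip(*shifted)]


print(word_windows("Hello there how are you"))
# ['Hello there', 'there how', 'how are', 'are you']
print(word_windows("Hello there how are you", n=3))
# ['Hello there how', 'there how are', 'how are you']
```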

Answer 6

Another possible solution using findall.

>>> liste = list(map(''.join, re.findall(r'(\S+(?=(\s+\S+)))', text)))
>>> liste
['hello there', 'there how', 'how are', 'are you']
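
The map(''.join, ...) step is needed because the pattern has two capturing groups, so re.findall returns a tuple per match. A small sketch showing the intermediate result (the variable names raw and joined are illustrative):

```python
import re

text = "hello there how are you"

# Group 1 is the consumed word; group 2 is the whitespace plus the next
# word, captured inside the lookahead without being consumed.
raw = re.findall(r'(\S+(?=(\s+\S+)))', text)
print(raw)
# [('hello', ' there'), ('there', ' how'), ('how', ' are'), ('are', ' you')]

# Concatenating each tuple rebuilds the two-word string.
joined = [''.join(parts) for parts in raw]
print(joined)  # ['hello there', 'there how', 'how are', 'are you']
```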

Splitting a string using a regular expression in Python

6 answers: