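I'm trying to create a multidimensional array that holds, for each word in a string, the word before it (blank if the word is at the start of the string), the word itself, and the word after it (blank if the word is at the end of the string).
I tried the following code: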
def parse_group_words(text):
    groups = []
    words = re_sub("[^\w]", " ", text).split()
    number_words = len(words)
    for i in xrange(number_words):
        print i
        if i == 0:
            groups[i][0] = ""
            groups[i][1] = words[i]
            groups[i][2] = words[i+1]
        if i > 0 and i != number_words:
            groups[i][0] = words[i-1]
            groups[i][1] = words[i]
            groups[i][2] = words[i+1]
        if i == number_words:
            groups[i][0] = words[i-1]
            groups[i][1] = words[i]
            groups[i][2] = ""
    print groups

print parse_group_words("this is an example of text are you ready")
But I get:
0
Traceback (most recent call last):
  File "/home/akf/program.py", line 82, in <module>
    print parse_group_words("this is an example of text are you ready")
  File "/home/akf/program.py", line 69, in parse_group_words
    groups[i][0] = ""
IndexError: list index out of range
Any idea how to fix this?
Answer 0 (score: 1)
Here is a generic way to do this for windows of arbitrary size, using Python's collections and itertools modules:
import re
import collections
import itertools

def window(seq, n=3):
    d = collections.deque(maxlen=n)
    for x in itertools.chain(('', ), seq, ('', )):
        d.append(x)
        if len(d) >= n:
            yield tuple(d)

def windows(text, n=3):
    return list(window((x.group() for x in re.finditer(r'\w+', text)), n=n))
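As a quick check of the n=3 case, here is the same pair of functions run on a short sentence. This is only a sketch for Python 3, and the sample sentence is an illustration rather than anything from the question. Note that a single empty string is added on each side, so the blank-padded edges line up with the question's requirement when n is 3.

```python
import collections
import itertools
import re


def window(seq, n=3):
    # A deque with maxlen=n drops the oldest item automatically on append.
    d = collections.deque(maxlen=n)
    # Pad the sequence with one empty string on each side.
    for x in itertools.chain(("",), seq, ("",)):
        d.append(x)
        if len(d) >= n:
            yield tuple(d)


def windows(text, n=3):
    # Feed the matched words lazily into the sliding window.
    return list(window((m.group() for m in re.finditer(r"\w+", text)), n=n))


print(windows("this is an example"))
```

With this input the call should print one tuple per word: ('', 'this', 'is'), ('this', 'is', 'an'), ('is', 'an', 'example'), ('an', 'example', '').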
Answer 1 (score: 0)
How about...:
import itertools, re

def parse_group_words(text):
    groups = []
    words = re.finditer(r'\w+', text)
    prv, cur, nxt = itertools.tee(words, 3)
    next(cur); next(nxt); next(nxt)
    for previous, current, thenext in itertools.izip(prv, cur, nxt):
        # in Py 3, use `zip` in lieu of itertools.izip
        groups.append([previous.group(0), current.group(0), thenext.group(0)])
    print(groups)

parse_group_words('tanto va la gatta al lardo che ci lascia')
This is almost what you need. It emits:
[['tanto', 'va', 'la'], ['va', 'la', 'gatta'], ['la', 'gatta', 'al'], ['gatta', 'al', 'lardo'], ['al', 'lardo', 'che'], ['lardo', 'che', 'ci'], ['che', 'ci', 'lascia']]
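...which is missing the last required group, ['ci', 'lascia', '']. (The leading group the question asks for, ['', 'tanto', 'va'], is absent from this output as well.)
To add the final group, just before the print you can add: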
groups.append([groups[-1][1], groups[-1][2], ''])
This feels like a moderately nasty hack. I can't easily find an elegant way to have this last group "just emerge" from the general logic of the rest of the function.
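One possible way to make the edge groups fall out of the general logic is to pad the word list with an empty string on each side and zip three offset views of it. The sketch below assumes Python 3 and that the input is small enough to hold in a list. The name `group_words_padded` is hypothetical and not part of the code above. Because each group is built whole, nothing is assigned by index into an empty list, which is what raised the IndexError in the question.

```python
import re


def group_words_padded(text):
    # Hypothetical helper: returns [previous, current, next] for every word.
    words = re.findall(r"\w+", text)
    # One empty string on each side stands in for the missing neighbours.
    padded = [""] + words + [""]
    # zip stops at the shortest slice, giving exactly one triple per word.
    return [list(t) for t in zip(padded, padded[1:], padded[2:])]


print(group_words_padded("tanto va la gatta"))
```

For this input the result should start with ['', 'tanto', 'va'] and end with ['la', 'gatta', ''], with no special-casing for the first or last word.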