Question

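I'm new to this. I wrote a tokenize function that takes a txt file made up of sentences and splits it on whitespace and punctuation. The problem is that the output it gives me has sublists inside a parent list.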

My code:

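import re

def tokenize(document):
    # Read the file named by `document` (previously hard-coded as "document.txt")
    with open(document) as file:
        text = file.read()
    hey = text.lower()
    # Split into sentences on runs of two or more whitespace characters
    words = re.split(r'\s\s+', hey)
    print([re.findall(r'\w+', b) for b in words])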

My output:

[['what', 's', 'did', 'the', 'little', 'boy', 'tell', 'the', 'game', 'eggs', 'warden'], ['his', 'dad', 'was', 'warden', 'in', 'the', 'kitchen', 'poaching', 'eggs']]

Desired output:

['what', 's', 'did', 'the', 'little', 'boy', 'tell', 'the', 'game', 'eggs', 'warden']['his', 'dad', 'was', 'warden', 'in', 'the', 'kitchen', 'poaching', 'eggs']

How can I get rid of the parent list in the output? What do I need to change in my code to remove the outer list brackets?
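The outer brackets come from the last line of tokenize: a list comprehension always builds one list, and here each of its items is the token list for one sentence. A minimal sketch on an in-memory string; the sample text is assumed (reconstructed from the output above) and stands in for document.txt:

```python
import re

# Assumed sample text: two sentences separated by a blank line ("\n\n"),
# which is a run of 2+ whitespace characters.
text = "What's did the little boy tell the game eggs warden?\n\nHis dad was warden in the kitchen poaching eggs."

# Step 1: split into sentences on runs of two or more whitespace characters.
sentences = re.split(r'\s\s+', text.lower())

# Step 2: tokenize each sentence. The comprehension produces ONE list whose
# items are the per-sentence token lists, hence the outer brackets.
tokens = [re.findall(r'\w+', s) for s in sentences]

print(len(tokens))     # 2 (one inner list per sentence)
print(tokens[0][:3])   # ['what', 's', 'did']
```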

Answer 1

"I want them as separate lists."

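A function in Python can only return one value. If you want to return two things (in your case, two word lists), you have to return a single object that can hold both, such as a list, a tuple, or a dictionary.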
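To make the "one return value" point concrete, here is a minimal sketch with a hypothetical helper, split_halves, that returns two lists packed into a single tuple, which the caller then unpacks:

```python
def split_halves(items):
    # A Python function returns exactly one object; here that object is a
    # tuple that happens to contain two lists.
    middle = len(items) // 2
    return items[:middle], items[middle:]

result = split_halves(['a', 'b', 'c', 'd'])
print(type(result).__name__)  # tuple

# Tuple unpacking lets the caller treat the two pieces as separate names.
first, second = split_halves(['a', 'b', 'c', 'd'])
print(first)   # ['a', 'b']
print(second)  # ['c', 'd']
```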

Don't confuse how you want the output to be printed with what the function actually returns.

If you just want to print the lists:

for b in words:
    print(re.findall(r'\w+', b))

If you do it this way, your function doesn't return anything (strictly speaking, it returns None).
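One way to see the None is to capture the return value of a function that only prints. print_tokens below is a hypothetical stand-in for the print-only version of tokenize, working on a list of already-split sentences:

```python
import re

def print_tokens(sentences):
    # Prints each sentence's tokens on its own line; there is no return
    # statement, so Python implicitly returns None.
    for s in sentences:
        print(re.findall(r'\w+', s))

value = print_tokens(["his dad was warden", "in the kitchen"])
print(value)  # None
```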

To return both lists:

return [re.findall(r'\w+', b) for b in words]

Then call your function like this:

word_lists = tokenize(document)
for word_list in word_lists:
    print(word_list)
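Putting the return version together, here is a self-contained sketch of the whole round trip. It assumes sentences in the file are separated by runs of two or more whitespace characters (such as a blank line), and it writes an assumed sample to a temporary file only so the example can run on its own:

```python
import os
import re
import tempfile

def tokenize(document):
    """Return one token list per sentence in the file at path `document`."""
    with open(document) as f:
        text = f.read().lower()
    sentences = re.split(r'\s\s+', text)
    return [re.findall(r'\w+', s) for s in sentences]

# Demo only: write assumed sample text to a temporary file, then tokenize it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "document.txt")
    with open(path, "w") as f:
        f.write("His dad was warden.\n\nIn the kitchen poaching eggs.")
    word_lists = tokenize(path)

# The function returns one list; the caller decides how to print its items.
for word_list in word_lists:
    print(word_list)
```

Keeping tokenize free of print calls means the same function can feed other code (counting, filtering) without changing it.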

Answer 2

This should work:

# join() only accepts strings, so convert each token list with str() first
print(','.join(str(re.findall(r'\w+', b)) for b in words))
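str.join only accepts an iterable of strings, which matters here because each item is itself a list. A small sketch with assumed sample data shows the difference between joining the lists directly and joining their string forms:

```python
token_lists = [['his', 'dad'], ['in', 'the', 'kitchen']]

# Passing the inner lists directly raises TypeError, because they are not strings.
try:
    ','.join(token_lists)
except TypeError as exc:
    print("TypeError:", exc)

# Converting each inner list to its string form first makes join work.
joined = ','.join(str(tokens) for tokens in token_lists)
print(joined)  # ['his', 'dad'],['in', 'the', 'kitchen']
```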

Answer 3

I have an example that I'd guess isn't much different from the problem you're running into...

I just take each element of the outer list separately.

>>> a = [['sa', 'bbb', 'ccc'], ['dad', 'des', 'kkk']]
>>> print(a[0], a[1])
['sa', 'bbb', 'ccc'] ['dad', 'des', 'kkk']
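The same idea scales to any number of sublists with argument unpacking (print(*a)), which is standard Python 3 but not part of this answer. Passing sep='' also reproduces the exact format asked for in the question, with no space between the lists. A sketch with an assumed third sublist:

```python
a = [['sa', 'bbb', 'ccc'], ['dad', 'des', 'kkk'], ['x', 'y']]

# print(*a) passes each inner list as a separate argument, so there is no
# need to index a[0], a[1], ... by hand.
print(*a)          # ['sa', 'bbb', 'ccc'] ['dad', 'des', 'kkk'] ['x', 'y']

# sep='' removes the space print() puts between arguments.
print(*a, sep='')  # ['sa', 'bbb', 'ccc']['dad', 'des', 'kkk']['x', 'y']
```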

Splitting a Python list