Question

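I have a list of sentences:

s = ['hello everyone', 'how are you', ..., 'i am fine']

I want to split this list of sentences into a list of word lists.

So my expected result is:

[['hello', 'everyone'], ['how', 'are', 'you'], .., ['i', 'am', 'fine']]

I tried it like this: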

def split_list(sentence):
    for s in sentence:
        s=s.split()
    return s

But I only get a single list of words, not a list of word lists:

['hello', 'everyone', 'how', 'are', 'you', .., 'i', 'am', 'fine']

Answer 1

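It's not entirely clear what sentence in your split_list function refers to, but if it is a list of strings such as ['hello everyone', 'how are you', 'i am fine'], then you overwrite the same variable s on every iteration, and at the end you are left with only the result of the last iteration, i.e. ['i', 'am', 'fine'].

So you need to make sure you collect all the results in a list of lists and return that.

Assuming it is the list of strings above, you can use a list comprehension like this:

s = ['hello everyone', 'how are you', 'i am fine']

def split_list(sentence):
    # Split each sentence in the list into a list of words
    return [item.split() for item in sentence]

print(split_list(s))

Or an ordinary for loop:

s = ['hello everyone', 'how are you', 'i am fine']

def split_list(sentence):
    result = []
    # Split each sentence in the list, and append it to the result list
    for s in sentence:
        result.append(s.split())
    return result

print(split_list(s))

The output is the same in both cases:

[['hello', 'everyone'], ['how', 'are', 'you'], ['i', 'am', 'fine']]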
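To make the overwriting effect concrete, here is a small side-by-side sketch. The function names split_list_buggy and split_list_fixed are hypothetical, chosen only for this comparison; the buggy version reproduces the loop from the question, where s is rebound on each pass.

```python
def split_list_buggy(sentences):
    # Mirrors the question's loop: s is rebound on every iteration,
    # so only the split of the last sentence survives to the return.
    for s in sentences:
        s = s.split()
    return s


def split_list_fixed(sentences):
    # Build a new list that keeps one word list per sentence.
    return [item.split() for item in sentences]


sentences = ['hello everyone', 'how are you', 'i am fine']
print(split_list_buggy(sentences))  # ['i', 'am', 'fine']
print(split_list_fixed(sentences))  # [['hello', 'everyone'], ['how', 'are', 'you'], ['i', 'am', 'fine']]
```

Note that the buggy version also fails on an empty input list (s is never assigned, so returning it raises UnboundLocalError), whereas the fixed version simply returns [].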

Answer 2

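You have to keep the result of each iteration in a list, by initializing an empty list before the loop and appending each result to it inside the loop: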

def split_list(sentence):
    L = []
    for s in sentence:
        L.append(s.split())
    return L

Otherwise, the function only returns the result of the last iteration.
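The same accumulation can also be written without an explicit loop. A minimal sketch, assuming you want exactly the same list of word lists: map applies str.split (called with no separator, so it splits on whitespace) to every sentence, and list() collects the results.

```python
def split_list(sentences):
    # map calls str.split(sentence) for each sentence; list() turns the
    # lazy map object into a concrete list of word lists.
    return list(map(str.split, sentences))


print(split_list(['hello everyone', 'how are you', 'i am fine']))
# [['hello', 'everyone'], ['how', 'are', 'you'], ['i', 'am', 'fine']]
```

This is behaviorally equivalent to the append loop above; which one to use is mostly a matter of taste.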

Answer 3

from nltk import word_tokenize
s = ['hello everyone', 'how are you', 'i am fine']

token = [word_tokenize(x) for x in s]
print(token)

Output:
[['hello', 'everyone'], ['how', 'are', 'you'], ['i', 'am', 'fine']]
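The practical difference between word_tokenize and str.split shows up with punctuation: word_tokenize separates a trailing mark such as "." into its own token, while split() leaves it attached to the word. The sketch below does not use NLTK; it swaps in a rough regular-expression tokenizer (the tokenize helper and TOKEN_RE pattern are hypothetical) purely to illustrate that difference with the standard library. It does not reproduce NLTK's actual rules, for example how it handles contractions.

```python
import re

# Hypothetical stand-in for a real tokenizer: runs of word characters
# become one token, and each punctuation mark becomes its own token.
# Whitespace matches neither alternative, so it is skipped.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    return TOKEN_RE.findall(sentence)


s = ['hello everyone', 'how are you?', 'i am fine.']
print([tokenize(x) for x in s])
# [['hello', 'everyone'], ['how', 'are', 'you', '?'], ['i', 'am', 'fine', '.']]
print([x.split() for x in s])
# [['hello', 'everyone'], ['how', 'are', 'you?'], ['i', 'am', 'fine.']]
```

For plain sentences without punctuation, as in the question, both approaches give the same result.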

Answer 4

This can be done with a list comprehension.

s = ['hello everyone', 'how are you', 'i am fine']
s2 = [c.split() for c in s]
print(s2) # [['hello', 'everyone'], ['how', 'are', 'you'], ['i', 'am', 'fine']]
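All of the answers rely on calling split() with no argument, which matters if the sentences contain extra spaces. The short sketch below (the sample string is made up for illustration) contrasts it with split(' '): without an argument, Python splits on runs of any whitespace and drops leading and trailing whitespace, while an explicit ' ' separator treats every single space as a boundary and yields empty strings.

```python
line = '  hello   everyone '

# No argument: split on runs of whitespace, ignore leading/trailing
# whitespace, so no empty strings appear in the result.
print(line.split())      # ['hello', 'everyone']

# Explicit ' ' separator: each individual space is a boundary, so
# consecutive and surrounding spaces produce empty strings.
print(line.split(' '))   # ['', '', 'hello', '', '', 'everyone', '']
```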

Split a list of sentences into a list of word lists using Python
