Question

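I'm trying to create a program that reads a text file and builds a list of words for each line.

But I can only append each whole line, not the individual words. Any help with this would be appreciated.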

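text = open("file.txt", "r")
lines = []  # not named `list`, which would shadow the built-in

for line in text:
    sentence = line.strip()
    lines.append(sentence)  # appends the whole line, not its individual words

print(lines)
text.close()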

Sample text

I am here
to do something

I'd like it to append the words like this:

[['I', 'am', 'here'], ['to', 'do', 'something']]

Thanks in advance.

Answer 1

Each `line` in your example is just a string, so something like

...
    PUNCTUATION = ',.?!"\''
    words = [w.strip(PUNCTUATION) for w in line.split() if w.strip(PUNCTUATION)]
    list.append(words)
...

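is probably fine as a first approximation, although it may not handle every edge case the way you want (e.g. hyphenated words, words not separated by whitespace, words with trailing apostrophes, etc.).

The condition (`if w.strip(PUNCTUATION)`) is there to avoid empty entries for tokens that consist only of punctuation.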
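A self-contained version of the snippet above, as a minimal sketch: the function name `words_per_line` is hypothetical, and it accepts any iterable of lines (an open file or a plain list of strings), so it can be tried without a file on disk.

```python
PUNCTUATION = ',.?!"\''


def words_per_line(lines):
    """Return one list of words per line, with surrounding punctuation stripped."""
    result = []
    for line in lines:
        # Strip punctuation from both ends of each whitespace-separated token,
        # skipping tokens that were nothing but punctuation (e.g. a lone "!?")
        words = [w.strip(PUNCTUATION) for w in line.split() if w.strip(PUNCTUATION)]
        result.append(words)
    return result


print(words_per_line(['I am here,', 'to do "something"!']))
# [['I', 'am', 'here'], ['to', 'do', 'something']]
```

With a real file, pass the file object instead: `with open("file.txt") as f: print(words_per_line(f))`.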

Answer 2

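Where exactly are you getting the `y` variable from?

In the most basic sense (since you haven't fully specified how to handle punctuation), you can use `line.split(' ')` to split each line into a list of words; it splits on every space. If you have other delimiters, you can use them instead of the space. If needed, assign the result of the split to a variable and append that to your list.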
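A quick illustration of how `split(' ')` differs from `split()` with no argument, which matters when a line still carries its newline or contains repeated spaces. The sample string is made up for the demo.

```python
line = "I  am here\n"   # two spaces after "I", trailing newline

# split(' ') splits on every single space, so doubled spaces produce
# empty strings and the newline stays attached to the last word
print(line.split(' '))   # ['I', '', 'am', 'here\n']

# split() with no argument splits on runs of any whitespace and
# ignores leading/trailing whitespace, including the newline
print(line.split())      # ['I', 'am', 'here']
```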

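@Brendan gave a good solution for stripping basic punctuation. Alternatively, you can use a simple regular expression, `re.findall(r'\w+', file)`, to find all the words in a given file (here `file` must be the file's contents as a string, e.g. from `f.read()`, not the file object itself).

Another way is to use Python's `string` module, in particular `string.punctuation`: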
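A minimal sketch of the regex idea applied line by line, which keeps the one-list-per-line structure from the question (running `re.findall` over the whole file contents gives one flat list). Note that `\w` also matches digits and underscores, and that an apostrophe splits a contraction such as `don't` into `don` and `t`. The function name `regex_words_per_line` is hypothetical.

```python
import re

WORD_RE = re.compile(r'\w+')


def regex_words_per_line(lines):
    # findall returns every run of word characters in the line, so
    # punctuation and whitespace are dropped automatically
    return [WORD_RE.findall(line) for line in lines]


print(regex_words_per_line(["I am here.", "to do something, don't"]))
# [['I', 'am', 'here'], ['to', 'do', 'something', 'don', 't']]
```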

import string

words = ''.join([c for c in line if c not in string.punctuation]).split()
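The same `string.punctuation` filter wrapped into a self-contained function, as a sketch. Instead of rebuilding the line character by character, it uses `str.translate` with a deletion table built by `str.maketrans` (a swapped-in technique that gives the same result in Python 3). The function name `strip_punct_words` is hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character
REMOVE_PUNCT = str.maketrans('', '', string.punctuation)


def strip_punct_words(lines):
    # Delete punctuation first, then split on whitespace
    return [line.translate(REMOVE_PUNCT).split() for line in lines]


print(strip_punct_words(["I am here.", "to do something, don't"]))
# [['I', 'am', 'here'], ['to', 'do', 'something', 'dont']]
```

Because apostrophes and hyphens are in `string.punctuation`, they are deleted rather than treated as separators, so `don't` becomes `dont` and `well-known` becomes `wellknown`.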

Answer 3

Something like this covers a lot of cases and can be adjusted to whatever symbols you use:

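import re

word_groups = []

with open("file.txt", "r") as text:
    for line in text:
        sentence = line.strip()
        # Replace anything that is not a letter or apostrophe with a space,
        # collapse repeated spaces, then split into words
        words = re.sub(" +", " ", re.sub("[^A-Za-z']", " ", sentence)).split()
        word_groups.append(words)

print(word_groups)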

This keeps only uppercase and lowercase letters plus apostrophes (for contractions).
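Applied to a made-up sample line, the pattern from this answer behaves as shown below; wrapping it in a function (the name `letter_words` is hypothetical) makes it easy to check without a file.

```python
import re


def letter_words(sentence):
    # Replace every character that is not an ASCII letter or apostrophe with
    # a space, collapse runs of spaces, then split into words
    return re.sub(" +", " ", re.sub("[^A-Za-z']", " ", sentence)).split()


print(letter_words("I'm here -- to do 2 things, ok?"))
# ["I'm", 'here', 'to', 'do', 'things', 'ok']
```

Digits are dropped, and non-ASCII letters such as `é` are treated as separators, so `café` comes out as `caf`; widen the character class if your text needs them.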

Answer 4

>>> with open("file.txt", "r") as f:
...     list(map(str.split, f))
... 
[['I', 'am', 'here'], ['to', 'do', 'something']]
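In Python 3, `map` returns a lazy iterator rather than a list (as it did in Python 2), so it must be consumed, for example with `list()`, while the file is still open. The sketch below demonstrates this with `io.StringIO` standing in for `file.txt`; both function names are hypothetical.

```python
import io

SAMPLE = "I am here\nto do something\n"


def read_after_close():
    """Return the error raised by consuming map() after the file is closed."""
    f = io.StringIO(SAMPLE)
    with f:
        lazy = map(str.split, f)   # no lines have been read yet
    try:
        list(lazy)                 # reading starts here, but f is already closed
    except ValueError as exc:
        return exc
    return None


def read_inside_with():
    """Consume map() while the file is still open."""
    f = io.StringIO(SAMPLE)
    with f:
        return list(map(str.split, f))


print(read_inside_with())
# [['I', 'am', 'here'], ['to', 'do', 'something']]
```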

Answer 5

word_groups = []

with open("file.txt", "r") as text:
    for line in text:
        # strip() drops the trailing newline; split(' ') splits on single spaces
        words = line.strip().split(' ')
        word_groups.append(words)

print(word_groups)

Answer 6

It looks like you're just missing a call to `str.split()`. Here is a simple one-line list comprehension that does what you want:

>>> [line.split() for line in open('file.txt')]
[['I', 'am', 'here'], ['to', 'do', 'something']]

Append words from lines when reading a text file in Python
