Question

I have a .txt file which contains data like this:

math,mathematics
data,machine-learning-model
machine-learning,statistics,unsupervised-learning,books
orange,lda
machine-learning,deep-learning,keras,tensorflow
keras,similarity,distance,features

Now I want to store each line as a list inside a bigger list.

The expected output for sentences looks like this:

sentences = [['math', 'mathematics'],
['data', 'machine-learning-model'],
['machine-learning', 'statistics', 'unsupervised-learning', 'books'],
['orange', 'lda']]

This is what I tried:

temp_tokens = []
sentences = []
fp = open('tags.txt')
lines = fp.readlines()
for line in lines:
    temp_tokens.clear()
    for word in line.split(','):
        if word.strip('\n'):
            temp_tokens.append(word)
        temp_tokens = [e.replace('\n','') for e in temp_tokens]
    print(temp_tokens)
    sentences.append(temp_tokens)
print(sentences)

Now, when I print(temp_tokens), I get the following output:

['math', 'mathematics']
['data', 'machine-learning-model']
['machine-learning', 'statistics', 'unsupervised-learning', 'books']
['orange', 'lda']
['machine-learning', 'deep-learning', 'keras', 'tensorflow']
['keras', 'similarity', 'distance', 'features']
['machine-learning']

This is fine. However, the individual lists are not appended correctly to the list sentences. After I call sentences.append(temp_tokens), the sentences list looks like this:

[['data'], ['machine-learning'], ['orange'], ['machine-learning'], ['keras'], ['machine-learning\n'], ['machine-learning'], ['dataset'], ['lstm\n'], ['python'], ['python'], ['reinforcement-learning'], ['machine-learning'], ['machine-learning'], ['machine-learning'], ['machine-learning'], ['overfitting'], ['machine-learning'], ['machine-learning'], ['time-series'], ['machine-learning'], ['linear-regression'], ['python'], ['keras'], ['python'], ['python'], ['pytorch\n'], ['machine-learning'], ['machine-learning'], ['machine-learning'], ['machine-learning'], ['gradient-descent\n'], ['python'], ['image'], ['dataset'], ['python'], ['neural-network'], ['machine-learning'], ['feature-selection'], ['nlp'], ['machine-learning'], ['python'], ['machine-learning'], ['cnn'], ['machine-learning'], ['neural-network'], ['machine-learning'], ['machine-learning'], ['deep-learning'], ['machine-learning'], ['python'], ['tensorflow'], ['machine-learning'], ['machine-learning'], ['machine-learning']

It only contains a single token for each line rather than the line itself as a list.

Can someone tell me what is wrong with my code? Why is the list "temp_tokens" not appended to "sentences" as a complete list, but only as a single token?

Can someone explain?

Answer 1

Instead, use:

with open('tags.txt') as fp:
    sentences = [line.strip().split(',') for line in fp]
print(sentences)
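
To try this without the real file, here is a minimal sketch that swaps in io.StringIO for open('tags.txt'); the stand-in and the extra blank line are assumptions for illustration, and the rows are copied from the question. The added `if line.strip()` filter is optional and only skips empty lines, which would otherwise produce [''] entries.

```python
import io

# Stand-in for open('tags.txt'); the sample rows come from the question.
sample = io.StringIO(
    "math,mathematics\n"
    "data,machine-learning-model\n"
    "orange,lda\n"
    "\n"  # a blank line, to show what the filter below skips
)

# One inner list per non-empty line; strip() removes the trailing newline.
sentences = [line.strip().split(',') for line in sample if line.strip()]
print(sentences)
# [['math', 'mathematics'], ['data', 'machine-learning-model'], ['orange', 'lda']]
```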

Answer 2

This should work:

sentences = []
fp = open('tags.txt')
for line in fp.readlines():
    temp_tokens = line.strip().split(",")
    sentences.append(temp_tokens)
print(sentences)

Or, more concisely:

sentences = []
with open('tags.txt') as fp:
    for line in fp.readlines():
        sentences.append(line.strip().split(","))
print(sentences)
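
As for why the original loop misbehaves, a likely explanation is aliasing: append() stores a reference to the list object, not a copy, so a later temp_tokens.clear() empties a list that is already inside sentences. The sketch below reproduces that effect in isolation and then shows the fix of binding a fresh list on every pass. The variable names aliased and fixed are made up for illustration, and the two input lines are taken from the question.

```python
tokens = ['math', 'mathematics']
aliased = []
aliased.append(tokens)   # stores a reference to the same list object, not a copy

tokens.clear()           # empties that object in place, inside `aliased` too
tokens.append('data')    # and refills it with the next line's first token
print(aliased)           # [['data']]

# Binding a new list on each iteration leaves earlier entries untouched.
fixed = []
for line in ["math,mathematics\n", "data,machine-learning-model\n"]:
    tokens = line.strip().split(',')   # split() returns a brand-new list
    fixed.append(tokens)
print(fixed)             # [['math', 'mathematics'], ['data', 'machine-learning-model']]
```

If the same list must be reused for some reason, appending a copy with sentences.append(list(temp_tokens)) should have the same effect.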

Appending a list to another list in Python