I have a .txt file that contains data like this:

math,mathematics
data,machine-learning-model
machine-learning,statistics,unsupervised-learning,books
orange,lda
machine-learning,deep-learning,keras,tensorflow
keras,similarity,distance,features

Now I want to store each line as a list inside a larger list.
The expected output for sentences is:

sentences = [['math', 'mathematics'],
             ['data', 'machine-learning-model'],
             ['machine-learning', 'statistics', 'unsupervised-learning', 'books'],
             ['orange', 'lda']]

This is what I tried:

temp_tokens = []
sentences = []
fp = open('tags.txt')
lines = fp.readlines()
for line in lines:
    temp_tokens.clear()
    for word in line.split(','):
        if word.strip('\n'):
            temp_tokens.append(word)
            temp_tokens = [e.replace('\n','') for e in temp_tokens]
    print(temp_tokens)
    sentences.append(temp_tokens)
print(sentences)

Now, when I call print(temp_tokens), I get the following output:

['math', 'mathematics']
['data', 'machine-learning-model']
['machine-learning', 'statistics', 'unsupervised-learning', 'books']
['orange', 'lda']
['machine-learning', 'deep-learning', 'keras', 'tensorflow']
['keras', 'similarity', 'distance', 'features']
['machine-learning']

This is fine. However, the individual lists are not appended to sentences correctly afterwards. When I call print(sentences), the sentences list looks like this:

[['data'], ['machine-learning'], ['orange'], ['machine-learning'], ['keras'], ['machine-learning\n'], ['machine-learning'], ['dataset'], ['lstm\n'], ['python'], ['python'], ['reinforcement-learning'], ['machine-learning'], ['machine-learning'], ['machine-learning'], ['machine-learning'], ['overfitting'], ['machine-learning'], ['machine-learning'], ['time-series'], ['machine-learning'], ['linear-regression'], ['python'], ['keras'], ['python'], ['python'], ['pytorch\n'], ['machine-learning'], ['machine-learning'], ['machine-learning'], ['machine-learning'], ['gradient-descent\n'], ['python'], ['image'], ['dataset'], ['python'], ['neural-network'], ['machine-learning'], ['feature-selection'], ['nlp'], ['machine-learning'], ['python'], ['machine-learning'], ['cnn'], ['machine-learning'], ['neural-network'], ['machine-learning'], ['machine-learning'], ['deep-learning'], ['machine-learning'], ['python'], ['tensorflow'], ['machine-learning'], ['machine-learning'], ['machine-learning']

It contains only a single token from each line, rather than each line as a list.
Can someone tell me what is wrong with my code? Why does sentences.append(temp_tokens) not append temp_tokens to sentences as a complete list, but only as a single token?
Can someone explain?
Answer 0 (score: 2)
Alternatively, use:
with open('tags.txt') as fp:
    sentences = [line.strip().split(',') for line in fp]
print(sentences)
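To try the same comprehension without a file on disk, it can be wrapped in a small function. This is only a sketch: the name `parse_tag_lines` and the in-memory sample are assumptions made for illustration, and skipping blank lines is an extra choice that the snippet above does not make.

```python
import io


def parse_tag_lines(lines):
    """Split each non-empty line on commas and return a list of token lists."""
    # `lines` can be any iterable of strings, e.g. an open file object.
    return [line.strip().split(',') for line in lines if line.strip()]


# io.StringIO stands in for open('tags.txt') so the example runs anywhere.
sample = io.StringIO("math,mathematics\ndata,machine-learning-model\n\norange,lda\n")
sentences = parse_tag_lines(sample)
print(sentences)
# [['math', 'mathematics'], ['data', 'machine-learning-model'], ['orange', 'lda']]
```

With a real file, `parse_tag_lines(fp)` inside a `with open('tags.txt') as fp:` block would behave the same way.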
Answer 1 (score: 1)
This should work:
sentences = []
fp = open('tags.txt')
for line in fp.readlines():
    temp_tokens = line.strip().split(",")
    sentences.append(temp_tokens)
print(sentences)
Or, more concisely:
sentences = []
with open('tags.txt') as fp:
    for line in fp.readlines():
        sentences.append(line.strip().split(","))
print(sentences)
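As for why the code in the question misbehaves, a likely explanation is that `append` stores a reference to the list rather than a copy, so the `temp_tokens.clear()` at the start of the next iteration empties the list that is already inside `sentences`, and the first token of the next line is then appended to it. The sketch below reproduces that aliasing with made-up values; it assumes the list comprehension in the question sits inside the `if` block, which is what the reported output suggests.

```python
sentences = []
temp_tokens = ['math', 'mathematics']
sentences.append(temp_tokens)      # stores a reference, not a copy

temp_tokens.clear()                # also empties sentences[0]
temp_tokens.append('data')         # first token of the next line leaks in
print(sentences)                   # [['data']]

# Rebinding the name to a fresh list leaves the stored entry untouched.
temp_tokens = ['data', 'machine-learning-model']
print(sentences)                   # [['data']]

# Appending a copy (or building a new list per line) avoids the aliasing.
safe = []
row = ['math', 'mathematics']
safe.append(list(row))
row.clear()
print(safe)                        # [['math', 'mathematics']]
```

Both versions above create a new list on every iteration, so they never hit this problem.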