I have a text file:
sample.txt
Hi I am student
I am from
Here is what I tried:
import string
import re

def read_to_list1(filepath):
    text_as_string = open(filepath, 'r').read()
    x = re.sub('['+string.punctuation+']', '', text_as_string).split("\n")
    for i in x:
        x_as_string = re.sub('['+string.punctuation+']', '', i).split()
        print(x_as_string)

read_to_list1('sample.txt')
This prints:
['Hi','I','am','student']
['I','am','from']
I want the result to be:
[['Hi','I','am','student'],['I','am','from']]
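Answer 0 (score: 1)
After opening the file, you can iterate over its lines with a list comprehension and call str.split on each line. With no arguments, str.split splits on whitespace, which gives the tokens for each sub-list.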
def read_to_list1(filepath):
    with open(filepath, 'r') as f_in:
        return [line.split() for line in f_in]
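The comprehension above leaves punctuation attached to the tokens and returns an empty list for any blank line. Here is a minimal sketch, assuming you also want punctuation stripped as in the question's attempt. The name `read_to_token_lists` and the blank-line filtering are illustrative assumptions, not part of the original answer:

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def read_to_token_lists(filepath):
    """Return one list of whitespace-separated tokens per non-blank line."""
    with open(filepath, 'r') as f_in:
        rows = (line.translate(_STRIP_PUNCT).split() for line in f_in)
        # Drop lines that have no tokens left after stripping (assumption).
        return [tokens for tokens in rows if tokens]
```

Using `str.translate` avoids building a regex character class out of `string.punctuation`. Remove the final filter if blank lines should show up as empty sub-lists.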
Answer 1 (score: 1)
For the specific example sample.txt, this should also work:
import string
import re

def read_to_list1(filepath):
    text_as_string = open(filepath, 'r').read()
    x = re.sub('['+string.punctuation+']', '', text_as_string).split("\n")
    final_array = []
    for i in x:
        x_as_string = re.sub('['+string.punctuation+']', '', i).split()
        final_array.append(x_as_string)
    return final_array

print(read_to_list1('sample.txt'))
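One detail worth checking: if the file ends with a newline, `split("\n")` yields a trailing empty string, so the function above would append an empty list at the end. Here is a small sketch of the difference, using an in-memory string as a stand-in for the file contents (the variable names are illustrative):

```python
text = "Hi I am student\nI am from\n"  # stand-in for file contents ending in a newline

# split("\n") keeps an empty string after the final newline.
with_split = [line.split() for line in text.split("\n")]

# splitlines() does not produce that trailing empty entry.
with_splitlines = [line.split() for line in text.splitlines()]

print(with_split)       # [['Hi', 'I', 'am', 'student'], ['I', 'am', 'from'], []]
print(with_splitlines)  # [['Hi', 'I', 'am', 'student'], ['I', 'am', 'from']]
```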