I have a text file containing lines of Hindi text (about 5,400,000 lines). I want to store these lines in an array of strings in Python. I tried this code:
f = open("cleanHindi_Translated.txt", "r")
array = []
for line in f:
    array.append(line)
print(array)
But I get an error:
Traceback (most recent call last):
  File "hindi.py", line 11, in <module>
    for line in f:
  File "C:\Users\Preeti\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 124: character maps to <undefined>
I don't understand what I am doing wrong here.
Answer 0 (score: 1)
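The traceback shows Python decoding the file with cp1252, the default encoding on this Windows setup, while the file is most likely UTF-8. Pass the encoding explicitly when opening it. `lines` is the array (list) you are looking for: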
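import io

# Pass the encoding explicitly; otherwise Python falls back to the platform default (cp1252 here).
with io.open('my_file.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()  # list of str, one per line, trailing newlines kept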
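To make the cause concrete, here is a minimal sketch. It assumes the file really is UTF-8, which the traceback suggests but does not prove. The helper name `read_lines` and the sample sentences are made up for illustration. The Devanagari virama (U+094D) encodes in UTF-8 as the bytes E0 A5 8D, and 0x8D has no mapping in Python's cp1252 codec, which matches the byte named in the error.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file as a list of str, without trailing newlines.

    Hypothetical helper; the utf-8 default is an assumption about the file.
    """
    with open(path, "r", encoding=encoding) as f:
        # Iterating over the file object decodes it one line at a time.
        return [line.rstrip("\n") for line in f]


# The virama character is common in Hindi text; its UTF-8 form ends in 0x8d.
virama_bytes = "\u094d".encode("utf-8")

# Decoding those same bytes as cp1252 fails, as in the question's traceback.
try:
    virama_bytes.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Self-check against a throwaway UTF-8 file (sample text is made up).
sample = ["नमस्ते दुनिया", "यह एक परीक्षण है"]
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("\n".join(sample) + "\n")
    tmp_path = tmp.name

lines = read_lines(tmp_path)
os.remove(tmp_path)

# Print a count rather than the whole list; dumping millions of lines is slow.
print(len(lines), cp1252_failed)
```

With about 5.4 million lines, the whole list has to fit in memory either way. If each line only needs to be processed once, iterating over the open file directly avoids building the list at all.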