Question

I have a text file containing lines of Hindi text (about 5,400,000 lines). I want to store these lines in an array of strings in Python. I tried this code:

    f = open("cleanHindi_Translated.txt" , "r")
    array = []
    for line in f:
        array.append(line)

    print(array)

But I get an error:

    Traceback (most recent call last):
      File "hindi.py", line 11, in <module>
        for line in f:
      File "C:\Users\Preeti\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 124: character maps to <undefined>

I don't understand what I'm doing wrong here.

Answer 1

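The traceback shows the file being decoded with cp1252, the Windows default, which has no character for byte 0x8d. Open the file with its actual encoding instead (UTF-8 is assumed here); `lines` is then the array (list) you are looking for: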

    import io
    with io.open('my_file.txt', 'r', encoding='utf-8') as f:
        lines = f.readlines()
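For reference, here is a minimal sketch of the same idea wrapped in a helper, assuming the file really is UTF-8 (the question does not confirm this, but byte 0x8d is what you would expect from UTF-8 Devanagari: the virama U+094D encodes as E0 A5 8D, and cp1252 leaves 0x8D undefined). The name `read_lines` and its parameters are made up for illustration. Unlike `readlines()`, it strips the trailing newline from each line.

```python
def read_lines(path, encoding="utf-8", errors="strict"):
    """Return the lines of a text file as a list of str, without trailing newlines.

    `encoding` is an assumption about the file; change it if the file
    was saved in a different encoding.
    """
    with open(path, "r", encoding=encoding, errors=errors) as f:
        # Iterating over the file object decodes it line by line
        # with the encoding given above instead of the platform default.
        return [line.rstrip("\n") for line in f]
```

Calling `read_lines("cleanHindi_Translated.txt")` should then give the list of lines. If a few bytes in the file turn out to be invalid, passing `errors="replace"` substitutes U+FFFD for them rather than raising. Note that printing a list of 5.4 million strings will be slow; printing a slice such as `lines[:10]` is usually enough to check the result.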
