Question

    import petl as etl

    file_name = 'name of file'
    file_in_memory = etl.fromcsv(file_name, encoding='utf-8')
    print (etl.look(file_in_memory))

    Traceback (most recent call last):
      File "<interactive input>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13: ordinal not in range(128)

The file contains "20 Rue d'Estrées, 75007 Paris, France", which causes the error.
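For illustration, the byte named in the traceback is consistent with the "é" in that address. The sketch below assumes the line starts with the address exactly as quoted (the real file layout is not shown here) and uses only the standard library:

```python
# The address as quoted above, encoded as UTF-8 bytes (Python 3).
raw = "20 Rue d'Estrées, 75007 Paris, France".encode("utf-8")

# In UTF-8, "é" is the two-byte sequence 0xc3 0xa9; it starts at index 13.
first_byte = raw[13]

# Decoding with the ascii codec fails on that byte...
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# ...while decoding as UTF-8 keeps the character intact.
decoded = raw.decode("utf-8")

print(hex(first_byte))    # 0xc3
print(ascii_error.start)  # 13
print(decoded)
```

So the message suggests that something along the way is decoding the bytes as ASCII rather than UTF-8.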

I can read the file with codecs.open(file_name, mode='r', encoding='utf-8'), but I would like to be able to use the petl library to manipulate the csv easily.

Is there a way to load it into memory with petl.fromcsv while preserving the characters?

Answer 1

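You first need to find out the file's encoding with the chardet module. Its UniversalDetector walks through the contents of the file and returns its best guess at the encoding based on the bytes it sees.

The result is a dictionary that includes the key 'encoding'.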

    from chardet.universaldetector import UniversalDetector
    import petl as etl

    file_name = 'name of file'

    # Feed the raw bytes to the detector until it is confident
    detector = UniversalDetector()
    with open(file_name, 'rb') as file_open:
        for line in file_open:
            detector.feed(line)
            if detector.done:
                break
    detector.close()
    file_encoding = detector.result['encoding']

    # Load the csv using the detected encoding
    file_in_memory = etl.fromcsv(file_name, encoding=file_encoding)
    print(etl.look(file_in_memory))

If you need to do this more than once, the encoding detection can be put into a function.
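A minimal sketch of such a function, assuming chardet is installed. The name detect_encoding is only illustrative, not part of chardet or petl:

```python
from chardet.universaldetector import UniversalDetector


def detect_encoding(path):
    """Return chardet's best guess at the encoding of the file at `path`.

    Hypothetical helper; the result may be None if chardet cannot decide.
    """
    detector = UniversalDetector()
    with open(path, "rb") as handle:  # chardet is fed raw bytes
        for line in handle:
            detector.feed(line)
            if detector.done:  # stop early once the detector is confident
                break
    detector.close()
    return detector.result["encoding"]
```

It could then be called as etl.fromcsv(file_name, encoding=detect_encoding(file_name)). Detection is a guess, so on short files with very few non-ASCII characters the result is worth checking by hand.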

Error loading a utf-8 file with the petl module in Python
