Question

I'm having trouble reading text from a csv file. A sample line from the csv file looks like this:

1477-7819-4-45-2 Angiolymphatic Invasion (H&amp; E400Ã).

My guess is that the problem is the text encoding, so I decided to convert the file to ASCII.

This is my Python code so far:

text_path = '/some_path/filename.csv'
text_path_ascii = '/some_path/filename_ASCII.csv'

input_codec = 'UTF-16'
output_codec = 'ASCII'

unicode_file = open(text_path)
for line in unicode_file:
    unicode_data = unicode_file.read().decode(input_codec)
    #here is another problem => AttributeError: 'str' object has no attribute 'decode'
    unicode_data = unicode_file.read()

ascii_file = open(text_path_ascii, 'w')
ascii_file.write(unicode_data.write(unicode_data.encode(output_codec)))
# same problem=> AttributeError: 'str' object has no attribute 'encode'
ascii_file.write(unicode_data.encode(output_codec))

So my problem is that I don't know how to encode/decode the text.

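I'm not even sure this is the right way to deal with the badly written text (and yes, if you open the file in any editor, the text looks like the line shown above).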

Or is it possible to read the csv text directly without the "broken" characters?

Thanks for any ideas.

Answer 1

In Python 3 there is no decode method on str; decode is a method of bytes.
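
A small illustration of that distinction, as a sketch for Python 3 (the sample string is made up for the demo and is not taken from the asker's file):

```python
# In Python 3, str is already-decoded text and bytes is raw encoded data.
text = "Angiolymphatic invasion"  # illustrative sample, not real file content

raw = text.encode("utf-16")        # str -> bytes
round_trip = raw.decode("utf-16")  # bytes -> str

# str has no decode method, which is what raises the AttributeError
# when decode is called on the result of a text-mode read().
has_decode = hasattr(text, "decode")
```

So calling read() on a file opened in text mode already returns str, and there is nothing left to decode.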

If you want to decode the file, you can let open do it for you by passing the encoding:

file = open(filename, mode, encoding='utf-8')
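
Applied to the question, this might look like the sketch below. It assumes the input really is UTF-16, as the asker's input_codec suggests; if the text still looks wrong, the file may be in a different encoding. The function name convert_utf16_to_ascii and the choice of errors="replace" are illustrative assumptions, not part of the original answer.

```python
def convert_utf16_to_ascii(src_path, dst_path, errors="replace"):
    """Re-encode a UTF-16 text file as ASCII (hypothetical helper)."""
    # open() decodes while reading and encodes while writing, so no manual
    # decode()/encode() calls are needed. newline="" leaves line endings as-is.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="ascii", errors=errors, newline="") as dst:
        for line in src:
            # With errors="replace", characters that ASCII cannot represent
            # are written as "?" instead of raising UnicodeEncodeError.
            dst.write(line)
```

Usage would be something like convert_utf16_to_ascii(text_path, text_path_ascii). If ASCII output is not actually required, opening the file with the right encoding and passing the file object to csv.reader avoids the lossy conversion altogether.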
