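Writing to a text file in Python does not accept certain characters

Asked: 2019-01-02 15:16:04

Tags: python io file-writing

At the end of my function I write the results to a text file, which is created because it does not yet exist, like this: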

new_file = charity + ".txt"
with open(new_file, "w") as handle:
    handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
    for item in match_lst:
        handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]))
        handle.write("Number of matches: " + str(item[0] - 1) + "\n")
    handle.close()

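My problem is that when it writes to the new file, it does not seem to recognise newlines, the '£' character, or the apostrophe character. To show what I mean, here is an excerpt from the output file: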

Matches found for BLA in order of compatibility:
Grant: The Taylor Family Foundation. Funding offered: �500,000.00Number of matches: 1
Grant: The Peter Cruddas Foundation. Funding offered: �200,000.00Number of matches: 1
Grant: The London Marathon Charitable Trust Limited - Major Capital Project 
Grants. Funding offered: �150,000.00Number of matches: 1
Grant: The Hadley Trust. Funding offered: �100,000.00Number of matches: 1
Grant: The Company Of Actuaries� Charitable Trust Fund. Funding offered: �65,000.00Number of matches: 1
Grant: The William Wates Memorial Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Nomura Charitable Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Grocers� Charity. Funding offered: �40,000.00Number of matches: 1

For reference, this is the information I am trying to write, in its original data structure (i.e. match_lst):

[(2, 500000.0, 'The Taylor Family Foundation', ['Young People', 'Arts Or Heritage', 'Social Reserarch'], ['Registered Charity']), 
(2, 200000.0, 'The Peter Cruddas Foundation', ['Young People'], ['Registered Charity', 'Other']),
(2, 150000.0, 'The London Marathon Charitable Trust Limited - Major Capital Project Grants', ['Infrastructure Support', 'Sport And Recreational Activities'], ['Registered Charity', 'Limited Company', 'Other']), 
(2, 100000.0, 'The Hadley Trust', ['Social Relief And Care', 'Crime And Victimisation', 'Young People', 'Social Reserarch'], ['Registered Charity', 'Limited Company']), 
(2, 65000.0, 'The Company Of Actuaries’ Charitable Trust Fund', ['Young People', 'Disabilities', 'Social Relief And Care', 'Medical Research'], ['Registered Charity']), 
(2, 50000.0, 'The William Wates Memorial Trust', ['Young People', 'Arts Or Heritage', 'Sport And Recreational Activities'], ['Registered Charity', 'Other']), 
(2, 50000.0, 'The Nomura Charitable Trust', ['Young People', 'Education And Learning', 'Unemployment'], ['Registered Charity']), 
(2, 40000.0, 'The Grocers’ Charity', ['Poverty', 'Young People', 'Disabilities', 'Healthcare Sector', 'Arts Or Heritage'], ['Registered Charity']) ]

As you can see, all the characters print fine here.

For more information, here is my simple int_to_str function:

def int_to_str(num_int):
    if num_int == 0:
        return "Discretionary"
    else:
        return '£' + '{:,.2f}'.format(num_int)

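So my question is: how can I fix this so that all the missing or mis-encoded characters are written correctly?

2 Answers:

Answer 0: (score: 2)

It is hard to guess without more details. In any case, this is indeed a charset problem. Let's look at the characters that are not displayed correctly: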

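  • The newline. It depends on the operating system: just \n on Unix-like systems, and \r\n (2 characters) on Windows.
  • '£', or POUND SIGN. It is the Unicode character U+00A3. In Windows code page 1252 or Latin1 (ISO-8859-1) it is the single byte b'\xa3', while in UTF-8 it is encoded as b'\xc2\xa3'. More interestingly, if you try to display b'\xa3' as UTF-8, you get the REPLACEMENT CHARACTER U+FFFD, which is shown as '�'.
  • The apostrophe. The true apostrophe ("'") is the ASCII character U+0027, which causes no problem. But some Unicode-enabled editors silently replace it with RIGHT SINGLE QUOTATION MARK (U+2019, "’"). That character does not exist in Latin1; in Windows code page 1252 it is the single byte b'\x92', and in UTF-8 it is the three bytes b'\xe2\x80\x99'.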
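
The byte values listed above can be checked directly in Python. This is only a minimal sketch using the built-in str.encode and bytes.decode; the variable names are illustrative and not taken from the question:

```python
# The two problem characters, written as escapes so this snippet does not
# depend on the encoding of the file it is saved in.
pound = "\u00a3"   # POUND SIGN
quote = "\u2019"   # RIGHT SINGLE QUOTATION MARK

pound_latin1 = pound.encode("latin-1")   # one byte
pound_cp1252 = pound.encode("cp1252")    # the same single byte
pound_utf8 = pound.encode("utf-8")       # two bytes

# Reading the single Latin1 byte as if it were UTF-8 is invalid; with
# errors="replace" the decoder substitutes U+FFFD instead of raising.
shown_as = pound_latin1.decode("utf-8", errors="replace")

# The curly quote has no Latin1 code point, so encoding it raises.
try:
    quote.encode("latin-1")
    quote_in_latin1 = True
except UnicodeEncodeError:
    quote_in_latin1 = False

print(pound_latin1, pound_cp1252, pound_utf8)
print(repr(shown_as), quote_in_latin1)
```

A file written in one of the single-byte encodings and then opened by a viewer that assumes UTF-8 would show exactly this kind of replacement character.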

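This just means that details matter. Without knowing exactly how the data is read from the binary file, or how it is built, it is impossible to explain what actually happens. A text file is an abstraction: a real text file is a sequence of bytes with a given encoding and end-of-line convention.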
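
To make the "sequence of bytes" point concrete, here is a small sketch that writes the same string under two different encoding and end-of-line conventions and reads the raw bytes back. The helper bytes_on_disk and the sample text are hypothetical, introduced only for illustration:

```python
import os
import tempfile

# One line of text containing a pound sign and a trailing newline.
text = "Funding offered: \u00a3500,000.00\n"

def bytes_on_disk(encoding, newline):
    """Write `text` with the given encoding and newline convention,
    then return the raw bytes that ended up in the file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline controls how "\n" is translated when writing.
        with open(path, "w", encoding=encoding, newline=newline) as handle:
            handle.write(text)
        # Binary mode shows the bytes without any decoding.
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)

unix_utf8 = bytes_on_disk("utf-8", "\n")
windows_cp1252 = bytes_on_disk("cp1252", "\r\n")

print(unix_utf8)
print(windows_cp1252)
```

The same abstract text yields two different byte sequences, which is why a reader has to know both conventions to display the file correctly.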

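Answer 1: (score: 1)

It looks as though each entry is in fact written on a new line, since the output is not one continuous run of text; the \n characters in the output are simply not visible. To fix the encoding problem, the encoding has to be specified in the call that opens the file: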

with open(new_file, 'w', encoding="utf-8") as handle:
    ...
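
For completeness, a fuller sketch of the write loop with the encoding specified might look like the following. The two sample tuples are shortened from the question's match_lst, the temporary directory is an assumption made so the snippet runs anywhere, and the extra "\n" after the funding amount is a guess at the intended layout rather than part of the original code:

```python
import os
import tempfile

def int_to_str(num_int):
    # Same logic as the question's helper; "\u00a3" is the pound sign.
    if num_int == 0:
        return "Discretionary"
    return "\u00a3" + "{:,.2f}".format(num_int)

# Shortened sample data (first three fields of each tuple only).
match_lst = [
    (2, 500000.0, "The Taylor Family Foundation"),
    (2, 65000.0, "The Company Of Actuaries\u2019 Charitable Trust Fund"),
]
charity = "bla"

# Assumption: write into a temporary directory instead of the working directory.
new_file = os.path.join(tempfile.mkdtemp(), charity + ".txt")

# Naming the encoding explicitly avoids depending on the platform default.
with open(new_file, "w", encoding="utf-8") as handle:
    handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
    for item in match_lst:
        handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]) + "\n")
        handle.write("Number of matches: " + str(item[0] - 1) + "\n")

# Read it back with the same encoding to confirm nothing was lost.
with open(new_file, encoding="utf-8") as handle:
    written = handle.read()

print(written)
```

Whatever program is used to view the file must also be told, or must detect, that it is UTF-8; otherwise the bytes are correct but may still be displayed wrongly.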

I am posting this as an answer for future readers of this question.

Thanks