Question

At the end of my function I write the results to a text file, which gets created because it does not exist yet, like this:

new_file = charity + ".txt"
with open(new_file, "w") as handle:
    handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
    for item in match_lst:
        handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]))
        handle.write("Number of matches: " + str(item[0] - 1) + "\n")
    handle.close()

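My problem is that when it writes to the new file, it does not seem to recognise newlines, the '£' character or the apostrophe character. To show what I mean, here is an excerpt from the output file: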

Matches found for BLA in order of compatibility:
Grant: The Taylor Family Foundation. Funding offered: �500,000.00Number of matches: 1
Grant: The Peter Cruddas Foundation. Funding offered: �200,000.00Number of matches: 1
Grant: The London Marathon Charitable Trust Limited - Major Capital Project 
Grants. Funding offered: �150,000.00Number of matches: 1
Grant: The Hadley Trust. Funding offered: �100,000.00Number of matches: 1
Grant: The Company Of Actuaries� Charitable Trust Fund. Funding offered: �65,000.00Number of matches: 1
Grant: The William Wates Memorial Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Nomura Charitable Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Grocers� Charity. Funding offered: �40,000.00Number of matches: 1

For reference, this is the information I am trying to write, in its original data structure (i.e. match_lst):

[(2, 500000.0, 'The Taylor Family Foundation', ['Young People', 'Arts Or Heritage', 'Social Reserarch'], ['Registered Charity']), 
(2, 200000.0, 'The Peter Cruddas Foundation', ['Young People'], ['Registered Charity', 'Other']),
(2, 150000.0, 'The London Marathon Charitable Trust Limited - Major Capital Project Grants', ['Infrastructure Support', 'Sport And Recreational Activities'], ['Registered Charity', 'Limited Company', 'Other']), 
(2, 100000.0, 'The Hadley Trust', ['Social Relief And Care', 'Crime And Victimisation', 'Young People', 'Social Reserarch'], ['Registered Charity', 'Limited Company']), 
(2, 65000.0, 'The Company Of Actuaries’ Charitable Trust Fund', ['Young People', 'Disabilities', 'Social Relief And Care', 'Medical Research'], ['Registered Charity']), 
(2, 50000.0, 'The William Wates Memorial Trust', ['Young People', 'Arts Or Heritage', 'Sport And Recreational Activities'], ['Registered Charity', 'Other']), 
(2, 50000.0, 'The Nomura Charitable Trust', ['Young People', 'Education And Learning', 'Unemployment'], ['Registered Charity']), 
(2, 40000.0, 'The Grocers’ Charity', ['Poverty', 'Young People', 'Disabilities', 'Healthcare Sector', 'Arts Or Heritage'], ['Registered Charity']) ]

As you can see, all the characters print fine here.

For more information, here is my simple int_to_str function:

def int_to_str(num_int):
    if num_int == 0:
        return "Discretionary"

    else:
        return '£' + '{:,.2f}'.format(num_int)

So my question is: how can I fix this so that all the missing or wrongly encoded characters are written correctly?

Answer 1

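It is hard to guess without details. In any case, this is certainly a charset problem. Let's look at the characters that may fail to display correctly: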

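The newline. It depends on the operating system: just \n on Unix-like systems, and \r\n (2 characters) on Windows.
The '£' or POUND SIGN. It is the Unicode character U+00A3. In Windows code page 1252 or Latin1 (ISO-8859-1) it is the single byte b'\xa3', while in UTF-8 it is encoded as b'\xc2\xa3'. More interestingly, if you try to display b'\xa3' as UTF-8, you get the REPLACEMENT CHARACTER U+FFFD, which shows as '�'.
The apostrophe character. The true apostrophe ("'") is the ASCII character U+0027, and it causes no problem. But some Unicode-enabled editors can silently replace it with the RIGHT SINGLE QUOTATION MARK (U+2019 or "’"). That character does not exist in Latin1; in Windows code page 1252 it is the single byte b'\x92', which on its own is not valid UTF-8 either.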
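The byte values above can be checked directly in an interpreter. The following is a minimal sketch using only built-in str and bytes methods; the variable names are made up for illustration. It encodes the two characters in each encoding and then decodes the single-byte forms as if they were UTF-8, which is one plausible way to end up with the '�' seen in the output excerpt.

```python
# Byte encodings of the two problem characters.
pound = "\u00a3"   # POUND SIGN
quote = "\u2019"   # RIGHT SINGLE QUOTATION MARK

pound_bytes = {enc: pound.encode(enc) for enc in ("cp1252", "latin-1", "utf-8")}
quote_cp1252 = quote.encode("cp1252")
quote_utf8 = quote.encode("utf-8")

# Latin1 has no code point for U+2019, so encoding it raises an error.
try:
    quote.encode("latin-1")
    quote_in_latin1 = True
except UnicodeEncodeError:
    quote_in_latin1 = False

# Single-byte data misread as UTF-8: each invalid byte becomes U+FFFD.
misread = (pound_bytes["cp1252"] + quote_cp1252).decode("utf-8", errors="replace")

# ascii() keeps the printout safe on consoles that cannot show U+FFFD.
print(pound_bytes, quote_cp1252, quote_utf8, quote_in_latin1, ascii(misread))
```

If the file was written in a single-byte Windows encoding and then opened in a viewer that assumes UTF-8, this is the kind of substitution that would be expected.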

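This just means that details matter. Without knowing exactly what the file looks like at the byte level when you read it, and how you build it, it is not possible to explain what really happens. A text file is an abstraction: an actual text file is a sequence of bytes with a given encoding and a given end-of-line convention.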
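To make the "sequence of bytes" point concrete, here is a small sketch that writes the same string with two different combinations of encoding and line ending, then reads the raw bytes back. The helper name bytes_on_disk and the sample text are assumptions made for this illustration; the encoding and newline parameters are standard arguments of the built-in open().

```python
import os
import tempfile

text = "Funding offered: \u00a3500.00\n"

def bytes_on_disk(encoding, newline):
    """Write `text` with the given settings and return the raw bytes stored."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline controls how each "\n" written is translated on disk.
        with open(path, "w", encoding=encoding, newline=newline) as handle:
            handle.write(text)
        # Binary mode shows exactly what ended up in the file.
        with open(path, "rb") as raw:
            return raw.read()
    finally:
        os.remove(path)

utf8_unix = bytes_on_disk("utf-8", "\n")
cp1252_windows = bytes_on_disk("cp1252", "\r\n")
print(utf8_unix)
print(cp1252_windows)
```

The same one-line text produces two different byte sequences, so a program that reads the file has to assume the same encoding and line-ending convention that the writer used.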

Answer 2

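It seems each entry is in fact written on a new line, since the text does not appear as one continuous string; the \n characters in the output were just hidden. To fix the encoding problem, the encoding has to be specified in the call that opens the file: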

with open(new_file, 'w', encoding="utf-8") as handle:
    ...
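For completeness, here is a sketch of how the whole writing step might look with the encoding specified. The function name write_matches and the one-item sample list are hypothetical, and the extra "\n" after the funding amount is an assumption about the intended layout (the question's code writes the funding and the match count on the same line). The redundant handle.close() is dropped because the with block already closes the file.

```python
import os
import tempfile

def int_to_str(num_int):
    # Same helper as in the question, with the pound sign written as an escape.
    if num_int == 0:
        return "Discretionary"
    return "\u00a3" + "{:,.2f}".format(num_int)

def write_matches(path, charity, match_lst):
    # Hypothetical wrapper around the question's loop, with an explicit encoding.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
        for item in match_lst:
            # Assumption: one line for the grant and funding, one for the count.
            handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]) + "\n")
            handle.write("Number of matches: " + str(item[0] - 1) + "\n")

# One entry in the shape of match_lst, including the curly apostrophe.
sample = [(2, 40000.0, "The Grocers\u2019 Charity", ["Poverty"], ["Registered Charity"])]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bla.txt")
    write_matches(path, "bla", sample)
    # Read back with the same encoding the file was written in.
    with open(path, encoding="utf-8") as handle:
        result = handle.read()

print(ascii(result))
```

Whatever is used to view the file afterwards also has to interpret it as UTF-8, otherwise the pound sign and the curly apostrophe will still look wrong.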

I am posting this as an answer for future readers of this question.

Thanks

Some characters are not accepted when writing to a text file in Python
