Question

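I'm trying to build an MD5 hash from a txt file. However, I have to follow some rules:

The encoding must be "ISO-8859-1"
All characters must be lowercase
Newline and carriage-return characters must not be included when building the hash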

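My file contains \r and \n characters (carriage returns and line feeds). I tried removing them with the rstrip and strip functions, but it doesn't seem to work. To double-check, I wrote the content to a txt file and opened it in Notepad++, and as you can see in the image below, the characters are still there.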

[Image: Notepad++ showing the CR and LF characters still present in the file]

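I also tried another approach: I used the split function with \n as the separator to build a list, to check whether those characters were really there. As far as I can tell, they are.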
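One way to check which line-ending characters are really in the file is to read it in binary mode and count the raw bytes. The sketch below is an illustration under assumptions: the helper names count_line_endings and read_text_default are hypothetical, and a temporary file stands in for the real input. It also shows that text mode with the default newline=None turns '\r\n' into '\n' on read. The reverse happens on write: writing '\n' in text mode on Windows produces '\r\n' on disk, so CR LF visible in Notepad++ in a file written that way does not by itself prove that the string in memory contained '\r'.

```python
import os
import tempfile


def count_line_endings(path):
    # Binary mode ('rb') returns the raw bytes, so '\r\n' is not translated.
    with open(path, 'rb') as f:
        data = f.read()
    return data.count(b'\r'), data.count(b'\n')


def read_text_default(path):
    # Text mode with the default newline=None translates '\r\n' and '\r' to '\n'.
    with open(path, 'r', encoding='ISO-8859-1') as f:
        return f.read()


# Demo: a temporary file stands in for the real input file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'line one\r\nline two\r\n')
    sample_path = tmp.name

cr, lf = count_line_endings(sample_path)
text = read_text_default(sample_path)
os.remove(sample_path)

print(cr, lf)      # 2 2
print(repr(text))  # 'line one\nline two\n'
```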

How can I actually remove these characters?

Here is one of the versions I tried:

from hashlib import md5

open_file = open('N0003977.290', 'r', encoding = 'ISO-8859-1')
test_file = open('file_test.txt', 'w')
file_content = open_file.read().lower().rstrip('\n\r ').strip('\n\r')

#writing a txt file to check if there are new line characters
test_file.write(file_content)
test_file.close()

#creating a md5 hash
m = md5()
m.update(file_content.encode('ISO-8859-1'))
print(m.hexdigest())

Answer 1

I would use str.translate() to remove the carriage-return and line-feed characters, like this:

file_content = file_content.translate({ord(ch):None for ch in '\r\n'})
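A variant of the same idea builds the deletion table once with str.maketrans, whose optional third argument lists characters to map to None. This is a minimal sketch; DELETE_CR_LF and remove_cr_lf are hypothetical names chosen for illustration.

```python
# str.maketrans('', '', chars) builds a table that deletes every character in chars.
DELETE_CR_LF = str.maketrans('', '', '\r\n')


def remove_cr_lf(text):
    # Apply the prebuilt table; every '\r' and '\n' is dropped, wherever it occurs.
    return text.translate(DELETE_CR_LF)


print(remove_cr_lf('abc\r\ndef\n'))  # abcdef
```

Building the table once is convenient when the same cleanup runs on many files.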

Alternatively, if this is a class assignment and we haven't covered str.translate() yet, I might do it "by hand":

file_content = ''.join(ch for ch in file_content if ch not in '\r\n')
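Another common "by hand" option, swapped in here as a different technique, is two chained str.replace calls. The sketch below (remove_cr_lf_replace is a hypothetical name) checks that it gives the same result as the generator-expression version above.

```python
def remove_cr_lf_replace(text):
    # Two chained str.replace calls: drop every '\r', then every '\n'.
    return text.replace('\r', '').replace('\n', '')


sample = 'first\r\nsecond\rthird\n'
by_hand = ''.join(ch for ch in sample if ch not in '\r\n')  # the join version above

print(remove_cr_lf_replace(sample))             # firstsecondthird
print(remove_cr_lf_replace(sample) == by_hand)  # True
```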

Complete program:

from hashlib import md5

open_file = open('N0003977.290', 'r', encoding = 'ISO-8859-1')
test_file = open('file_test.txt', 'w', encoding = 'ISO-8859-1')
file_content = open_file.read().lower()  # the rules require all characters to be lowercase
open_file.close()

# Choose one of the following:
file_content = file_content.translate({ord(ch):None for ch in '\r\n'})
# file_content = ''.join(ch for ch in file_content if ch not in '\r\n')


#writing a txt file to check if there are new line characters
test_file.write(file_content)
test_file.close()

#creating a md5 hash
m = md5()
m.update(file_content.encode('ISO-8859-1'))
print(m.hexdigest())
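The same steps can be wrapped in a helper that applies all three rules from the question (lowercase, no CR/LF, ISO-8859-1 bytes) and closes the file automatically. This is a minimal sketch, not the answerer's code; normalized_md5 and normalized_md5_of_file are hypothetical names.

```python
from hashlib import md5


def normalized_md5(text):
    """MD5 hex digest of text after lowercasing it, removing every '\\r' and
    '\\n', and encoding the result as ISO-8859-1."""
    cleaned = text.lower().translate({ord(ch): None for ch in '\r\n'})
    return md5(cleaned.encode('ISO-8859-1')).hexdigest()


def normalized_md5_of_file(path):
    # 'with' closes the file even if reading fails.
    with open(path, 'r', encoding='ISO-8859-1') as f:
        return normalized_md5(f.read())


print(normalized_md5('AB\r\nc'))  # same digest as md5(b'abc')
```

For example, normalized_md5_of_file('N0003977.290') would hash the question's file. Lowercasing Latin-1 text keeps it within Latin-1 (for example 'É' becomes 'é'), so the final encode does not fail on that account.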

Answer 2

Is the original file encoded in ISO-8859-1?

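If it is, you don't need to convert it before hashing: you can hash its bytes as they are. If it isn't, you should encode the text to ISO-8859-1 before hashing, but don't open the file with that encoding; open it with its actual encoding instead.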
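For the ISO-8859-1 case specifically, decoding and re-encoding cannot change the data, because that encoding maps each byte value 0x00 to 0xFF to the code point with the same number. A minimal check, using all 256 byte values as stand-in data rather than the real file:

```python
from hashlib import md5

all_bytes = bytes(range(256))

# ISO-8859-1 maps byte 0xNN to code point U+00NN, so the round trip is lossless.
round_trip = all_bytes.decode('ISO-8859-1').encode('ISO-8859-1')
print(round_trip == all_bytes)  # True

# Hence hashing the raw bytes or the decoded-then-re-encoded text gives the same digest.
print(md5(all_bytes).hexdigest() == md5(round_trip).hexdigest())  # True
```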

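rstrip and lstrip don't work here because they only remove characters at the beginning or end of the whole content, not the line breaks inside it: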

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://www.banxico.org.mx/structure/key_families/dgie/sie/series/compact" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="DataSet"> <xs:complexType> <xs:sequence> <xs:element name="SiblingGroup"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute type="xs:string" name="BANXICO_FREQ"/> <xs:attribute type="xs:duration" name="TIME_FORMAT"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element name="Series" maxOccurs="unbounded" minOccurs="0"> <xs:complexType> <xs:sequence> <xs:element name="Obs"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute type="xs:date" name="TIME_PERIOD" use="optional"/> <xs:attribute type="xs:float" name="OBS_VALUE" use="optional"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> </xs:sequence> <xs:attribute type="xs:string" name="TITULO" use="optional"/> <xs:attribute type="xs:string" name="IDSERIE" use="optional"/> <xs:attribute type="xs:string" name="BANXICO_FREQ" use="optional"/> <xs:attribute type="xs:string" name="BANXICO_FIGURE_TYPE" use="optional"/> <xs:attribute type="xs:string" name="BANXICO_UNIT_TYPE" use="optional"/> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
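The point about strip can be seen on a small string: strip(), rstrip() and lstrip() only trim characters from the ends, so the line break in the middle survives. A minimal sketch with made-up sample text:

```python
text = '\r\nfirst line\r\nsecond line\r\n'

# strip/rstrip/lstrip only remove characters from the ends of the string.
print(repr(text.strip('\r\n')))   # 'first line\r\nsecond line'
print(repr(text.rstrip('\r\n')))  # '\r\nfirst line\r\nsecond line'
print(repr(text.lstrip('\r\n')))  # 'first line\r\nsecond line\r\n'
```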

Hope this helps.

Python doesn't remove carriage returns and line feeds from the file
