Question

I'm seeing unexpected behaviour when reading a file and hashing its contents (in Python 3.7).

I have a file that contains only the text "helloworld", with no newline at the end:

>>hexdump -C input.txt
00000000  68 65 6c 6c 6f 77 6f 72  6c 64 0a                 |helloworld.|
0000000b

I run the following Python script:

import hashlib


def hashit(inp):
    # Hash the UTF-8 encoding of the string and return the hex digest
    return hashlib.md5(inp.encode('utf-8')).hexdigest()

from_var = 'helloworld'

with open('input.txt', 'r') as fo:
    from_file = fo.read()

print(f' from_file      : { repr(from_file) }')
print(f' from_var       : { repr(from_var) }')

print(f' from_file hash : { hashit(from_file) }')
print(f' from_var  hash : { hashit(from_var) }')

I get the following output:

from_file      : 'helloworld\n'
from_var       : 'helloworld'
from_file hash : d73b04b0e696b0945283defa3eee4538
from_var  hash : fc5e038d38a57032085441e7fe7010b0

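The first thing I notice is the newline that appears when I read the file. Where does it come from?

Given the trailing newline, it is no surprise that the two strings hash differently.

To check, I then ran the md5sum utility directly on the file: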

>>md5sum input.txt 
d73b04b0e696b0945283defa3eee4538  input.txt

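I don't understand this at all. The md5sum from the shell is the same as the MD5 of the string with the trailing newline, even though the file has no newline in it.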

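So my questions are:

Why does .read() add a newline at the end of the file?
Why does md5sum on the command line match the string with the trailing newline, even though the file has no newline?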
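
One way to narrow this down is to read the file in binary mode, so that no decoding or newline translation can be involved, and look at the last byte directly. The sketch below is only an illustration: the helper names md5_hex and inspect_file are made up for this example, and it writes a stand-in file containing the same 11 bytes that the hexdump above lists (including the final 0a) rather than touching the real input.txt.

```python
import hashlib
import os
import tempfile


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of raw bytes (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


def inspect_file(path: str) -> dict:
    """Read a file in binary mode and report its size, last byte and MD5."""
    with open(path, "rb") as fo:
        raw = fo.read()  # binary mode: no decoding, no newline translation
    return {
        "size": len(raw),
        "ends_with_newline": raw.endswith(b"\n"),
        "md5": md5_hex(raw),
    }


# Assumption: a stand-in file holding the 11 bytes shown by hexdump above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "input.txt")
    with open(demo_path, "wb") as fo:
        fo.write(b"helloworld\n")
    report = inspect_file(demo_path)

print(report)
# Compare the file's digest against both candidate strings.
print("matches 'helloworld\\n':", report["md5"] == md5_hex(b"helloworld\n"))
print("matches 'helloworld'  :", report["md5"] == md5_hex(b"helloworld"))
```

Running inspect_file('input.txt') on the real file should show whether a trailing 0a byte is actually stored on disk, which is what both md5sum and a binary read would hash.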

Python MD5 hashing, trailing newline when reading a file

0 Answers: