MD5 hash returns different results in Python

Date: 2014-04-17 18:42:09

Tags: python json hash md5

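For a class assignment, I'm supposed to grab the contents of a file, compute its MD5 hash, and store the hash in a separate file. I should then be able to check the file's integrity by comparing MD5 hashes. I'm fairly new to Python and JSON, so I thought I'd use this assignment to learn them rather than stick with something I already know.

Anyway, my program reads from a file, creates a hash, and stores that hash in a JSON file. The problem is in my integrity check. When I print the hash computed for the file, it differs from the one recorded in the JSON file, even though no changes have been made to the file. Below is an example of what is happening, and I've pasted my code as well. Thanks in advance for your help.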

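For example, these are the contents of my JSON file:

Content: b'I made a file to test md5\n'

Digest: 1e8f4e6598be2ea2516102de54e7e48e

This is what is returned when I try to check the integrity of the exact same file (no changes made to it):

Content: b'I made a file to test md5\n'

Digest: ef8b7bf2986f59f8a51aae6b496e8954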

import hashlib
import json
import os
import fnmatch
from codecs import open


#opens the file, reads/encodes it, and returns the contents (c)
def read_the_file(f_location):
    with open(f_location, 'r', encoding="utf-8") as f:
        c = f.read()

    f.close()
    return c


#hashes every file in the directory and records each result in a json file
def scan_hash_json(directory_content):
    for f in directory_content:
        location = argument + "/" + f
        content = read_the_file(location)
        comp_hash = create_hash(content)
        json_obj = {"Directory": argument, "Contents": {"filename": str(f),
                                                        "original string": str(content), "md5": str(comp_hash)}}
        location = location.replace(argument, "")
        location = location.replace(".txt", "")
        write_to_json(location, json_obj)


#reads a recorded json file and returns its parsed contents
def read_the_json(f):
    f_location = "recorded" + "/" + f
    read_json = open(f_location, "r")
    json_obj = json.load(read_json)
    read_json.close()
    return json_obj


#check integrity of the file
def check_integrity(d_content):
    #d_content = directory content
    for f in d_content:
        json_obj = read_the_json(f)
        text = f.replace(".json", ".txt")
        result = find(text, os.getcwd())
        content = read_the_file(result)
        comp_hash = create_hash(content)
        print("content: " + str(content))
        print(result)
        print(json_obj)
        print()
        print("Json Obj: " + json_obj['Contents']['md5'])
        print("Hash: " + comp_hash)


#find the file being searched for
def find(pattern, path):
    result = ""
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result = os.path.join(root, name)
    return result


#create a hash for the file contents being passed in
def create_hash(content):
    h = hashlib.md5()
    key_before = "reallyBad".encode('utf-8')
    key_after = "hashKeyAlgorithm".encode('utf-8')
    content = content.encode('utf-8')
    h.update(key_before)
    h.update(content)
    h.update(key_after)
    return h.hexdigest()


#write the MD5 hash to the json file
def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")
    storage_location = "recorded/" + str(arg)
    write_file = open(storage_location, "w")
    json.dump(json_obj, write_file, indent=4, sort_keys=True)
    write_file.close()

#variable to hold status of user (whether they are done or not)
working = 1
#while the user is not done, continue running the program
while working == 1:
    print("Please input a command. For help type 'help'. To exit type 'exit'")

    #grab input from user, divide it into words, and grab the command/option/argument
    request = input()
    request = request.split()

    if len(request) == 1:
        command = request[0]
    elif len(request) == 2:
        command = request[0]
        option = request[1]
    elif len(request) == 3:
        command = request[0]
        option = request[1]
        argument = request[2]
    else:
        print("I'm sorry that is not a valid request.\n")
        continue

    #if user inputs command 'icheck'...
    if command == 'icheck':
        if option == '-l':
            if argument == "":
                print("For option -l, please input a directory name.")
                continue

            try:
                dirContents = os.listdir(argument)
                scan_hash_json(dirContents)

            except OSError:
                print("Directory not found. Make sure the directory name is correct or try a different directory.")

        elif option == '-f':
            if argument == "":
                print("For option -f, please input a file name.")
                continue

            try:
                contents = read_the_file(argument)
                computedHash = create_hash(contents)
                jsonObj = {"Directory": "Default", "Contents": {
                    "filename": str(argument), "original string": str(contents), "md5": str(computedHash)}}

                write_to_json(argument, jsonObj)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")

        elif option == '-t':
            try:
                dirContents = os.listdir("recorded")
                check_integrity(dirContents)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")

        elif option == '-u':
            print("gonna update stuff")
        elif option == '-r':
            print("gonna remove stuff")

    #if user inputs command 'help'...
    elif command == 'help':
        #display help screen
        print("Integrity Checker has a few options you can use. Each option "
              "must begin with the command 'icheck'. The options are as follows:")
        print("\t-l <directory>: Reads the list of files in the directory and computes the md5 for each one")
        print("\t-f <file>: Reads a specific file and computes its md5")
        print("\t-t: Tests integrity of the files with recorded md5s")
        print("\t-u <file>: Update a file that you have modified after its integrity has been checked")
        print("\t-r <file>: Removes a file from the recorded md5s\n")

    #if user inputs command 'exit'
    elif command == 'exit':
        #set working to zero and exit program loop
        working = 0

    #if anything other than 'icheck', 'help', and 'exit' are input...
    else:
        #display error message and start over
        print("I'm sorry that is not a valid command.\n")

2 Answers:

Answer 0 (score: 0)

Where do you define h, the md5 object used in this method?

 #create a hash for the file contents being passed in
 def create_hash(content):
     key_before = "reallyBad".encode('utf-8')
     key_after = "hashKeyAlgorithm".encode('utf-8')
     print("Content: " + str(content))
     h.update(key_before)
     h.update(content)
     h.update(key_after)
     print("digest: " + str(h.hexdigest()))
     return h.hexdigest()

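I suspect you're calling create_hash twice but using the same md5 object in both calls. That means the second time you call it, you're really hashing "reallyBad*file contents*hashKeyAlgorithmreallyBad*file contents*hashKeyAlgorithm". You should create a new md5 object inside create_hash to avoid this.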
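Here is a minimal sketch of that suspected behavior, assuming the md5 object lives at module level. The names `shared`, `create_hash_shared`, and `create_hash_fresh` are hypothetical and only there for illustration. Since `update()` appends to whatever the hash object has already been fed, a reused object gives a different digest on every call, while a fresh object per call gives a repeatable one.

```python
import hashlib

KEY_BEFORE = "reallyBad".encode("utf-8")
KEY_AFTER = "hashKeyAlgorithm".encode("utf-8")

# Hypothetical module-level object, reused by every call (the suspected bug).
shared = hashlib.md5()


def create_hash_shared(content):
    # update() appends to everything already fed into `shared`.
    shared.update(KEY_BEFORE)
    shared.update(content.encode("utf-8"))
    shared.update(KEY_AFTER)
    return shared.hexdigest()


def create_hash_fresh(content):
    # A new md5 object per call starts from an empty state.
    h = hashlib.md5()
    h.update(KEY_BEFORE)
    h.update(content.encode("utf-8"))
    h.update(KEY_AFTER)
    return h.hexdigest()


text = "this is a test\n"
first_shared = create_hash_shared(text)
second_shared = create_hash_shared(text)
first_fresh = create_hash_fresh(text)
second_fresh = create_hash_fresh(text)

print(first_shared == second_shared)  # False: the second call hashed the data twice
print(first_fresh == second_fresh)    # True: same input, same digest
```

The first call gives the same digest either way; only later calls on the shared object drift, which would match a recorded hash that no longer agrees with the check.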

Edit: With this change made, here is how your program runs:

 Please input a command. For help type 'help'. To exit type 'exit'
 icheck -f ok.txt
 Content: this is a test

 digest: 1f0d0fd698dfce7ce140df0b41ec3729
 Please input a command. For help type 'help'. To exit type 'exit'
 icheck -t
 Content: this is a test

 digest: 1f0d0fd698dfce7ce140df0b41ec3729
 Please input a command. For help type 'help'. To exit type 'exit'

Edit #2: Your scan_hash_json function also has a bug at the end. You strip the .txt suffix from the file name and then call write_to_json:

def scan_hash_json(directory_content):
        ...
        location = location.replace(".txt", "")
        write_to_json(location, json_obj)

However, write_to_json expects the file name to end in .txt:

def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")

If you fix that, I think it should do everything as expected...
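One way to avoid this kind of mismatch is to derive both names from the file's stem in one place instead of chaining `replace` calls. This is only a sketch: `json_path_for` and `txt_name_for` are hypothetical helpers, not part of the original program, and the "recorded" directory name is taken from the question's code.

```python
import os


def json_path_for(location, storage_dir="recorded"):
    # Hypothetical helper: drop the directory part and the extension once,
    # then append .json, so the caller never strips .txt on its own.
    stem, _ext = os.path.splitext(os.path.basename(location))
    return os.path.join(storage_dir, stem + ".json")


def txt_name_for(json_name):
    # Hypothetical helper: map a recorded .json name back to the .txt name.
    stem, _ext = os.path.splitext(json_name)
    return stem + ".txt"


print(json_path_for("docs/ok.txt"))
print(txt_name_for("ok.json"))  # ok.txt
```

Unlike `str.replace`, `os.path.splitext` only touches the final extension, so a name with ".txt" somewhere in the middle is left alone.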

Answer 1 (score: 0)

I see two possible problems you are facing:

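  1. The hash is computed from the binary representation of the string.
  2. Unless you work only with ASCII, the same international character, e.g. č, has a different representation in UTF-8 than in other Unicode encodings.

Things to consider:

  1. If you need UTF-8 or Unicode, normalize your content first, before you save it or compute the hash.
  2. For testing purposes, compare the binary representations of the content.
  3. Use UTF-8 only for IO operations; codecs.open does all the conversion for you: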

      from codecs import open
      with open('yourfile', 'r', encoding="utf-8") as f:
          decoding_content = f.read()
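To make the points above concrete, here is a small sketch showing that MD5 sees bytes, not characters. The helper `md5_of_text` is hypothetical, and the choice of NFC as the normalization form is an assumption for illustration. The same visible character č can be stored as one code point or as "c" plus a combining caron, and the same code point encodes to different bytes in UTF-8 and UTF-16.

```python
import hashlib
import unicodedata

composed = "\u010d"      # 'č' as a single code point
decomposed = "c\u030c"   # 'c' followed by a combining caron; renders the same


def md5_of_text(text, form=None):
    # Hypothetical helper: optionally normalize, then hash the UTF-8 bytes.
    if form is not None:
        text = unicodedata.normalize(form, text)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Without normalization the two spellings hash differently.
raw_equal = md5_of_text(composed) == md5_of_text(decomposed)

# After NFC normalization both become the same code point and the same bytes.
nfc_equal = md5_of_text(composed, "NFC") == md5_of_text(decomposed, "NFC")

# The same text in another encoding is a different byte string, so a different hash.
utf16_differs = (
    hashlib.md5(composed.encode("utf-16-le")).hexdigest() != md5_of_text(composed)
)

print(raw_equal)      # False
print(nfc_equal)      # True
print(utf16_differs)  # True
```

For a pure integrity check, reading the file in binary mode and hashing the raw bytes avoids the question of encoding altogether.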