Python MD5 hashing of identical content returns different hashes

Posted: 2017-10-11 22:17:42

Tags: python html file hash md5

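Because I'm lazy, I'm writing a Python program that checks a website I was told about for job openings and returns all the jobs listed on the company's web page.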

Here is my code so far (yes, I know the code is janky, but I just want to get it working):

import requests
from bs4 import BeautifulSoup
import sys
import os
import hashlib

reload(sys)
sys.setdefaultencoding('utf8')

res = requests.get('WEBSITE URL', verify=False)
res.raise_for_status()

filename = "JobWebsite.txt"

def StartUp():
    if not os.path.isfile(filename):
        try:
            jobfile = open(filename, 'a')
            jobfile = open(filename, 'r+')
            print("[*] Successfully Created output file")
            return jobfile
        except:
            print("[*] Error creating output file!")
            sys.exit(0)
    else:
        try:
            jobfile = open(filename, 'r+')
            print("[*] Successfully Opened output file")
            return jobfile
        except:
            print("[*] Error opening output file!")
            sys.exit(0)

def AnyChange(htmlFile):
    fileCont = htmlFile.read()
    FileHash = hasher(fileCont, "File Code Hashed")
    WebHash = hasher(res.text, "Webpage Code Hashed")
    # <-- Here is the problem
    print ("[*] File hash is " + str(FileHash))
    print ("[*] Website hash is " + str(WebHash))
    if FileHash == WebHash:
        print ("[*] Jobs being read from file!")
        num_of_jobs(fileCont)
    else:
        print("[*] Jobs being read from website!")
        num_of_jobs(res.text)
        deleteContent(htmlFile)
        writeWebContent(htmlFile, res.text)

def hasher(content, message):
    content = hashlib.md5(content.encode('utf-8'))
    return content

def num_of_jobs(htmlFile):
    content = BeautifulSoup(htmlFile, "html.parser")
    elems = content.select('.search-result-inner')
    print("[*] There are " + str(len(elems)) + " jobs available!")

def deleteContent(htmlFile):
    print("[*] Deleting Contents of local file! ")
    htmlFile.seek(0)
    htmlFile.truncate()

def writeWebContent(htmlFile, content):
    htmlFile = open(filename, 'r+')
    print("[*] Writing Contents of website to file! ")
    htmlFile.write(content.encode('utf-8'))

jobfile = StartUp()
AnyChange(jobfile)

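The problem I'm currently having is that I hash both the website's HTML and the HTML stored in the file, but the two hashes never match as expected. I'm not sure why; my only guess is that it has something to do with how the content is saved in the file. The hashes differ, so the if statement fails every time.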

[Screenshot: breakpoint in the program showing the two hashes]
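One way to test the guess that the file is to blame is a quick round-trip check. This is a sketch that assumes Python 3 and a file opened with io.open using an explicit UTF-8 encoding (the question's code uses plain open() instead); the page text and the temporary path are made up for illustration. With this setup the text read back equals what was written, so its MD5 digest matches too, and a mismatch would have to come from somewhere else, such as how the hashes are compared.

```python
import hashlib
import io
import os
import tempfile


def md5_hex(text):
    # Hash the UTF-8 encoding of a text string and return the hex digest
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Illustrative page content, including a non-ASCII character
page = u"<html><body><p>Caf\u00e9 jobs</p></body></html>\n"

# Write with an explicit encoding; newline="" disables newline translation
path = os.path.join(tempfile.mkdtemp(), "JobWebsite.txt")
with io.open(path, "w", encoding="utf-8", newline="") as f:
    f.write(page)

# Read back the same way, so line endings are returned untranslated
with io.open(path, "r", encoding="utf-8", newline="") as f:
    stored = f.read()

print(stored == page)                    # True
print(md5_hex(stored) == md5_hex(page))  # True
```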

1 answer:

Answer 0 (score: 1)

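The screenshot you attached shows the memory locations of the two hash objects, FileHash and WebHash. They are two separate objects, so they are naturally at different locations, and comparing them with == only checks whether they are the same object; it never compares the digests.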
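A minimal sketch of this point (Python 3; the sample page string is made up): two hash objects built from the same input are not equal under ==, while their hexdigest() strings are.

```python
import hashlib

page = u"<html><body>jobs</body></html>"

# Two separate hash objects built from identical input
a = hashlib.md5(page.encode("utf-8"))
b = hashlib.md5(page.encode("utf-8"))

# Hash objects do not define value equality, so == falls back to identity
print(a == b)                          # False
# The hex digests are plain strings and compare by value
print(a.hexdigest() == b.hexdigest())  # True
```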

What you actually want to compare is the hexdigest() of each hash object. Change your if statement to:

if FileHash.hexdigest() == WebHash.hexdigest():
    print ("[*] Jobs being read from file!")
    num_of_jobs(fileCont)
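An alternative, sketched here under the assumption of Python 3 and not part of the original answer, is to have hasher() return the hex digest string itself, so the original `if FileHash == WebHash:` comparison works unchanged. The optional `message` parameter is kept only so the existing calls still work (it is unused, as in the question); the sample HTML strings are illustrative.

```python
import hashlib


def hasher(content, message=None):
    """Return the MD5 digest of `content` as a hex string.

    Sketch of a drop-in replacement for the question's hasher();
    `message` is accepted but unused, as in the original.
    """
    if isinstance(content, bytes):
        # Already encoded, e.g. data read from a file opened in binary mode
        data = content
    else:
        # Text must be encoded to bytes before hashing
        data = content.encode("utf-8")
    return hashlib.md5(data).hexdigest()


# Illustrative cached and live copies of the same listing
cached = u"<div class='search-result-inner'>Job</div>"
live = u"<div class='search-result-inner'>Job</div>"

FileHash = hasher(cached, "File Code Hashed")
WebHash = hasher(live, "Webpage Code Hashed")
print(FileHash == WebHash)  # True: two equal strings are compared now
```

Because the function accepts either bytes or text, hashing the raw bytes of a page and hashing its decoded text give the same digest, as long as the bytes are UTF-8.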

See this other StackOverflow answer for more on how to do this.