MD5 hashing a CSV with Python

Time: 2016-01-13 15:58:16

Tags: python csv hash

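I have a CSV of email addresses that need to be hashed with MD5, with the hashed emails then saved as a new CSV. I haven't seen my exact use case on SO, and I haven't been able to adapt the existing questions successfully.

The original file path is "/Users/[username]/Downloads/email_original.csv", and the desired output file is "/Users/[username]/Downloads/email_hashed.csv".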

Original file

email_addr
fake_email1@yahoo.com
fake_email2@gmail.com
fake_email3@college.edu
fake_email4@hotmail.com
fake_email5@ford.com

Hashed file

email_addr
0x3731BF23851200A7607BA554EEAF7912
0xA5D5D3B99896D32BAC64162BD56BE177
0xAE03858BDFBDF622AF5A1852317500C3
0xC870F8D75180AC9DA2188129C910489B
0xD7AFD8085548808459BDEF8665C8D52A
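
The desired output above appears to be each MD5 digest written as uppercase hex with a `0x` prefix. A minimal sketch of that formatting follows; the helper name `to_prefixed_md5` and the UTF-8 default are assumptions made for illustration, and it has not been verified that this reproduces the exact digests listed above.

```python
import hashlib


def to_prefixed_md5(value, encoding="utf-8"):
    """Return '0x' followed by the uppercase MD5 hex digest of value."""
    # hashlib works on bytes, so the text is encoded first (encoding is an assumption)
    digest = hashlib.md5(value.encode(encoding)).hexdigest()
    return "0x" + digest.upper()


formatted = to_prefixed_md5("fake_email1@yahoo.com")
print(formatted)
```

The result is always 34 characters long: the two-character prefix plus 32 hex digits.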

2 answers:

Answer 0 (score: 3)

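The answer in your comment is almost correct. You only need to open another file in write mode, w. I have changed your code to use with, so you don't need to close the file handles explicitly: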

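import hashlib

# Read the input as bytes and write one hex digest per line.
# Note: the header line (email_addr) is hashed like any other line.
with open("/Users/[username]/Downloads/email_original.csv", 'rb') as file:
    with open("/Users/[username]/Downloads/email_hashed.csv", 'w') as output:
        for line in file:
            line = line.strip()  # drop the trailing newline before hashing
            digest = hashlib.md5(line).hexdigest()
            print(digest)
            output.write(digest + '\n')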
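
Because the loop above treats every line the same way, the `email_addr` header would be hashed along with the addresses. One possible variation passes the first line through unchanged, sketched here on an in-memory file so it can be run without touching disk. The function name `hash_email_lines` and the UTF-8 encoding are assumptions for illustration, not part of the original answer.

```python
import hashlib
import io


def hash_email_lines(lines):
    """Yield the header unchanged, then an MD5 hex digest for each later line."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input: nothing to emit
    yield header.strip()
    for line in lines:
        value = line.strip()
        if not value:
            continue  # skip blank lines
        yield hashlib.md5(value.encode("utf-8")).hexdigest()


# In-memory stand-in for email_original.csv
sample = io.StringIO("email_addr\nfake_email1@yahoo.com\n\nfake_email2@gmail.com\n")
result = list(hash_email_lines(sample))
print("\n".join(result))
```

The same generator could be fed a real file object opened in text mode, with each yielded value written to the output file followed by a newline.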

Answer 1 (score: 0)

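Jaco's answer is good but incomplete, because it ignores the text encoding used when computing the MD5 hash. The code would also fall short if the CSV format were later changed to include additional columns. Below is an example that addresses both issues. It also makes it easy to change the hash algorithm later, and to list other columns that should each be hashed with their own algorithm: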

import csv
import hashlib

IN_PATH = 'email_original.csv'
OUT_PATH = 'email_hashed.csv'
ENCODING = 'ascii'
HASH_COLUMNS = dict(email_addr='md5')


def main():
    with open(IN_PATH, 'rt', encoding=ENCODING, newline='') as in_file, \
            open(OUT_PATH, 'wt', encoding=ENCODING, newline='') as out_file:
        reader = csv.DictReader(in_file)
        writer = csv.DictWriter(out_file, reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for column, method in HASH_COLUMNS.items():
                data = row[column].encode(ENCODING)
                digest = hashlib.new(method, data).hexdigest()
                row[column] = '0x' + digest.upper()
            writer.writerow(row)

if __name__ == '__main__':
    main()
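
The reason encoding matters for the hash can be shown with a short sketch: MD5 operates on bytes, so the same text encoded two different ways generally yields two different digests. The sample character and the choice of UTF-8 versus Latin-1 are assumptions picked for illustration only.

```python
import hashlib

# A single non-ASCII character encodes to different bytes in different encodings
text = "é"
utf8_digest = hashlib.md5(text.encode("utf-8")).hexdigest()
latin1_digest = hashlib.md5(text.encode("latin-1")).hexdigest()
print(utf8_digest != latin1_digest)

# For pure-ASCII text such as the sample addresses, ASCII and UTF-8
# produce identical bytes, so the digests match
ascii_text = "fake_email1@yahoo.com"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```

With `ENCODING = 'ascii'` as in the code above, an address containing a non-ASCII character would raise `UnicodeEncodeError` on `encode` rather than silently produce a different digest.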