Converting unique numbers to md5 hashes using pandas

Asked: 2015-02-23 12:46:08

Tags: python python-2.7 pandas hashlib pandasql

Good morning, all.

I want to convert my social security numbers to md5 hex digests. The result should be a unique md5 hex digest for each social security number.

My data is formatted as follows:

ob = onboard[['regions','lname','ssno']][:10]
ob

   regions               lname     ssno
0  Northern Region (R1)  Banderas  123456789
1  Northern Region (R1)  Garfield  234567891
2  Northern Region (R1)  Pacino    345678912
3  Northern Region (R1)  Baldwin   456789123
4  Northern Region (R1)  Brody     567891234
5  Northern Region (R1)  Johnson   6789123456
6  Northern Region (R1)  Guinness  7890123456
7  Northern Region (R1)  Hopkins   891234567
8  Northern Region (R1)  Paul      891234567
9  Northern Region (R1)  Arkin     987654321

I tried the following code using hashlib:

import hashlib

ob['md5'] = hashlib.md5(['ssno'])

This gave me an error saying the argument must be a string, not a list. So I tried the following:

ob['md5'] = hashlib.md5('ssno').hexdigest()



   regions               lname     ssno       md5
0  Northern Region (R1)  Banderas  123456789  a1b3ec3d8a026d392ad551701ad7881e
1  Northern Region (R1)  Garfield  234567891  a1b3ec3d8a026d392ad551701ad7881e
2  Northern Region (R1)  Pacino    345678912  a1b3ec3d8a026d392ad551701ad7881e
3  Northern Region (R1)  Baldwin   456789123  a1b3ec3d8a026d392ad551701ad7881e
4  Northern Region (R1)  Brody     567891234  a1b3ec3d8a026d392ad551701ad7881e
5  Northern Region (R1)  Johnson   678912345  a1b3ec3d8a026d392ad551701ad7881e
6  Northern Region (R1)  Johnson   789123456  a1b3ec3d8a026d392ad551701ad7881e
7  Northern Region (R1)  Guiness   891234567  a1b3ec3d8a026d392ad551701ad7881e
8  Northern Region (R1)  Hopkins   912345678  a1b3ec3d8a026d392ad551701ad7881e
9  Northern Region (R1)  Paul      159753456  a1b3ec3d8a026d392ad551701ad7881e

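This is very close to what I need, but every hex digest is identical even though the social security numbers differ. I am trying to get a distinct hex digest for each social security number.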

Any suggestions?

2 answers:

Answer 0: (score: 11)

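hashlib.md5 takes a single string as input; you can't pass it an array of values the way you can with some NumPy/Pandas functions. Your second attempt, hashlib.md5('ssno'), hashes the literal string 'ssno' rather than the column's contents, which is why every row ends up with the same digest. Instead, you could use a list comprehension to build a list of md5sums: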

ob['md5'] = [hashlib.md5(val).hexdigest() for val in ob['ssno']]
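The line above assumes Python 2.7 and a column of strings. As a minimal sketch for Python 3, or for an integer `ssno` column, each value can be converted with `str()` and encoded to bytes before hashing. The small `ob` frame below is a hypothetical stand-in for the asker's data, not the real table:

```python
import hashlib

import pandas as pd

# Hypothetical stand-in for the question's `ob` frame (three rows only).
ob = pd.DataFrame({
    "lname": ["Banderas", "Garfield", "Pacino"],
    "ssno": [123456789, 234567891, 345678912],
})

# str() copes with an integer column; encode() produces the bytes
# that hashlib requires on Python 3.
ob["md5"] = [
    hashlib.md5(str(val).encode("utf-8")).hexdigest()
    for val in ob["ssno"]
]

print(ob)
```

Rows that share the same `ssno` will still share the same digest, since the hash depends only on the input value.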

Answer 1: (score: 1)

If you want to hash to SHA256, you first need to encode the string as (probably) UTF-8:

# Encode each value to UTF-8 bytes, then hash it with SHA256
ob['sha256'] = [hashlib.sha256(val.encode('utf-8')).hexdigest() for val in ob['ssno']]
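As a rough sketch of the encode-then-hash step this answer describes, the helper below wraps it in a function that works on a single value. The name `hex_digest` and its `algorithm` parameter are illustrative assumptions, not part of hashlib or pandas:

```python
import hashlib


def hex_digest(value, algorithm="sha256"):
    """Return the hex digest of `value` using the named hashlib algorithm."""
    # Hash functions operate on bytes, so convert to text and encode first.
    data = str(value).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


# Hypothetical sample values, taken from the first two rows of the question.
digests = [hex_digest(ssn) for ssn in ["123456789", "234567891"]]
print(digests)
```

Under the same assumptions, a whole column could be hashed with something like `ob['ssno'].map(hex_digest)`.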