Question

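I am working with a dataset containing names that needs to be released publicly, but the original names must not be visible (i.e. I need to be able to tell different names apart, but the final result should show something like "e7fx8yuo" where the original dataset has "John Doe").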

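The requirements sound similar to hashing, but are less demanding in one respect (I don't need variable-length names to map to fixed-length hashes) and stricter in another: every name must map to a unique string (two different names must never map to the same string).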

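I plan to write this in Python, but I'm not entirely sure what the process I want to implement is called. If possible, I'd also like the 'hashed' output strings to look like the repository name suggestions GitHub generates ("reimagined-memory" rather than "e7fx8yuo"), since a string of whole words is easier to remember. Is there a Python module that can do this for me?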

Answer 1

As I said in the comments, this sounds like data masking. Here is a basic implementation:

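from collections import defaultdict
from string import ascii_lowercase
from random import choice

# Every mask handed out so far, used to guarantee uniqueness.
random_strings = set()

def random_string():
    # Draw 8 random lowercase letters, retrying until the result is unused.
    while True:
        result = ''.join(choice(ascii_lowercase) for _ in range(8))
        if result not in random_strings:
            random_strings.add(result)
            return result

# A missing key calls random_string() once and stores the result,
# so each name keeps the same mask on every later lookup.
masks = defaultdict(random_string)

print(masks['Adam'])
print(masks['Adam'])
print(masks['Bob'])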

Output (the strings are random, so they will differ from run to run):

qmmwavuk
qmmwavuk
ykzlvfaf
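
The question also asks for masks that read like GitHub's repository name suggestions. The same defaultdict idea can be adapted for that; the following is only a minimal sketch, assuming two small hand-written word lists. `ADJECTIVES`, `NOUNS` and `make_word_masks` are hypothetical names invented for this illustration, not part of any library, and a real dataset would need lists large enough to cover every distinct name.

```python
from collections import defaultdict
from itertools import product
from random import shuffle

# Hypothetical word lists; a real dataset would need much larger ones.
ADJECTIVES = ["reimagined", "curly", "silver", "fuzzy"]
NOUNS = ["memory", "lamp", "river", "engine"]

def make_word_masks(adjectives=ADJECTIVES, nouns=NOUNS):
    """Return a defaultdict that gives each new key an unused word pair."""
    # Every adjective-noun combination appears exactly once, so no two
    # names can receive the same mask.
    pairs = [f"{a}-{n}" for a, n in product(adjectives, nouns)]
    shuffle(pairs)  # random order, so masks do not reveal insertion order
    pool = iter(pairs)
    # next(pool) raises StopIteration once every pair has been handed out.
    return defaultdict(lambda: next(pool))

word_masks = make_word_masks()
print(word_masks["Adam"])
print(word_masks["Adam"])  # same mask as the line above
print(word_masks["Bob"])   # a different mask
```

Pre-generating and shuffling all pairs avoids the retry loop, at the cost of a hard limit: with the lists above only 16 distinct names can be masked before the pool runs out.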

Answer 2

Here is something quick and dirty:


import random
import string


def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    # With no arguments, returns a 6-character string of uppercase letters and digits.
    return ''.join(random.choice(chars) for _ in range(size))


def makeID(names):
    nameDict = {}

    for name in names:
        var = id_generator()

        # If the generated ID already exists as a key, retry until it is unique.
        while var in nameDict:
            var = id_generator()

        # Key: the generated ID; value: the current name.
        nameDict[var] = name

    print(nameDict)


makeID(['John Doe', 'Jane NoDoe', 'Getsum MoDoe'])

Output:

{'H8WIAP': 'John Doe', '4NT7JC': 'Jane NoDoe', '208DBM': 'Getsum MoDoe'}

The random generator comes from "Random string generation with upper case letters and digits in Python".
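
One thing to watch with makeID: it keys the dictionary by ID, so a name that appears twice in the input gets two different IDs. If the dataset can contain repeated names, a variant keyed by name may be closer to what the question needs. This is only a sketch under that assumption; `make_ids` is a hypothetical helper written for this illustration.

```python
import random
import string

def make_ids(names, size=6, chars=string.ascii_uppercase + string.digits):
    """Map each distinct name to a unique random ID (hypothetical helper)."""
    name_to_id = {}
    used = set()  # IDs already assigned, for the uniqueness check
    for name in names:
        if name in name_to_id:
            continue  # repeated name: keep the ID it already has
        candidate = "".join(random.choice(chars) for _ in range(size))
        while candidate in used:  # retry on the rare collision
            candidate = "".join(random.choice(chars) for _ in range(size))
        used.add(candidate)
        name_to_id[name] = candidate
    return name_to_id

ids = make_ids(["John Doe", "Jane NoDoe", "John Doe"])
print(ids)  # two entries: "John Doe" appears once, with a single ID
```

Returning the dictionary instead of printing it lets the caller apply the mapping to the dataset, for example by replacing each name with `ids[name]`.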

What is a mapping of strings to other strings?