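[Using Python 3.x] I am trying to create a CSV file with two columns: the first contains fake email addresses, and the second should contain one of the country codes specified in the corresponding function.
I would like the country codes to be distributed, at the very least, evenly across the email addresses. It would be great if there were also a way to distribute them unevenly, for example one country assigned to 30% of the email addresses, another to 10%, and so on.
My biggest difficulty is creating a dictionary whose keys are the email addresses and whose values are the country codes, which means zipping two lists of unequal length without ending up with empty values (None). I assumed a dictionary was the best approach, but I am very new to programming and Python, so if you have a better solution, please share it!
Here is my code: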
from random import choice, randint
from string import ascii_lowercase
from itertools import zip_longest
import csv

def gen_name(length):
    """Generates a random name with the given amount of characters."""
    return ''.join(choice(ascii_lowercase) for i in range(length))

def email_gen():
    """Generates a fake email address."""
    user = gen_name(randint(5, 10))
    host = gen_name(randint(5, 15))
    return user + "@" + host + ".com"

def gen_plain_email_list(n):
    """Generates a list of n amount of random e-mail addresses"""
    emaillist = []
    for i in range(n):
        emaillist.append(email_gen())
    return emaillist

def gen_email_dict(n):
    """Generates a dictionary where the key is an e-mail address and the value a random country code."""
    email_list = []
    cc = ['us', 'gb', 'de', 'fr', 'it', 'nl', 'es', 'ae', 'br', 'au']
    # Creating a list of n amount of e-mail addresses
    for i in range(n):
        email_list.append(email_gen())
    # Creates dictionary with an e-mail address from email_list and
    # a random country code from the cc list
    email_dict = dict(zip_longest(email_list, cc, fillvalue=choice(cc)))
    return email_dict

def dict_to_csv(filename, n):
    with open(filename, 'w', newline='') as f:
        w = csv.writer(f)
        w.writerows(gen_email_dict(n).items())

dict_to_csv('test.csv', 1000)
Thanks in advance for your help!
Answer 0 (score: 0)
You are trying to misuse the zip function. In your case a generator expression or a dict comprehension is simpler:
def gen_email_dict(n):
    # email_gen() comes from the question; cc is the list of country codes
    return {email_gen(): choice(cc) for _ in range(n)}
    # Python 2: return dict((email_gen(), choice(cc)) for _ in range(n))
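The zip function is meant for sequences of equal length (it stops at the shortest one), whereas zip_longest allows unequal lengths. But its fill value is a single fixed value, not a function that can produce arbitrary values! In the question, choice(cc) is evaluated once, so every email address after the first ten receives the same country code.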
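To make the point about the fill value concrete, here is a small sketch. The shortened cc list and the example addresses are made up for illustration; only zip_longest and choice come from the question.

```python
from itertools import zip_longest
from random import choice

cc = ['us', 'gb', 'de']  # shortened list, for illustration only
emails = ['a@x.com', 'b@x.com', 'c@x.com', 'd@x.com', 'e@x.com', 'f@x.com']

# choice(cc) runs once, before zip_longest is called; its result is reused
# for every position where cc has run out.
pairs = list(zip_longest(emails, cc, fillvalue=choice(cc)))

head = [code for _, code in pairs[:len(cc)]]  # codes paired in list order
tail = [code for _, code in pairs[len(cc):]]  # all equal to the one fill value
print(head, tail)
```

So the first len(cc) addresses get the codes in list order, not at random, and all the remaining addresses share one code.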
If you really want to use zip, the way to do it is with an infinite generator of country codes:
cc = ['us', 'gb', 'de', 'fr', 'it', 'nl', 'es', 'ae', 'br', 'au']

def _countries():
    # endless stream of random country codes
    while True:
        yield choice(cc)

countries = _countries()

def gen_email_dict(n):
    # using zip_longest you'll get an infinite loop!!!
    return dict(zip((email_gen() for _ in range(n)), countries))
    # using itertools.islice you won't get an infinite loop,
    # but there is no reason to complicate things.
    # (requires: import itertools as it)
    #return dict(zip_longest((email_gen() for _ in range(n)), it.islice(countries, n)))
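Putting the comprehension together with the helpers from the question gives a complete script along these lines. This is a sketch, not a definitive version: the module-level name CC is my own choice, and the helper functions are copied from the question.

```python
import csv
from random import choice, randint
from string import ascii_lowercase

# Module-level list of country codes (the name CC is an assumption).
CC = ['us', 'gb', 'de', 'fr', 'it', 'nl', 'es', 'ae', 'br', 'au']

def gen_name(length):
    """Return a random lowercase name of the given length."""
    return ''.join(choice(ascii_lowercase) for _ in range(length))

def email_gen():
    """Return a fake email address."""
    return gen_name(randint(5, 10)) + "@" + gen_name(randint(5, 15)) + ".com"

def gen_email_dict(n):
    """Map up to n random addresses to an independently chosen country code."""
    return {email_gen(): choice(CC) for _ in range(n)}

def dict_to_csv(filename, n):
    """Write the address/country pairs to a two-column CSV file."""
    with open(filename, 'w', newline='') as f:
        csv.writer(f).writerows(gen_email_dict(n).items())
```

Calling dict_to_csv('test.csv', 1000) then writes the file. Because the result is a dict, two identical random addresses would collapse into one entry; with names this long that is very unlikely, but a list of tuples would avoid it entirely.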
Answer 1 (score: 0)
If you have a percentage for each country code, just expand the list of countries until you have enough elements, then shuffle the list:
import random

n = 1000  # number of email addresses, as in the question
cc = [('us', .2), ('gb', .2), ('de', .1), ('fr', .05), ('it', .05)]
# scale factor that turns each weight into a number of list entries
distribution = n / sum(dist for c, dist in cc)
countries = []
for c, dist in cc:
    countries.extend([c] * int(round(distribution * dist)))
# rounding errors may mean we have too few, add until we have enough
while len(countries) < n:
    countries.append(random.choice(cc)[0])
random.shuffle(countries)
Now you can zip this list with your email addresses, and the countries will be distributed according to the weights.
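As a rough sketch of that last step, the expansion can be wrapped in a function and zipped with the addresses. The function name weighted_countries and the placeholder addresses are assumptions for illustration; the final slice is my addition, since rounding can overshoot n as well as fall short.

```python
import random

def weighted_countries(cc, n):
    """Return a shuffled list of n country codes, roughly proportional to the weights."""
    scale = n / sum(weight for _, weight in cc)
    countries = []
    for code, weight in cc:
        countries.extend([code] * int(round(scale * weight)))
    # Rounding may leave us short; top up with random codes.
    while len(countries) < n:
        countries.append(random.choice(cc)[0])
    random.shuffle(countries)
    # Rounding may also overshoot; trim after shuffling.
    return countries[:n]

cc = [('us', .2), ('gb', .2), ('de', .1), ('fr', .05), ('it', .05)]
n = 100
# Placeholder addresses; in the question these would come from email_gen().
emails = ['user%d@example.com' % i for i in range(n)]
email_dict = dict(zip(emails, weighted_countries(cc, n)))
```

Unlike an independent random draw per address, this gives each country an almost exact share of the addresses, with only the order randomized.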
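Answer 2 (score: 0)
To pick country codes at random with different probabilities, you can define a set of weights and use them to build a universe from which the codes are drawn at random. I am using Python 2 here, but the general idea is the same: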
from random import choice

emails = [chr(65 + i) + '@foo.bar' for i in range(26)]
cc_weights = dict(
    us = 5,
    gb = 3,
    de = 2,
)
# A universe of country codes, based on the weighting we defined.
# (Python 2; in Python 3 use cc_weights.items() instead of iteritems().)
ccs = sum(([c] * n for c,n in cc_weights.iteritems()), [])
email_ccs = dict((e, choice(ccs)) for e in emails)
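On Python 3.6 or later, the same idea can be written without building the universe list, using random.choices and its weights argument. This swaps in a different standard-library call from the one used above, so treat it as an alternative sketch and not as the answer's original method.

```python
from random import choices

emails = [chr(65 + i) + '@foo.bar' for i in range(26)]
cc_weights = {'us': 5, 'gb': 3, 'de': 2}

# Draw one code per address; each draw is independent and weighted.
codes = choices(list(cc_weights), weights=list(cc_weights.values()), k=len(emails))
email_ccs = dict(zip(emails, codes))
```

As with the universe approach, the weights describe probabilities, so the actual counts only approximate the 5:3:2 ratio for a small number of addresses.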