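I would like to be able to assign pseudo-random (PRNG-generated) IDs to a DataFrame.
I can assign a unique ID per name using cat.codes or ngroup():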
import pandas as pd
import random
import string
df1 = pd.DataFrame({'Name': ['John', 'Susie', 'Jack', 'Jill', 'John']})
df1['id'] = df1.groupby('Name').ngroup()
df1['idz'] = df1['Name'].astype('category').cat.codes
    Name  id  idz
0   John   2    2
1  Susie   3    3
2   Jack   0    0
3   Jill   1    1
4   John   2    2
I have used the function from this post to create a unique ID row by row.
def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    # Draw `size` characters from `chars` using the OS-backed random source
    return ''.join(random.SystemRandom().choice(chars) for _ in range(size))
df1['random id'] = df1['idz'].apply(lambda x: id_generator(3))
    Name  id  idz random id
0   John   2    2       118  #<--- Check Here
1  Susie   3    3       KGZ
2   Jack   0    0       KMQ
3   Jill   1    1       T2L
4   John   2    2       Q3F  #<--- Check Here
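But how do I combine the two, so that John receives the same ID every time he appears in this small use case? Because of the size of the data, I would like to avoid a long "if ID not used, then ID, and if name has ID, use existing ID" loop.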
Answer 0 (score: 2)
Use groupby + transform:
df1['random id'] = df1.groupby('idz').idz.transform(lambda x : id_generator(3))
df1
Out[657]:
    Name  id  idz random id
0   John   2    2       35P
1  Susie   3    3       6UU
2   Jack   0    0       XGF
3   Jill   1    1       5LC
4   John   2    2       35P
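One caveat worth noting: transform calls the generator once per group, so repeated names share an ID, but nothing prevents two different groups from drawing the same string (a 3-character ID over 36 symbols has only 36**3 = 46656 possibilities). Below is a minimal sketch of a variant that redraws on collision. The helper name `assign_group_ids` is hypothetical and not part of the answer above, and the sketch assumes there are fewer distinct names than possible IDs.

```python
import random
import string

import pandas as pd


def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
    """Return a random string of `size` characters drawn from `chars`."""
    rng = random.SystemRandom()
    return ''.join(rng.choice(chars) for _ in range(size))


def assign_group_ids(keys, size=3):
    """Hypothetical helper: map each distinct key to a distinct random ID."""
    chars = string.ascii_uppercase + string.digits
    unique_keys = pd.unique(keys)
    if len(unique_keys) > len(chars) ** size:
        raise ValueError("not enough distinct IDs for the number of keys")
    used = set()
    mapping = {}
    for key in unique_keys:
        new_id = id_generator(size, chars)
        while new_id in used:  # redraw on the (rare) collision
            new_id = id_generator(size, chars)
        used.add(new_id)
        mapping[key] = new_id
    # Every row with the same key receives the same ID
    return keys.map(mapping)


df1 = pd.DataFrame({'Name': ['John', 'Susie', 'Jack', 'Jill', 'John']})
df1['random id'] = assign_group_ids(df1['Name'])
```

The Python loop runs once per distinct name rather than once per row, so it stays short even when the DataFrame itself is large.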
Answer 1 (score: 0)
I'll preface this with "this might not be the most efficient option".
I would first find each unique user, then generate a random ID for each of them.
# Find the unique users and store them in a new DataFrame
# (df1 is the DataFrame from the question)
df_unique_users = pd.DataFrame({'Name': df1['Name'].unique()})
# Generate one random ID per unique user. A set only holds distinct values,
# so the IDs are guaranteed to be unique. Make sure id_generator can produce
# at least as many distinct IDs as there are unique names; otherwise this
# loop never terminates.
rand_set = set()
while len(rand_set) < len(df_unique_users):
    rand_set.add(id_generator(3))
# A set is unordered, so convert it to a list before assigning it as a column
df_unique_users['Rand_ID'] = list(rand_set)
# Map the random IDs back onto the original DataFrame
df1 = df1.merge(df_unique_users, how='left', on='Name')
You could use the original ID column instead of the 'Name' column to get the unique values.
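As a rough sketch of that idea, the integer codes in the question's idz column can index directly into a list of pre-drawn IDs, which avoids both the merge and the retry loop. This swaps in a different technique from the answer above: sampling distinct integers without replacement and encoding each one as a fixed-width string. The names `CHARS`, `SIZE` and `encode` are assumptions made for illustration.

```python
import random
import string

import numpy as np
import pandas as pd

CHARS = string.ascii_uppercase + string.digits  # 36 symbols
SIZE = 3                                        # ID length


def encode(n, size=SIZE, chars=CHARS):
    """Write the integer n as a fixed-width string in base len(chars)."""
    base = len(chars)
    digits = []
    for _ in range(size):
        n, remainder = divmod(n, base)
        digits.append(chars[remainder])
    return ''.join(reversed(digits))


df1 = pd.DataFrame({'Name': ['John', 'Susie', 'Jack', 'Jill', 'John']})
df1['idz'] = df1['Name'].astype('category').cat.codes

# Draw one distinct integer per unique code; sampling without replacement
# guarantees uniqueness and raises ValueError if too many IDs are requested.
n_unique = df1['idz'].nunique()
numbers = random.SystemRandom().sample(range(len(CHARS) ** SIZE), n_unique)
ids = np.asarray([encode(n) for n in numbers])

# The codes run from 0 to n_unique - 1, so they can index the ID array directly
df1['Rand_ID'] = ids[df1['idz'].to_numpy()]
```

Because the IDs are drawn without replacement, no collision check is needed, and the final assignment is a single vectorized lookup.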