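I have a column sites in a pandas df. Data format: a list of strings. I need to change the column's values to randomly generated words. My data looks like this: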
row sites
1 ["Elle", "Harpers", "Cosmo"]
2 ["Elle", "Vogue"]
3 ["Cosmo"]
Desired output:
row sites
1 ["KLD", "GHL", "JGF"]
2 ["KLD", "VGO"]
3 ["JGF"]
I should also be able to reverse the codes back to the names afterwards, or save the correspondence in a format like VGO = Vogue.
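I thought of using numpy.random.randint, but it seems this method only works with integers. What is the fastest way to generate the names and apply them with replace, instead of hardcoding them?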
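As an aside on the randint idea: np.random.randint does only return integers, but integers from 65 to 90 are the ASCII codes of 'A' to 'Z', so chr can turn them into letters. A minimal sketch, assuming three-letter codes and a hypothetical helper random_codes that redraws until it has n distinct codes:

```python
import numpy as np


def random_codes(n, length=3, seed=None):
    """Hypothetical helper: return n distinct random uppercase codes built with randint."""
    if n > 26 ** length:
        raise ValueError('not enough distinct codes of this length')
    rs = np.random.RandomState(seed)
    codes = set()
    while len(codes) < n:
        # integers in [65, 91) are the ASCII codes of 'A'..'Z'
        ints = rs.randint(65, 91, size=length)
        codes.add(''.join(chr(int(i)) for i in ints))
    return sorted(codes)


print(random_codes(4, seed=0))
```

The resulting codes can be zipped with the unique site names to build the same kind of word-to-code dictionary that the answers below use.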
Answer 0 (score: 2)
You can use np.random.choice on a list of all 3-uppercase-letter combinations generated with combinations_with_replacement:
import numpy as np
from itertools import combinations_with_replacement

upperletters = map(chr, range(65, 91))
print(np.random.choice(list(map(''.join,
                                combinations_with_replacement(upperletters, 3))),
                       size=8, replace=False))
#['JPZ' 'SSU' 'AQW' 'GKQ' 'AIZ' 'UYY' 'IJS' 'AOR']
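One caveat: combinations_with_replacement only yields letters in non-decreasing order, so a code such as 'KLD' from the question can never appear, and only 3276 of the 26**3 = 17576 three-letter strings are candidates. If every ordering should be possible, itertools.product generates all of them. A minimal variant, which also swaps in numpy's Generator API (np.random.default_rng) in place of the legacy np.random.choice:

```python
from itertools import combinations_with_replacement, product

import numpy as np

letters = [chr(c) for c in range(65, 91)]  # 'A'..'Z'

# Only non-decreasing strings such as 'GKQ'
sorted_only = [''.join(p) for p in combinations_with_replacement(letters, 3)]
# Every ordering, including 'KLD'
all_codes = [''.join(p) for p in product(letters, repeat=3)]

rng = np.random.default_rng(42)
codes = rng.choice(all_codes, size=8, replace=False)  # 8 distinct codes
print(len(sorted_only), len(all_codes))  # 3276 17576
print(codes)
```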
Now, to change the data, you can explode the column and map the values through a dictionary that stores the correspondence.
s = df['sites'].explode()
codes = np.random.choice(list(map(''.join,
                                  combinations_with_replacement(map(chr, range(65, 91)),
                                                                3))),
                         size=s.nunique(), replace=False)
d = {word: code for word, code in zip(s.unique(), codes)}
print(d)  # so in d you keep the correspondence
{'Elle': 'IVV', 'Harpers': 'DDW', 'Cosmo': 'DKM', 'Vogue': 'MRV'}
df['sites'] = s.map(d).groupby(level=0).agg(list)
print(df)
row sites
0 1 [IVV, DDW, DKM]
1 2 [IVV, MRV]
2 3 [DKM]
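The question also asks to reverse the codes and to save the correspondence as VGO = Vogue. Because d maps each word to a distinct code, inverting it gives code to word, and the same explode/map/groupby round trip restores the names. A sketch, assuming d and the coded df look like the output above (the values are hard-coded here so the snippet runs on its own):

```python
import pandas as pd

# Hard-coded copies of d and the coded column from the output above
d = {'Elle': 'IVV', 'Harpers': 'DDW', 'Cosmo': 'DKM', 'Vogue': 'MRV'}
df = pd.DataFrame({'row': [1, 2, 3],
                   'sites': [['IVV', 'DDW', 'DKM'], ['IVV', 'MRV'], ['DKM']]})

# Invert the mapping: code -> original word
inverse = {code: word for word, code in d.items()}

# Restore the original names with the same explode / map / groupby round trip
restored = df['sites'].explode().map(inverse).groupby(level=0).agg(list)

# "CODE = Name" lines, ready to be written to a text file
mapping_text = '\n'.join(f'{code} = {word}' for code, word in inverse.items())
print(restored.tolist())
print(mapping_text)
```

Assigning restored back to df['sites'] would undo the masking.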
Answer 1 (score: 1)
Here is one approach:
import pandas as pd
import random

# use sample data to create data frame
data = [(1, ["Elle", "Harpers", "Cosmo"]),
        (2, ["Elle", "Vogue"]),
        (3, ["Cosmo"])]
df = pd.DataFrame(data, columns=['row', 'sites'])

# get unique sites
unique_sites = df.explode('sites').loc[:, 'sites'].sort_values().unique()

# build map from actual site to masked site - could use full alphabet below
# (random.choices draws each code independently, so two sites could get the same code)
random.seed(123456)
site_to_scrambled = {
    site: ''.join(random.choices('ABCDEFGHI', k=4))
    for site in unique_sites}

def convert(sites, site_to_scrambled):
    return [site_to_scrambled[site] for site in sites]

# apply the conversion
# (keep both sites and sites_scrambled to verify)
df['sites_scrambled'] = df['sites'].apply(
    lambda x: convert(x, site_to_scrambled))
print(df)
row sites sites_scrambled
0 1 [Elle, Harpers, Cosmo] [AFAC, BCEB, HHAB]
1 2 [Elle, Vogue] [AFAC, AIDF]
2 3 [Cosmo] [HHAB]