I want to fill the missing values in a column with random values.
853 None
854 cheese empty
855 cheese other
856 yogurt empty
857 yogurt other
858 yogurt empty
859 yogurt other
860 butter empty
861 butter other
862 None
863 None
I would like to get something like this:
853 ASDFGHJAS
854 cheese empty
855 cheese other
856 yogurt empty
857 yogurt other
858 yogurt empty
859 yogurt other
860 butter empty
861 butter other
862 DFGHJRTYT
863 ERTYUIOIO
864 TYUIOPPWE
865 QWERTYUUI
866 CBNMTYUIO
I tried something like this:
df1 = df[['english_name']].fillna(''.join(choice(ascii_uppercase) for i in range(12)), axis=1)
853 ASDFGHJAS
854 cheese empty
855 cheese other
856 yogurt empty
857 yogurt other
858 yogurt empty
859 yogurt other
860 butter empty
861 butter other
862 ASDFGHJAS
863 ASDFGHJAS
864 ASDFGHJAS
865 ASDFGHJAS
866 ASDFGHJAS
The problem is that I get the same value in every row, and I need a unique random value for each row.
Answer 0: (score: 5)
Use apply with a lambda to fill the NaN values with a random choice.
In [243]: df[['english_name']].apply(lambda x: x.fillna(''.join(choice(ascii_uppercase) for i in range(12))), axis=1)
Out[243]:
english_name
853 BIZLLWLFGUSD
854 cheese empty
855 cheese other
856 yogurt empty
857 yogurt other
858 yogurt empty
859 yogurt other
860 butter empty
861 butter other
862 NMHDRQMTWZXF
863 EGPCZFWEDOFR
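The original attempt repeats one value because the argument to fillna is evaluated once, before fillna runs, so every NaN receives that single string. A minimal sketch of the difference, assuming a small made-up frame and a hypothetical helper named random_name (neither is from the question):

```python
from random import choice
from string import ascii_uppercase

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's column.
df = pd.DataFrame({'english_name': [np.nan, 'cheese empty', np.nan, np.nan]})


def random_name(n=12):
    """Return n random uppercase letters (hypothetical helper)."""
    return ''.join(choice(ascii_uppercase) for _ in range(n))


# random_name() is called once here, so every NaN gets the same string.
same = df['english_name'].fillna(random_name())

# Calling the helper per element draws a separate string for each NaN.
different = df['english_name'].apply(
    lambda v: random_name() if pd.isna(v) else v
)
```

Here same holds one repeated string in the three missing rows, while different gets an independent draw for each of them.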
Alternatively, create a Series of random names of the same length up front, then use df.english_name.fillna(s)
In [259]: s = pd.Series([''.join(choice(ascii_uppercase) for i in range(12)) for _ in range(len(df))], index=df.index)
In [260]: df.english_name.fillna(s)
Out[260]:
853 BRFERJPGVDXP
854 cheese empty
855 cheese other
856 yogurt empty
857 yogurt other
858 yogurt empty
859 yogurt other
860 butter empty
861 butter other
862 NYYTRCSSCPWT
863 ZYBNJQIPIWEF
Name: english_name, dtype: object
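A variation on the Series approach is to generate strings only for the rows that are actually missing and let fillna align them by index. This is a sketch under assumptions: the sample rows and index labels are made up, and only the column name comes from the question.

```python
from random import choice
from string import ascii_uppercase

import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame(
    {'english_name': [np.nan, 'cheese empty', 'yogurt other', np.nan]},
    index=[853, 854, 857, 862],
)

mask = df['english_name'].isna()

# One random 12-letter string per missing row, indexed by those rows' labels.
fills = pd.Series(
    [''.join(choice(ascii_uppercase) for _ in range(12))
     for _ in range(int(mask.sum()))],
    index=df.index[mask],
)

# fillna aligns on the index, so existing values are left untouched.
filled = df['english_name'].fillna(fills)
```

Compared with building a full-length Series, this avoids generating strings that would be discarded, which may matter for large frames.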
Answer 1: (score: 1)
Using this answer, you can define a function that returns a random string of a given size:
import random
import string

def random_string(N=9):
    return ''.join(random.SystemRandom().choice(string.ascii_uppercase) for _ in range(N))

df[['english_name']].apply(lambda x: x.fillna(random_string()), axis=1)
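Independent random draws are very unlikely to collide, but they are not guaranteed to be unique. If strict uniqueness is required, one option is to retry on collisions. The following is a sketch, not part of the answer above: the helper unique_random_strings and the sample data are hypothetical.

```python
import random
import string

import numpy as np
import pandas as pd


def unique_random_strings(count, length=9, taken=()):
    """Return `count` distinct random uppercase strings, none in `taken`.

    Hypothetical helper; assumes count is far below 26 ** length.
    """
    rng = random.SystemRandom()
    seen = set(taken)
    result = []
    while len(result) < count:
        candidate = ''.join(
            rng.choice(string.ascii_uppercase) for _ in range(length)
        )
        if candidate not in seen:  # retry on the rare collision
            seen.add(candidate)
            result.append(candidate)
    return result


# Hypothetical sample data.
df = pd.DataFrame({'english_name': [np.nan, 'butter other', np.nan, np.nan]})
mask = df['english_name'].isna()

# Assign one distinct string to each missing row, avoiding existing names too.
df.loc[mask, 'english_name'] = unique_random_strings(
    int(mask.sum()), taken=df['english_name'].dropna()
)
```

Passing the existing names as taken also prevents a generated string from duplicating a real value.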
Answer 2: (score: 1)
A general solution for DataFrames with multiple columns
df = pd.DataFrame([
['a', np.nan, 'b'],
[np.nan, 'c', np.nan],
['d', np.nan, 'e'],
[np.nan, 'f', np.nan]
])
df
     0    1    2
0    a  NaN    b
1  NaN    c  NaN
2    d  NaN    e
3  NaN    f  NaN
Use stack to get a Series:
dfs = df.stack(dropna=False)
wherenull = dfs.isnull().values
n = wherenull.sum()
Generate the fill values:
np.random.seed([3,1415])
fills = pd.DataFrame(
    np.random.choice(
        list(ascii_uppercase),
        (n, 12)
    )).sum(1).values
Fill in the missing values:
dfs.loc[wherenull] = fills
dfs.unstack()
              0             1             2
0             a  QLCKPXNLNTIX             b
1  AWYMWACAUZHT             c  NSMEDTNWHXNU
2             d  FDXFZLYHMGEH             e
3  WSOGGOVSIXKF             f  PYEPNHGRMMPO
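The same multi-column result can also be reached without stacking, by building a frame of random strings with the same shape and taking values from it only where the original is missing. This is a swapped-in technique rather than the method above, sketched under the assumption that generating a string for every cell (including those that get discarded) is acceptable; the seed is arbitrary.

```python
from string import ascii_uppercase

import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['a', np.nan, 'b'],
    [np.nan, 'c', np.nan],
    ['d', np.nan, 'e'],
    [np.nan, 'f', np.nan],
])

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# Draw a (rows, cols, 12) array of letters, then join the last axis per cell.
letters = rng.choice(list(ascii_uppercase), size=df.shape + (12,))
random_cells = pd.DataFrame(
    [[''.join(cell) for cell in row] for row in letters],
    index=df.index,
    columns=df.columns,
)

# Keep existing values; use the random string wherever a cell is missing.
filled = df.where(df.notna(), random_cells)
```

For frames where only a few cells are missing, generating strings just for those cells would be cheaper than this full-shape draw.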