Input DataFrame:
import pandas as pd

max_value = 16
x_max = max_value
data = {
's_id' :['G1','','','','G2','G3','G3','G4','','','']
}
df2 = pd.DataFrame.from_dict(data)
df2
Out[365]:
s_id
0 G1
1
2
3
4 G2
5 G3
6 G3
7 G4
8
9
10
Expected output DataFrame:
data = {
's_id' :['G1','G17','G18','G19','G2','G3','G3','G4','G20','G21','G22']
}
df3 = pd.DataFrame.from_dict(data)
df3
Out[366]:
s_id
0 G1
1 G17
2 G18
3 G19
4 G2
5 G3
6 G3
7 G4
8 G20
9 G21
10 G22
I tried the following:
df2['s_id'] = df2['s_id'].mask(df2['s_id'].eq(''))
s = df2[df2['s_id'].isna()].drop_duplicates()
TypeError: unhashable type: 'list'
d = {v: f'G{k}' for k, v in enumerate(s, x_max + 1)}
print (d)
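The following sketch, run on the question's data, suggests why this attempt cannot produce distinct labels (it does not reproduce the reported TypeError). After `mask` turns every '' into NaN, all empty rows are identical, so `drop_duplicates` keeps only one of them. Iterating over a DataFrame also yields its column labels rather than its values, so `enumerate(s, ...)` only sees 's_id'. A value-to-label dictionary would hit the same problem in any case: every empty cell has the same value, so each empty position needs its own number instead.

```python
import pandas as pd

x_max = 16
df2 = pd.DataFrame({'s_id': ['G1', '', '', '', 'G2', 'G3', 'G3', 'G4', '', '', '']})

# Same first step as in the question: turn empty strings into NaN.
df2['s_id'] = df2['s_id'].mask(df2['s_id'].eq(''))

# All six empty rows now hold NaN, and drop_duplicates treats them as equal,
# so only one row survives.
s = df2[df2['s_id'].isna()].drop_duplicates()
print(len(s))  # 1

# Iterating over a DataFrame yields column labels, not cell values.
d = {v: f'G{k}' for k, v in enumerate(s, x_max + 1)}
print(d)  # {'s_id': 'G17'}
```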
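How can I get the output DataFrame? Each empty s_id value should be replaced with a label built from the external variable max_value, incremented once for every empty value in the s_id column. For example, the first empty s_id after G1 must be G17, i.e. max_value + 1, the next one G18, and so on.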
Answer 0 (score: 3)
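The idea is to use range to create a list with as many labels as there are empty values, then assign it to the column with DataFrame.loc and the boolean mask: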
df2 = pd.DataFrame.from_dict(data)
m = df2['s_id'].eq('')
v = [f'G{x}' for x in range(x_max+1, x_max + m.sum()+1)]
print (v)
['G17', 'G18', 'G19', 'G20', 'G21', 'G22']
df2.loc[m, 's_id'] = v
print (df2)
s_id
0 G1
1 G17
2 G18
3 G19
4 G2
5 G3
6 G3
7 G4
8 G20
9 G21
10 G22
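A vectorized variant of the same idea follows. It is an alternative and was not part of the original answer. The running count of empty cells, `m.cumsum()`, already numbers them 1, 2, 3, ... in row order, so adding `x_max` gives the suffixes 17 to 22 without building a separate list.

```python
import pandas as pd

x_max = 16
df2 = pd.DataFrame({'s_id': ['G1', '', '', '', 'G2', 'G3', 'G3', 'G4', '', '', '']})

m = df2['s_id'].eq('')   # True for empty cells
n = m.cumsum()[m]        # 1, 2, ..., k, kept only at the empty positions
# Assignment aligns on the index, so each empty row gets its own label.
df2.loc[m, 's_id'] = 'G' + (x_max + n).astype(str)
print(df2['s_id'].tolist())
# ['G1', 'G17', 'G18', 'G19', 'G2', 'G3', 'G3', 'G4', 'G20', 'G21', 'G22']
```

As with the list version, numbering follows row order, so G17 always goes to the topmost empty row.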
Solution by @Jon Clements, thanks:
import itertools
df2['s_id'] = df2['s_id'].apply(lambda v, c=itertools.count(x_max + 1): v or f'G{next(c)}')
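The one-liner relies on a Python detail: a lambda's default argument is evaluated once, when the lambda is created. Every row therefore shares the same `itertools.count` object, and each `next(c)` call returns the next number. The sketch below uses the same counter without the default-argument idiom. It also shows that `v or ...` only works because the empty cells are '' (falsy). After the question's `mask(...)` step they would be NaN instead, and `float('nan')` is truthy, so those cells would be left unchanged.

```python
import itertools
import pandas as pd

x_max = 16
df2 = pd.DataFrame({'s_id': ['G1', '', '', '', 'G2', 'G3', 'G3', 'G4', '', '', '']})

counter = itertools.count(x_max + 1)  # yields 17, 18, 19, ...
# Rows are visited top to bottom, so empty cells get consecutive numbers in order;
# non-empty values are kept because `v or ...` returns v when v is truthy.
df2['s_id'] = [v or f'G{next(counter)}' for v in df2['s_id']]
print(df2['s_id'].tolist())

# NaN is truthy, so the same expression would not replace it.
print(bool(float('nan')))  # True
```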