Replace missing column values with incrementing values in a pandas DataFrame

Time: 2019-12-04 13:42:17

Tags: pandas

Input DataFrame:

import pandas as pd

max_value = 16
x_max = max_value
data = {
    's_id': ['G1', '', '', '', 'G2', 'G3', 'G3', 'G4', '', '', '']
}
df2 = pd.DataFrame.from_dict(data)
df2
Out[365]: 
   s_id
0    G1
1      
2      
3      
4    G2
5    G3
6    G3
7    G4
8      
9      
10     

Expected output DataFrame:

data = {
    's_id': ['G1', 'G17', 'G18', 'G19', 'G2', 'G3', 'G3', 'G4', 'G20', 'G21', 'G22']
}
df3 = pd.DataFrame.from_dict(data)
df3

Out[366]: 
   s_id
0    G1
1   G17
2   G18
3   G19
4    G2
5    G3
6    G3
7    G4
8   G20
9   G21
10  G22

I tried the following:

df2['s_id'] = df2['s_id'].mask(df2['s_id'].eq(''))

s = df2[df2['s_id'].isna()].drop_duplicates()

TypeError: unhashable type: 'list'

d = {v: f'G{k}' for k, v in enumerate(s, x_max + 1)}
print (d)

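How can I produce the output DataFrame? If an s_id value is empty, it should be replaced with a value derived from the external variable max_value, incremented by one for each empty cell. For example, the empty s_id after G1 must become G17, i.e. max_value + 1.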

1 Answer:

Answer 0 (score: 3)

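The idea is to build a list from a range with the same length as the number of empty values, then assign it to the column through a boolean mask with DataFrame.loc: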

df2 = pd.DataFrame.from_dict(data)

m = df2['s_id'].eq('')
v = [f'G{x}' for x in range(x_max+1, x_max + m.sum()+1)]
print (v)
['G17', 'G18', 'G19', 'G20', 'G21', 'G22']

df2.loc[m, 's_id'] = v
print (df2)
   s_id
0    G1
1   G17
2   G18
3   G19
4    G2
5    G3
6    G3
7    G4
8   G20
9   G21
10  G22
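
The same mask-based approach can be wrapped in a small helper. This is only a sketch: the function name `fill_empty_ids` and its parameters are hypothetical, and it assumes that missing cells are empty strings, as in the question.

```python
import pandas as pd


def fill_empty_ids(df, column, start, prefix="G"):
    """Return a copy of df where empty strings in `column` get incrementing labels.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    mask = out[column].eq("")      # True where the cell is an empty string
    n_missing = int(mask.sum())    # how many labels need to be generated
    if n_missing:
        # One label per empty cell, assigned in row order.
        out.loc[mask, column] = [
            f"{prefix}{i}" for i in range(start, start + n_missing)
        ]
    return out


x_max = 16
df2 = pd.DataFrame({"s_id": ["G1", "", "", "", "G2", "G3", "G3", "G4", "", "", ""]})
result = fill_empty_ids(df2, "s_id", start=x_max + 1)
print(result["s_id"].tolist())
```

Working on a copy leaves the original df2 untouched, which may be convenient when comparing input and output side by side.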

A solution from @Jon Clements, thank you:

import itertools

df2['s_id'] = df2['s_id'].apply(lambda v, c=itertools.count(x_max + 1): v or f'G{next(c)}')
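
The one-liner works because the default argument `c` is evaluated once, when the lambda is created, so every call shares the same counter. A more explicit sketch of the same idea follows; the helper name `make_filler` is hypothetical, and like the one-liner it treats any falsy value (such as an empty string) as missing.

```python
import itertools

import pandas as pd


def make_filler(start, prefix="G"):
    """Build a function that replaces falsy values with incrementing labels.

    Hypothetical helper for illustration; not part of pandas or itertools.
    """
    counter = itertools.count(start)  # yields start, start + 1, ...

    def fill(value):
        # Keep truthy values; replace falsy ones with the next label.
        return value or f"{prefix}{next(counter)}"

    return fill


x_max = 16
df2 = pd.DataFrame({"s_id": ["G1", "", "", "", "G2", "G3", "G3", "G4", "", "", ""]})
df2["s_id"] = df2["s_id"].apply(make_filler(x_max + 1))
print(df2["s_id"].tolist())
```

Note that NaN is truthy in Python, so this version would leave NaN values unchanged; it only fits data where missing cells are empty strings.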