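I want to impute the missing values per group:
no-A-state: np.min per indicatorKPI
no-ISO-state: np.mean per indicatorKPI
For states with missing values, I want to impute with the mean per indicatorKPI. Here, that means imputing the missing values for Serbia.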
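import numpy as np
import pandas as pd

# Example input: NaN marks the entries that still have to be imputed
mydf = pd.DataFrame({
    'Country': ['no-A-state', 'no-ISO-state', 'germany', 'serbia',
                'austria', 'germany', 'serbia', 'austria'],
    'indicatorKPI': [np.nan, np.nan, 'SP.DYN.LE00.IN', 'NY.GDP.MKTP.CD',
                     'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN', 'NY.GDP.MKTP.CD',
                     'SP.DYN.LE00.IN'],
    'value': [np.nan, np.nan, 0.9, np.nan, 0.7, 0.2, 0.3, 0.6],
})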
The desired output should look like this:
mydf = pd.DataFrame({'Country':['no-A-state','no-ISO-state', 'no-A-state','no-ISO-state',
'germany','serbia','serbia', 'austria',
'germany','serbia', 'austria',],
'indicatorKPI':['SP.DYN.LE00.IN','NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN',
'SP.DYN.LE00.IN','NY.GDP.MKTP.CD','SP.DYN.LE00.IN','NY.GDP.MKTP.CD','NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN','NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN'],
'value':['MIN of all for this indicator', 'MEAN of all for this indicator','MIN of all for this indicator','MEAN of all for this indicator', 0.9,'MEAN of all for SP.DYN.LE00.IN indicator',0.7, 'MEAN of all for NY.GDP.MKTP.CD indicator',0.2, 0.3, 0.6]
})
Answer 0 (score: 2)
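Based on your new example, the following works for me.
Basically, this fills in the missing values one condition at a time: we set the minimum for the 'no-A-state' countries, then the mean for the 'no-ISO-state' countries. We then group by 'indicatorKPI', calculate the mean of each group, and assign it to the rows that are still null, picking the right mean for each row with map, which performs a lookup.
In [185]: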
mydf.loc[mydf['Country'] == 'no-A-state', 'value'] = mydf['value'].min()
mydf.loc[mydf['Country'] == 'no-ISO-state', 'value'] = mydf['value'].mean()
mydf.loc[mydf['value'].isnull(), 'value'] = mydf['indicatorKPI'].map(mydf.groupby('indicatorKPI')['value'].mean())
mydf
Out[185]:
         Country    indicatorKPI     value
0     no-A-state  SP.DYN.LE00.IN  0.200000
1   no-ISO-state  NY.GDP.MKTP.CD  0.442857
2     no-A-state  SP.DYN.LE00.IN  0.200000
3   no-ISO-state  SP.DYN.LE00.IN  0.442857
4        germany  NY.GDP.MKTP.CD  0.900000
5         serbia  SP.DYN.LE00.IN  0.328571
6         serbia  NY.GDP.MKTP.CD  0.700000
7        austria  NY.GDP.MKTP.CD  0.585714
8        germany  SP.DYN.LE00.IN  0.200000
9         serbia  NY.GDP.MKTP.CD  0.300000
10       austria  SP.DYN.LE00.IN  0.600000
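Note that the min and mean for the two placeholder states above are taken over the whole value column, not per indicator. If you really want them per indicatorKPI, as the question describes, one possible variant is groupby with transform. This is only a sketch: it assumes the frame has NaN wherever a value is missing, and the names by_kpi, kpi_min, kpi_mean, missing and is_no_a are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Assumed input: the desired-output frame from the question, with NaN
# standing in for every value that still has to be imputed.
mydf = pd.DataFrame({
    'Country': ['no-A-state', 'no-ISO-state', 'no-A-state', 'no-ISO-state',
                'germany', 'serbia', 'serbia', 'austria',
                'germany', 'serbia', 'austria'],
    'indicatorKPI': ['SP.DYN.LE00.IN', 'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN',
                     'SP.DYN.LE00.IN', 'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN',
                     'NY.GDP.MKTP.CD', 'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN',
                     'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN'],
    'value': [np.nan, np.nan, np.nan, np.nan, 0.9, np.nan, 0.7, np.nan,
              0.2, 0.3, 0.6],
})

# Per-indicator statistics of the observed values, broadcast back to
# every row of the frame (NaN is skipped when computing them).
by_kpi = mydf.groupby('indicatorKPI')['value']
kpi_min = by_kpi.transform('min')
kpi_mean = by_kpi.transform('mean')

missing = mydf['value'].isnull()
is_no_a = mydf['Country'] == 'no-A-state'

# 'no-A-state' rows get the minimum of their indicator; every other
# missing row (including 'no-ISO-state') gets the mean of its indicator.
mydf.loc[missing & is_no_a, 'value'] = kpi_min[missing & is_no_a]
mydf.loc[missing & ~is_no_a, 'value'] = kpi_mean[missing & ~is_no_a]
```

Because both statistics are computed before anything is assigned, the imputed rows do not feed back into the means, which is a different result from the sequential approach above.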
Here are the steps broken down:
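As a rough sketch of what those intermediate steps look like (my own reconstruction, not the answerer's original breakdown), the one-liners from In [185] can be split up as follows. It assumes the frame is the desired-output example with NaN in place of the values to impute; group_means, looked_up and still_missing are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed input: the desired-output frame from the question, with NaN
# standing in for every value that still has to be imputed.
mydf = pd.DataFrame({
    'Country': ['no-A-state', 'no-ISO-state', 'no-A-state', 'no-ISO-state',
                'germany', 'serbia', 'serbia', 'austria',
                'germany', 'serbia', 'austria'],
    'indicatorKPI': ['SP.DYN.LE00.IN', 'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN',
                     'SP.DYN.LE00.IN', 'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN',
                     'NY.GDP.MKTP.CD', 'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN',
                     'NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN'],
    'value': [np.nan, np.nan, np.nan, np.nan, 0.9, np.nan, 0.7, np.nan,
              0.2, 0.3, 0.6],
})

# Step 1: the overall minimum goes to the 'no-A-state' rows.
mydf.loc[mydf['Country'] == 'no-A-state', 'value'] = mydf['value'].min()

# Step 2: the overall mean (which now includes the rows filled in
# step 1) goes to the 'no-ISO-state' rows.
mydf.loc[mydf['Country'] == 'no-ISO-state', 'value'] = mydf['value'].mean()

# Step 3a: one mean per indicator; NaN values are skipped.
group_means = mydf.groupby('indicatorKPI')['value'].mean()

# Step 3b: map looks up each row's indicator in group_means and
# returns a Series aligned with mydf's index.
looked_up = mydf['indicatorKPI'].map(group_means)

# Step 3c: assign the looked-up mean only where the value is still missing.
still_missing = mydf['value'].isnull()
mydf.loc[still_missing, 'value'] = looked_up[still_missing]
```

Printing group_means and looked_up after steps 3a and 3b is a handy way to see what map is doing before the final assignment.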