I have a multi-index DataFrame that contains some economic and social indicators. The code further below generates a sample DataFrame.
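The problem is that the index contains many keys that are variant spellings of the same country. For example, the United States is entered as USA, U.S.A., and United States. I want to merge these groups based on a list of possible names, so that the column values are combined (averaged where there are duplicates) and the result is sorted.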
import numpy as np
import pandas as pd

# Country labels deliberately include variant spellings of the same country.
arrays = [['USA', 'USA', 'Egypt', 'Egypt', 'U.S.A.', 'U.S.A.', 'ARE, eg', 'ARE, eg', 'United States', 'France', 'France', 'France'],
          [1950, 1980, 1980, 2010, 2010, 1990, 1960, 1990, 2015, 1980, 1995, 2010]]
tuples = list(zip(*arrays))
index2 = pd.MultiIndex.from_tuples(tuples, names=['Country', 'Year'])
cols = ['ind1', 'ind2', 'ind3', 'ind4']
df = pd.DataFrame(np.random.randn(12, 4), index=index2, columns=cols)

# Punch some missing values into each column.
df.iloc[1::4, 0] = np.nan
df.iloc[2::4, 1] = np.nan
df.iloc[::3, 2] = np.nan
df.iloc[1::3, 3] = np.nan
df
How can I perform this merge efficiently on this multi-index pandas DataFrame?
Answer 0 (score: 1)
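IIUC, you can do it like this:
First, let's "invert" that dictionary (dfnew_names, which maps each canonical name to a list of its variants) so that it is in the right format for the DataFrame rename method.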
rename_dict = {}
for k, v in dfnew_names.items():
    for item in v:
        rename_dict[item] = k
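The name dictionary from the question is not reproduced here, so the following is only a minimal sketch. It assumes a hypothetical dfnew_names whose keys are canonical names and whose values are lists of variants taken from the sample index, and writes the same inversion as a dict comprehension:

```python
# Hypothetical mapping (assumption): canonical name -> list of variant spellings.
dfnew_names = {
    'USA': ['USA', 'U.S.A.', 'United States'],
    'Egypt': ['Egypt', 'ARE, eg'],
    'France': ['France'],
}

# Invert it to variant -> canonical, the shape that rename(index=...) expects.
rename_dict = {variant: canonical
               for canonical, variants in dfnew_names.items()
               for variant in variants}

print(rename_dict['U.S.A.'])   # USA
print(rename_dict['ARE, eg'])  # Egypt
```

If a variant were listed under two canonical names, the later entry would silently win, so it may be worth checking for that before renaming.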
Second, let's use the reformatted dictionary to rename the index of the DataFrame and sort it.
df.rename(index=rename_dict).sort_index()
Output:
                  ind1      ind2      ind3      ind4
Country Year
Egypt   1960  0.964161       NaN       NaN -0.909796
        1980 -0.568132       NaN -1.018460  2.295120
        1990  0.185795 -0.517331  1.276134       NaN
        2010  0.067946  0.895027       NaN  2.141615
France  1980       NaN  0.124058       NaN  1.377971
        1995 -2.153890       NaN  1.334341       NaN
        2010  0.019129  0.807188  0.804133 -0.698463
USA     1950 -0.023521  0.432706       NaN -0.701396
        1980       NaN  0.824445  1.027330       NaN
        1990       NaN  0.848902 -1.537311 -0.624271
        2010  0.641681 -0.504838 -1.383700       NaN
        2015  0.688233 -0.277385  2.036573 -0.821976
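A dictionary passed to rename is applied to every index level by default. As a small sketch on a hand-made frame (the frame and the two-entry rename_dict are illustrative assumptions, not the question's data), the level argument of rename can restrict the mapping to the Country level:

```python
import pandas as pd

# Tiny illustrative frame with variant country spellings in the index.
idx = pd.MultiIndex.from_tuples(
    [('U.S.A.', 1990), ('ARE, eg', 1960), ('France', 1980)],
    names=['Country', 'Year'])
demo = pd.DataFrame({'ind1': [1.0, 2.0, 3.0]}, index=idx)

# Assumed variant -> canonical mapping for this demo only.
rename_dict = {'U.S.A.': 'USA', 'ARE, eg': 'Egypt'}

# level='Country' limits the renaming to that index level.
renamed = demo.rename(index=rename_dict, level='Country').sort_index()
print(renamed.index.tolist())
```

Labels that are not in the mapping, such as France, are left unchanged.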
If a country ends up with duplicate years after renaming, add a groupby to average them:
df.rename(index=rename_dict).groupby(level=[0,1]).mean()
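The sample data happens to have no duplicate (Country, Year) pairs after renaming, so here is a small hand-made case that does (the frame and mapping are assumptions for illustration). It suggests that the groupby averages duplicates, skips NaN within a group, and returns the groups sorted:

```python
import numpy as np
import pandas as pd

# After renaming, ('USA', 1980) appears twice.
idx = pd.MultiIndex.from_tuples(
    [('USA', 1980), ('U.S.A.', 1980), ('United States', 2015), ('Egypt', 1980)],
    names=['Country', 'Year'])
small = pd.DataFrame({'ind1': [1.0, 3.0, 5.0, 7.0],
                      'ind2': [np.nan, 4.0, 6.0, np.nan]}, index=idx)

# Assumed variant -> canonical mapping for this demo only.
rename_dict = {'U.S.A.': 'USA', 'United States': 'USA'}

# Rename, then average rows that share the same (Country, Year) key.
merged = small.rename(index=rename_dict).groupby(level=['Country', 'Year']).mean()
print(merged)
```

For ('USA', 1980), ind1 is the mean of 1.0 and 3.0, while ind2 is 4.0 because the NaN is ignored rather than propagated.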
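Alternatively, you can use mean with the level parameter, which is more concise. Note that the level argument of mean was deprecated in pandas 1.3 and removed in pandas 2.0, so this form only works on older versions; on current versions use the groupby form above: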
df.rename(index=rename_dict).mean(level=[0,1]).sort_index()