I have a pandas DataFrame full of data:
import pandas as pd
import numpy as np
varNames = ["point1","point2","point3","point4","point5"]
df = pd.DataFrame(np.random.randn(5,2),index=varNames,columns=["data1","data2"])
I want to use it to create a Series with a MultiIndex. I can build the index:
iterables=[["point1","point2","point3"],["point4","point5"]]
index=pd.MultiIndex.from_product(iterables, names=['numerator', 'denominator'])
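I don't know how to fill the Series. I would like something along the lines of this (not working code):
s = pd.Series(max(df.loc[index["numerator"]]/df.loc[index["denominator"]]),index=index)
I want to take each row of the first DataFrame that is listed as a numerator, divide it by each row of the first DataFrame that is listed as a denominator, find the maximum value in the resulting row, and store it at the corresponding position in the Series (s[variableN, variableM]).
This is my first time working with a MultiIndex. Short of building the Series row by row, computing each value and storing it (which I think I could manage), I can't work out how to do this.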
Answer 0: (score: 0)
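You can use reindex with the level parameter to broadcast the rows onto the MultiIndex, divide, and then take the maximum per numerator. The max(level=0) shortcut was removed in pandas 2.0, so groupby(level=0).max() is used here instead:
df3 = df.reindex(index, level=0).div(df.reindex(index, level=1)).groupby(level=0).max()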
Sample:
np.random.seed(456)
varNames = ["point1","point2","point3","point4","point5"]
df = pd.DataFrame(np.random.randn(5,2),index=varNames,columns=["data1","data2"])
print (df)
           data1     data2
point1 -0.668129 -0.498210
point2  0.618576  0.568692
point3  1.350509  1.629589
point4  0.301966  0.449483
point5 -0.345811 -0.315231
iterables=[["point1","point2","point3"],["point4","point5"]]
index=pd.MultiIndex.from_product(iterables, names=['numerator', 'denominator'])
df1 = df.reindex(index, level=0)
print (df1)
                          data1     data2
numerator denominator
point1    point4      -0.668129 -0.498210
          point5      -0.668129 -0.498210
point2    point4       0.618576  0.568692
          point5       0.618576  0.568692
point3    point4       1.350509  1.629589
          point5       1.350509  1.629589
df2 = df.reindex(index, level=1)
print (df2)
                          data1     data2
numerator denominator
point1    point4       0.301966  0.449483
          point5      -0.345811 -0.315231
point2    point4       0.301966  0.449483
          point5      -0.345811 -0.315231
point3    point4       0.301966  0.449483
          point5      -0.345811 -0.315231
print (df1.div(df2))
                          data1     data2
numerator denominator
point1    point4      -2.212594 -1.108405
          point5       1.932062  1.580459
point2    point4       2.048493  1.265214
          point5      -1.788768 -1.804050
point3    point4       4.472386  3.625472
          point5      -3.905339 -5.169509
df3 = df.reindex(index, level=0).div(df.reindex(index, level=1)).groupby(level=0).max()
print (df3)
              data1     data2
numerator
point1     1.932062  1.580459
point2     2.048493  1.265214
point3     4.472386  3.625472
# Broadcast the per-numerator maxima back onto the full MultiIndex
df3 = (df.reindex(index, level=0).div(df.reindex(index, level=1))
         .groupby(level=0).max()
         .reindex(index, level=0))
print (df3)
                          data1     data2
numerator denominator
point1    point4       1.932062  1.580459
          point5       1.932062  1.580459
point2    point4       2.048493  1.265214
          point5       2.048493  1.265214
point3    point4       4.472386  3.625472
          point5       4.472386  3.625472
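Note that the result above is the maximum per numerator, taken over the denominators and computed column by column. If what is wanted is instead one value per (numerator, denominator) pair, meaning the largest ratio across the data columns as the question describes, a minimal sketch might look like this. It assumes the same df and index as in the sample, and it selects rows with get_level_values and .loc rather than reindex. The names num, den and s are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
var_names = ["point1", "point2", "point3", "point4", "point5"]
df = pd.DataFrame(np.random.randn(5, 2), index=var_names,
                  columns=["data1", "data2"])

iterables = [["point1", "point2", "point3"], ["point4", "point5"]]
index = pd.MultiIndex.from_product(iterables,
                                   names=["numerator", "denominator"])

# One row per (numerator, denominator) pair, selected by label.
# to_numpy() drops the labels so the division is purely positional.
num = df.loc[index.get_level_values("numerator")].to_numpy()
den = df.loc[index.get_level_values("denominator")].to_numpy()

# Row-wise maximum across data1 and data2 for each pair.
s = pd.Series((num / den).max(axis=1), index=index)
print(s)
```

Each entry of s is then the larger of the two values in the matching row of the division table shown earlier, and a single pair can be looked up with s.loc[("point1", "point5")].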