I have a pandas MultiIndex DataFrame with four index levels. I am trying to divide one slice of this DataFrame by another slice of the same DataFrame.
import pandas as pd
df = pd.DataFrame(
    data={"data_provider": ["prov_a", "prov_a", "prov_a", "prov_a", "prov_a", "prov_a"],
          "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
          "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
          "year": ["2017", "2018", "2019", "2017", "2018", "2019"],
          "country1": [1, 2, 3, 2, 4, 6],
          "country2": [4, 5, 6, 40, 50, 60]}
)
df = df.set_index(["data_provider", "indicator", "unit", "year"], drop=True)
print(df.loc[(slice(None), ["ind_a"]), :] / df.loc[(slice(None), ["ind_b"]), :])
Although each slice on its own is a valid slice of df, this simple division yields all NaN. If I drop the first index level and perform the same slicing and division, I get the correct result. However, the indicator index level is then dropped, which makes sense.
df1 = df.droplevel(0)
print(df1.loc["ind_a", :] / df1.loc["ind_b", :])
Finally, I would like to append the result of the division to the existing df DataFrame. To do that I need to assign the first two levels of the MultiIndex, something like data_provider="prov_a" and a new indicator label. How can I do this?
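Answer 0: (score: 3)
The root of the problem is that the two sides of the division have different values at level 1 (indicator) of the MultiIndex: ind_a on the left and ind_b on the right. pandas aligns both operands on the full index before dividing, so no rows match and every result is NaN.
So if you drop this index level from both slices and then divide: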
res = df.loc[(slice(None), ["ind_a"]), :].droplevel([1]) / \
df.loc[(slice(None), ["ind_b"]), :].droplevel([1])
you get the correct result.
To append this result to the source DataFrame, run:
res2 = pd.concat([res], keys=['ind_c'], names=['indicator']).swaplevel(0,1)
df = pd.concat([df, res2])
The result is:
                                   country1  country2
data_provider indicator unit year
prov_a        ind_a     EUR  2017       1.0       4.0
                             2018       2.0       5.0
                             2019       3.0       6.0
              ind_b     EUR  2017       2.0      40.0
                             2018       4.0      50.0
                             2019       6.0      60.0
              ind_c     EUR  2017       0.5       0.1
                             2018       0.5       0.1
                             2019       0.5       0.1
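The alignment explanation in this answer can be checked directly. The following is a minimal sketch, assuming the same sample data as the question; the names num, den, shared, naive and res are chosen here for illustration. It shows that the two slices share no index labels, which is why the plain division is all NaN, and that dropping the indicator level lets the rows pair up.

```python
import pandas as pd

# Same sample data as in the question, written more compactly.
df = pd.DataFrame(
    {
        "data_provider": ["prov_a"] * 6,
        "indicator": ["ind_a"] * 3 + ["ind_b"] * 3,
        "unit": ["EUR"] * 6,
        "year": ["2017", "2018", "2019"] * 2,
        "country1": [1, 2, 3, 2, 4, 6],
        "country2": [4, 5, 6, 40, 50, 60],
    }
).set_index(["data_provider", "indicator", "unit", "year"])

num = df.loc[(slice(None), ["ind_a"]), :]
den = df.loc[(slice(None), ["ind_b"]), :]

# The slices differ in the "indicator" level, so no full index label
# appears in both of them and aligned division has nothing to pair.
shared = num.index.intersection(den.index)
print("labels present in both slices:", len(shared))

naive = num / den
print("naive division is all NaN:", naive.isna().all().all())

# With the differing level removed, the remaining labels match row for row.
res = num.droplevel("indicator") / den.droplevel("indicator")
print(res)
```

Dropping the level by name rather than by position (droplevel([1])) is only a readability choice here; both refer to the same level in this DataFrame.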
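Answer 1: (score: 1)
I would use pd.IndexSlice and to_numpy to strip the index from the divisor, so that pandas does not enforce data alignment when dividing same-shaped parts of the DataFrame: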
import pandas as pd
df = pd.DataFrame(
data={"data_provider": ["prov_a", "prov_a", "prov_a", "prov_a", "prov_a", "prov_a"],
"indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
"unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
"year": ["2017", "2018","2019", "2017","2018","2019"],
"country1": [1, 2, 3, 2, 4, 6],
"country2": [4, 5, 6, 40, 50, 60]}
)
df = df.set_index(["data_provider", "indicator", "unit", "year"], drop=True)
indx = pd.IndexSlice
df_new = (df.loc[indx[:, 'ind_a'], :].div(df.loc[indx[:, 'ind_b'], :].to_numpy())
.rename(index={'ind_a':'ind_c'}))
df_out = pd.concat([df,df_new])
print(df_out)
Output:
                                   country1  country2
data_provider indicator unit year
prov_a        ind_a     EUR  2017       1.0       4.0
                             2018       2.0       5.0
                             2019       3.0       6.0
              ind_b     EUR  2017       2.0      40.0
                             2018       4.0      50.0
                             2019       6.0      60.0
              ind_c     EUR  2017       0.5       0.1
                             2018       0.5       0.1
                             2019       0.5       0.1
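One caveat with to_numpy is that it pairs rows purely by position, so it is only safe when both slices list their rows in the same order. The sketch below is an illustration under that assumption, not part of the original answer: it reverses the divisor to simulate unsorted input and compares positional division with label-aligned division via DataFrame.xs. The names den_reversed, by_label, by_position and new_rows are made up for this example; ind_c is the label used in the answers.

```python
import pandas as pd

# Same sample data as in the question, written more compactly.
df = pd.DataFrame(
    {
        "data_provider": ["prov_a"] * 6,
        "indicator": ["ind_a"] * 3 + ["ind_b"] * 3,
        "unit": ["EUR"] * 6,
        "year": ["2017", "2018", "2019"] * 2,
        "country1": [1, 2, 3, 2, 4, 6],
        "country2": [4, 5, 6, 40, 50, 60],
    }
).set_index(["data_provider", "indicator", "unit", "year"])

# Cross-sections drop the "indicator" level, so both sides carry the
# same (data_provider, unit, year) labels.
num = df.xs("ind_a", level="indicator")
den = df.xs("ind_b", level="indicator")

# Reverse the divisor's row order to simulate input that is not sorted.
den_reversed = den.iloc[::-1]

by_label = (num / den_reversed).sort_index()    # rows paired by index label
by_position = num.div(den_reversed.to_numpy())  # rows paired by position only

print(by_label)
print(by_position)  # 2017 is divided by the 2019 row here

# Re-attach the dropped level under a new label and restore the level order.
new_rows = (
    by_label.assign(indicator="ind_c")
    .set_index("indicator", append=True)
    .reorder_levels(list(df.index.names))
)
df_out = pd.concat([df, new_rows])
print(df_out)
```

When the row order of both slices is known to match, as in the question's data, the to_numpy approach gives the same numbers; the label-aligned version simply does not depend on that ordering.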