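Dividing pandas MultiIndex slices by each other

Time: 2020-06-15 13:44:39

Tags: pandas slice multi-index

I have a pandas MultiIndex dataframe with four index levels. I am trying to divide one slice of this dataframe by another slice of the same dataframe.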

    import pandas as pd

    df = pd.DataFrame(
        data={"data_provider": ["prov_a", "prov_a", "prov_a", "prov_a", "prov_a", "prov_a"],
              "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
              "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
              "year": ["2017", "2018", "2019", "2017", "2018", "2019"],
              "country1": [1, 2, 3, 2, 4, 6],
              "country2": [4, 5, 6, 40, 50, 60]}
    )
    df = df.set_index(["data_provider", "indicator", "unit", "year"], drop=True)
    print(df.loc[(slice(None), ["ind_a"]), :] / df.loc[(slice(None), ["ind_b"]), :])

While the individual slices are valid slices of df, this simple division yields all NaN. If I drop the first index level and perform the same slicing and division, I get the correct result. However, the indicator index level is then dropped, which makes sense.

    df1 = df.droplevel(0)
    print(df1.loc["ind_a", :] / df1.loc["ind_b", :])

Ultimately, I want to append the division result to the existing df dataframe. To do that I need to assign the first two levels of the MultiIndex, something like data_provider="prov_a". How can I do this?

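2 Answers:

Answer 0: (score: 3)

The root of the problem is that the two sides of the division have different values at level 1 of the MultiIndex (ind_a versus ind_b), so pandas finds no matching labels to align on and fills every cell with NaN.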
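To see the alignment at work, the following sketch rebuilds the frame from the question and checks that the two slices share no index labels. The names a, b, common and ratio are illustrative only and do not come from the original posts.

```python
import pandas as pd

# Same data as in the question, written more compactly.
df = pd.DataFrame(
    data={"data_provider": ["prov_a"] * 6,
          "indicator": ["ind_a"] * 3 + ["ind_b"] * 3,
          "unit": ["EUR"] * 6,
          "year": ["2017", "2018", "2019"] * 2,
          "country1": [1, 2, 3, 2, 4, 6],
          "country2": [4, 5, 6, 40, 50, 60]}
).set_index(["data_provider", "indicator", "unit", "year"])

a = df.loc[(slice(None), ["ind_a"]), :]
b = df.loc[(slice(None), ["ind_b"]), :]

# Both slices keep the "indicator" level, so no label appears in both.
common = a.index.intersection(b.index)
print(len(common))

# Division aligns on the union of the two indexes; with nothing in common,
# every row of the result is NaN.
ratio = a / b
print(ratio.shape)
print(ratio.isna().all().all())
```

Because alignment is by label rather than by position, the fix has to make the labels on both sides identical or bypass alignment altogether.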

So if you drop this index level from both operands and then perform the division:

res = df.loc[(slice(None), ["ind_a"]), :].droplevel([1]) / \
    df.loc[(slice(None), ["ind_b"]), :].droplevel([1])

you get the correct result.

To append this result to the source DataFrame, run:

res2 = pd.concat([res], keys=['ind_c'], names=['indicator']).swaplevel(0,1)
df = pd.concat([df, res2])

The result is:

                                   country1  country2
data_provider indicator unit year                    
prov_a        ind_a     EUR  2017       1.0       4.0
                             2018       2.0       5.0
                             2019       3.0       6.0
              ind_b     EUR  2017       2.0      40.0
                             2018       4.0      50.0
                             2019       6.0      60.0
              ind_c     EUR  2017       0.5       0.1
                             2018       0.5       0.1
                             2019       0.5       0.1
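For convenience, the steps of this answer can be combined into one self-contained sketch. It assumes the frame from the question; the names num, den, out and ratio_rows are illustrative, and the level is dropped by name rather than by position, which should be equivalent here.

```python
import pandas as pd

df = pd.DataFrame(
    data={"data_provider": ["prov_a"] * 6,
          "indicator": ["ind_a"] * 3 + ["ind_b"] * 3,
          "unit": ["EUR"] * 6,
          "year": ["2017", "2018", "2019"] * 2,
          "country1": [1, 2, 3, 2, 4, 6],
          "country2": [4, 5, 6, 40, 50, 60]}
).set_index(["data_provider", "indicator", "unit", "year"])

# Drop the "indicator" level so both operands carry identical labels.
num = df.loc[(slice(None), ["ind_a"]), :].droplevel("indicator")
den = df.loc[(slice(None), ["ind_b"]), :].droplevel("indicator")
res = num / den

# Re-attach an "indicator" level holding the new label, then restore
# the original level order before appending to the source frame.
res2 = pd.concat([res], keys=["ind_c"], names=["indicator"]).swaplevel(0, 1)
out = pd.concat([df, res2])

# Select only the newly appended rows to inspect them.
ratio_rows = out.xs("ind_c", level="indicator")
print(ratio_rows)
```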

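Answer 1: (score: 1)

I would use pd.IndexSlice for the slicing and to_numpy to strip the index from the divisor, so that pandas does not enforce data alignment when dividing two same-shaped parts of the dataframe: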

import pandas as pd
df = pd.DataFrame(
    data={"data_provider": ["prov_a", "prov_a", "prov_a", "prov_a", "prov_a", "prov_a"],
          "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
          "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
          "year": ["2017", "2018","2019", "2017","2018","2019"],
          "country1": [1, 2, 3, 2, 4, 6],
          "country2": [4, 5, 6, 40, 50, 60]}
)
df = df.set_index(["data_provider", "indicator", "unit", "year"], drop=True)

indx = pd.IndexSlice
df_new = (df.loc[indx[:, 'ind_a'], :].div(df.loc[indx[:, 'ind_b'], :].to_numpy())
            .rename(index={'ind_a':'ind_c'}))
df_out = pd.concat([df,df_new])
print(df_out)

Output:

                                   country1  country2
data_provider indicator unit year                    
prov_a        ind_a     EUR  2017       1.0       4.0
                             2018       2.0       5.0
                             2019       3.0       6.0
              ind_b     EUR  2017       2.0      40.0
                             2018       4.0      50.0
                             2019       6.0      60.0
              ind_c     EUR  2017       0.5       0.1
                             2018       0.5       0.1
                             2019       0.5       0.1
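One caveat with this approach: to_numpy discards the labels, so rows are paired purely by position. The following sketch, which assumes the same frame as above, adds an optional guard that compares the remaining index levels before dividing. The names num, den and same_rows are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    data={"data_provider": ["prov_a"] * 6,
          "indicator": ["ind_a"] * 3 + ["ind_b"] * 3,
          "unit": ["EUR"] * 6,
          "year": ["2017", "2018", "2019"] * 2,
          "country1": [1, 2, 3, 2, 4, 6],
          "country2": [4, 5, 6, 40, 50, 60]}
).set_index(["data_provider", "indicator", "unit", "year"])

idx = pd.IndexSlice
num = df.loc[idx[:, "ind_a"], :]
den = df.loc[idx[:, "ind_b"], :]

# Ignoring "indicator", both slices should list the same rows in the same
# order; otherwise a positional division would pair the wrong values.
same_rows = num.index.droplevel("indicator").equals(
    den.index.droplevel("indicator")
)
if not same_rows:
    raise ValueError("numerator and divisor rows are not in the same order")

# Divide by the raw array (no alignment) and relabel the result.
df_new = num.div(den.to_numpy()).rename(index={"ind_a": "ind_c"})
print(df_new)
```

If the check fails, sorting both slices first or falling back to the droplevel approach from the other answer would be safer than dividing positionally.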