Question

I have a multi-index pandas DataFrame like this:

# df =
                      val
date       id          
2021-01-01 whatever1  0
           whatever2  1
           whatever3  0
           whatever4  3
           whatever5  2
2021-01-02 whatever2  0
           whatever7  3
2021-01-03 whatever3  0
           whatever4  0
...

For each value of the first index level (date), I want to count how many times each distinct value of val occurs, like this:

            0 1 2 3

2021-01-01  2 1 1 1
2021-01-02  1 0 0 1
2021-01-03  2 0 0 0
...

How can I do this? My best attempt so far is:

df.groupby(by='date', level=0).agg([lambda x: [np.count_nonzero(x==i) for i in range(df.values.max())]])

# result = 
                     val
                <lambda>
date                    
2021-01-01  [2, 1, 1, 1]
2021-01-02  [1, 0, 0, 1]
2021-01-03  [2, 0, 0, 0]

Answer 1

I think the simplest approach is a cross-tabulation with pd.crosstab:

import pandas as pd

# Cross-tabulate the first index level (date) against the values in val;
# each cell is the number of rows with that date and that value.
# df is assumed to be the multi-index frame from the question.
pd.crosstab(df.index.get_level_values('date'), df['val'])
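
For anyone who wants to try the crosstab approach end to end, here is a minimal sketch. It assumes the sample frame contains only the rows visible in the question (the rows hidden behind "..." are left out) and that the dates are stored as plain strings. The names `flat`, `counts`, and `alt` are just illustrative. The second variant, `groupby` plus `value_counts` plus `unstack`, is an alternative route to the same table, not something taken from the answer above.

```python
import pandas as pd

# Sample data rebuilt from the rows visible in the question
# (anything elided by "..." is not included).
index = pd.MultiIndex.from_tuples(
    [
        ("2021-01-01", "whatever1"),
        ("2021-01-01", "whatever2"),
        ("2021-01-01", "whatever3"),
        ("2021-01-01", "whatever4"),
        ("2021-01-01", "whatever5"),
        ("2021-01-02", "whatever2"),
        ("2021-01-02", "whatever7"),
        ("2021-01-03", "whatever3"),
        ("2021-01-03", "whatever4"),
    ],
    names=["date", "id"],
)
df = pd.DataFrame({"val": [0, 1, 0, 3, 2, 0, 3, 0, 0]}, index=index)

# Turn the index levels into ordinary columns, then cross-tabulate
# date (rows) against val (columns).
flat = df.reset_index()
counts = pd.crosstab(flat["date"], flat["val"])

# Alternative: count values within each date, then pivot the value
# level into columns, filling missing combinations with 0.
alt = df.groupby(level="date")["val"].value_counts().unstack(fill_value=0)

print(counts)
```

Both `counts` and `alt` have one row per date and one column per distinct value that actually appears in val. A value that never occurs anywhere in the data gets no column; if a fixed set of columns is needed, `counts.reindex(columns=range(4), fill_value=0)` is one way to force it.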
