Pandas groupby on two columns, including every possible value of the second column in each group

Time: 2018-01-28 10:22:01

Tags: python pandas

I'm sure this is a duplicate, but I can't find it.

I have this DataFrame:

import pandas as pd

df = pd.DataFrame(data=[['Sweden','A',5],
                        ['Sweden','A',10],
                        ['Norway','B',4],
                        ['Norway','C',5]],
                  columns=['Country','Class','Value'])
print(df)

  Country Class  Value
0  Sweden     A      5
1  Sweden     A     10
2  Norway     B      4
3  Norway     C      5

I want to group by Country and Class and sum the values, so I tried:

df.groupby(['Country','Class']).sum()
               Value
Country Class       
Norway  B          4
        C          5
Sweden  A         15

But I want every country to include all possible classes, so that Norway also gets a row for class A and Sweden gets rows for classes B and C.

How can I do this?

1 answer:

Answer 0: (score: 4)

Option 1
unstack, then stack again

df.groupby(['Country','Class']).sum().unstack().stack(dropna=False)

               Value
Country Class       
Norway  A        NaN
        B        4.0
        C        5.0
Sweden  A       15.0
        B        NaN
        C        NaN
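
If zeros are preferable to NaN for the missing Country/Class pairs, one possible variation is to fill the gaps during the unstack step. The sketch below is only an illustration: the names `sums`, `wide` and `filled` are made up here, and it assumes that 0 is a sensible placeholder for a pair with no rows. It also avoids passing `dropna` to `stack`, since I believe newer pandas releases are changing how that argument is handled.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Sweden', 'A', 5],
          ['Sweden', 'A', 10],
          ['Norway', 'B', 4],
          ['Norway', 'C', 5]],
    columns=['Country', 'Class', 'Value'],
)

# Sum Value for each observed (Country, Class) pair.
sums = df.groupby(['Country', 'Class']).sum()

# unstack moves Class into the columns, which creates one cell for every
# Country/Class combination; fill_value=0 fills the unobserved ones
# (assumption: 0 is an acceptable stand-in for "no rows").
wide = sums.unstack(fill_value=0)

# stack moves Class back into the index. No NaN cells remain,
# so every combination is kept.
filled = wide.stack()
print(filled)
```

Because the gaps are filled with integers, the Value column should not need to be converted to float the way it is when NaN is inserted.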

Option 2
Another option is to reindex using a constructed MultiIndex

v = df.groupby(['Country','Class']).sum()
idx = pd.MultiIndex.from_product([df.Country.unique(), df.Class.unique()])

v.reindex(idx)

          Value
Sweden A   15.0
       B    NaN
       C    NaN
Norway A    NaN
       B    4.0
       C    5.0
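
The reindexed result above loses the index level names and uses NaN for the missing pairs. A small variation, sketched here with the illustrative names `sums`, `full_index` and `filled`, passes `names` to `MultiIndex.from_product` and `fill_value` to `reindex`. Whether 0 is the right filler depends on the data, so treat that choice as an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Sweden', 'A', 5],
          ['Sweden', 'A', 10],
          ['Norway', 'B', 4],
          ['Norway', 'C', 5]],
    columns=['Country', 'Class', 'Value'],
)

# Sum Value for each observed (Country, Class) pair.
sums = df.groupby(['Country', 'Class']).sum()

# Build every Country x Class combination and keep the level names.
full_index = pd.MultiIndex.from_product(
    [df['Country'].unique(), df['Class'].unique()],
    names=['Country', 'Class'],
)

# Reindex onto the full product; pairs absent from `sums` get 0
# (assumption: 0 is an acceptable stand-in for "no rows").
filled = sums.reindex(full_index, fill_value=0)
print(filled)
```

The row order follows the order of appearance in `df` (Sweden before Norway); calling `sort_index()` on the result should give the same alphabetical order as Option 1.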