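I have a DataFrame in which consecutive rows that share the same company are grouped together, and I list the values for each group.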
   Company               Who            Dates
0  DE BORTOLI WINES      DIXONS CREEK   1/02/2020
1  DE BORTOLI WINES      DIXONS GREEK   1/02/2020
2  DE BORTOLI WINES      DIXONS CREEK   1/03/2020
3  DE BORTOLI WINES      BILBUL         1/05/2020
4  Ezard@Levantine Hill  Coldstream     1/06/2020
5  Ezard@Levantine Hill  Hotstream      1/10/2020
6  RATHBONE WINE GROUP   PORT MELBOURN  1/02/2020
7  YERING STATION        YARRA GLEN     1/05/2020
8  YERING STATION        YARRA GREEN    1/01/2020
By doing this:
sorted_ = df["Dates"].groupby(df["Company"].ne(df["Company"].shift()).cumsum()).apply(list)
I can get the list of dates for each company.
If I do this instead:
sorted_ = df["Who"].groupby(df["Company"].ne(df["Company"].shift()).cumsum()).apply(list)
I can get the list of Who values for each company.
Something like:
[DIXONS CREEK, DIXONS GREEK, DIXONS CREEK, BILBUL]
[Coldstream, Hotstream]
[PORT MELBOURN]
[YARRA GLEN, YARRA GREEN]
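For context, here is a minimal sketch of what the grouping key in the snippets above computes. The tiny frame and the name `key` are made up for illustration and are not part of my real data. The key is just a running counter that increases whenever Company differs from the previous row, so the resulting groups are labelled by run number rather than by company name.

```python
import pandas as pd

# Hypothetical toy data, only to show how the key behaves.
df = pd.DataFrame({
    "Company": ["A", "A", "B", "A"],
    "Who": ["w1", "w2", "w3", "w4"],
})

# True wherever Company differs from the previous row (the first row is always True),
# then a cumulative sum turns those flags into one integer per consecutive run.
key = df["Company"].ne(df["Company"].shift()).cumsum()

print(key.tolist())  # [1, 1, 2, 3]
```

Note that the second run of "A" gets its own number (3), and the company name itself never appears in the key.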
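The problem is that, in a very large dataset, I can't tell which company each list belongs to. How can I see the company that each group corresponds to?
Desired result: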
Company               Result
DE BORTOLI WINES      [DIXONS CREEK, DIXONS GREEK, DIXONS CREEK, BILBUL]
Ezard@Levantine Hill  [Coldstream, Hotstream]
RATHBONE WINE GROUP   [PORT MELBOURN]
YERING STATION        [YARRA GLEN, YARRA GREEN]
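One possible way to get a result of this shape, offered as a sketch rather than a definitive answer, is to group the whole frame by the same run key and aggregate both columns at once: take the first Company of each run and collect Who into a list. The names `run_id` and `result` are introduced here for illustration, and the sample frame assumes both Ezard rows carry exactly the same Company string, as the desired result implies.

```python
import pandas as pd

# Sample data mirroring the table above (assumption: identical Company strings per run).
df = pd.DataFrame({
    "Company": ["DE BORTOLI WINES"] * 4
               + ["Ezard@Levantine Hill"] * 2
               + ["RATHBONE WINE GROUP"]
               + ["YERING STATION"] * 2,
    "Who": ["DIXONS CREEK", "DIXONS GREEK", "DIXONS CREEK", "BILBUL",
            "Coldstream", "Hotstream", "PORT MELBOURN",
            "YARRA GLEN", "YARRA GREEN"],
})

# One integer per run of consecutive identical Company values.
# Renamed so the key does not share a name with the Company column.
run_id = df["Company"].ne(df["Company"].shift()).cumsum().rename("run_id")

# Named aggregation: keep the company label of each run next to its list of Who values.
result = (
    df.groupby(run_id)
      .agg(Company=("Company", "first"), Result=("Who", list))
      .reset_index(drop=True)
)

print(result)
```

If every company occupies a single consecutive block, `df.groupby("Company", sort=False)["Who"].apply(list)` should give the same lists indexed directly by company name. The run-based key only matters when the same company can reappear later and those runs need to stay separate.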