Python: sort values and group them by a unique key

Time: 2019-05-10 05:51:23

Tags: python arrays list sorting multiple-columns

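I have a list of tuples, as shown below. I want to arrange the elements into a table of rows and columns. For example:

Say the list is named "list":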

[("Adam", "DeltaAir"),
("Bianca", "AlaskanAir"),
("Romeo", "DeltaAir"),
("Danaerys", "DragonAir"),
("Jon", "DragonAir"),
("Walter", "AlaskanAir")]

I want to print this list as:

------------------------------------------
Name  | AlaskanAir | DeltaAir | DragonAir
------------------------------------------
Adam                    *
Bianca      *
Romeo                   *
Danaerys                            *
Jon                                 *
Walter      *
------------------------------------------

First, I find all the unique elements that I want to use as the header row:

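    # lst is the list of tuples above (named lst so the built-in list() stays usable)
    row = []
    for i in lst:
        row.append(i[1])
    # Deduplicate and sort so the header order is deterministic
    row = sorted(set(row))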

Then I would iterate over the elements in "row" and build the table. Is there an easy way to build it? Thanks!

2 answers:

Answer 0 (score: 3)

We can use pandas to do this:

import pandas as pd

df = pd.DataFrame([("Adam", "DeltaAir"),
("Bianca", "AlaskanAir"),
("Romeo", "DeltaAir"),
("Danaerys", "DragonAir"),
("Jon", "DragonAir"),
("Walter", "AlaskanAir")], columns=['name', 'value'])

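airlines = df['value'].unique()

result = (
    # dtype=int keeps the indicators as 0/1 (newer pandas defaults to booleans)
    pd.get_dummies(df, columns=['value'], dtype=int)
    # Strip the 'value_' prefix that get_dummies adds to each new column
    .rename(columns={f'value_{col}': col for col in airlines})
    # Show 1 as '*' and 0 as an empty string
    .replace({col: {0: '', 1: '*'} for col in airlines})
)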

print(result)

Output:

       name AlaskanAir DeltaAir DragonAir
0      Adam                   *          
1    Bianca          *                   
2     Romeo                   *          
3  Danaerys                             *
4       Jon                             *
5    Walter          *                   

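This converts the value for each individual into a 1 or 0 in the relevant column. We then simply replace 1 with * and 0 with an empty string.

Note that the logic itself does not require pandas and is easy to implement without it, but pandas makes the table alignment convenient.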
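To illustrate the point that pandas is not needed for the logic, here is a minimal pure-Python sketch. The names `rows` and `format_table` are made up for this example and do not come from the answer; the layout (columns as wide as the airline names, markers centered) is just one possible choice.

```python
rows = [("Adam", "DeltaAir"),
        ("Bianca", "AlaskanAir"),
        ("Romeo", "DeltaAir"),
        ("Danaerys", "DragonAir"),
        ("Jon", "DragonAir"),
        ("Walter", "AlaskanAir")]


def format_table(pairs):
    """Return the marker table as a list of text lines (hypothetical helper)."""
    # Unique airlines, sorted so the column order is deterministic
    airlines = sorted({airline for _, airline in pairs})
    # The name column is as wide as the longest name (or the header itself)
    name_width = max([len("Name")] + [len(name) for name, _ in pairs])

    header = " | ".join(["Name".ljust(name_width)] + airlines)
    rule = "-" * len(header)
    lines = [rule, header, rule]

    for name, airline in pairs:
        # Each cell is as wide as its column header; the matching one gets "*"
        cells = [("*" if airline == col else "").center(len(col))
                 for col in airlines]
        lines.append(" | ".join([name.ljust(name_width)] + cells).rstrip())

    lines.append(rule)
    return lines


table = format_table(rows)
print("\n".join(table))
```

Because every cell is padded to the width of its column header, each `*` lands under the matching airline name. This sketch prints one row per tuple, just like the pandas version above.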

Answer 1 (score: 2)

lst = [("Adam", "DeltaAir"),
("Bianca", "AlaskanAir"),
("Romeo", "DeltaAir"),
("Danaerys", "DragonAir"),
("Jon", "DragonAir"),
("Walter", "AlaskanAir")]

import pandas as pd

# Create a pandas DataFrame with the names from the list
df = pd.DataFrame([elem[0] for elem in lst], columns=["Name"])
# Iterate over the unique airlines (DeltaAir, AlaskanAir, DragonAir),
# sorted so the column order is deterministic
for elem in sorted(set(elem[1] for elem in lst)):
    # Build a list holding "*" or " " for every entry in lst, depending on
    # whether its airline is the one being iterated over, and add that list
    # as a column to the DataFrame
    df[elem] = ["*" if item[1] == elem else " " for item in lst]

Edit in response to your comment:

You can use groupby and agg to combine the values by name (if that is not what you meant, please clarify).

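# df is the DataFrame built above, from a list that may contain repeated names
# One row per unique name; groupby also sorts by name, so the rows line up
df2 = pd.DataFrame(sorted(df["Name"].unique()), columns=["Name"])
for elem in sorted(set(elem[1] for elem in lst)):
    # A name gets "*" for an airline if any of its rows has "*" in that column
    df2[elem] = list(df.groupby("Name")[elem].agg(lambda x: "*" if "*" in x.values else " "))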

Additional information

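Thanks, Florian. What I meant is that if there are duplicate names, as below, the corresponding airline columns should be filled in on a single row. For example, Adam and Romeo each appear twice in the list, but the same name should not be shown as two separate rows.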

[("Adam", "DeltaAir"),
("Bianca", "AlaskanAir"),
("Romeo", "DeltaAir"),
("Danaerys", "DragonAir"),
("Jon", "DragonAir"),
("Walter", "AlaskanAir"),
("Adam", "AlaskanAir"),
("Romeo", "DragonAir")]

------------------------------------------
Name  | AlaskanAir | DeltaAir | DragonAir
------------------------------------------
Adam        *           *
Bianca      *
Romeo                   *           *
Danaerys                            *
Jon                                 *
Walter      *
------------------------------------------
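One possible way to get a single row per name, sketched here with `pd.crosstab` rather than the groupby approach from the answer above. The variable names `pairs`, `counts` and `table` are chosen for this example only, and keeping the names in order of first appearance is an assumption based on the desired output.

```python
import pandas as pd

pairs = [("Adam", "DeltaAir"),
         ("Bianca", "AlaskanAir"),
         ("Romeo", "DeltaAir"),
         ("Danaerys", "DragonAir"),
         ("Jon", "DragonAir"),
         ("Walter", "AlaskanAir"),
         ("Adam", "AlaskanAir"),
         ("Romeo", "DragonAir")]

df = pd.DataFrame(pairs, columns=["Name", "Airline"])

# Count how often each (Name, Airline) pair occurs: one row per unique name,
# one column per airline (crosstab sorts both axes alphabetically).
counts = pd.crosstab(df["Name"], df["Airline"])

# Put the rows back in the order in which the names first appear in the input.
counts = counts.reindex(df["Name"].unique())

# Turn positive counts into "*" and zeros into empty strings.
table = counts.gt(0).apply(lambda col: col.map({True: "*", False: ""}))

print(table)
```

Repeated (name, airline) pairs simply raise the count, so they still show up as a single `*`.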