How to group lines by one column?

Posted: 2014-07-23 10:58:53

Tags: python

I want to write a function that takes input like this:

  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
  1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
  1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72
  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
  1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57

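The output should give me two lists (or files) grouped by the first column: all rows that have the same number in the first column go into the same list. The result should look like this: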

List 1:

  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
  1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
  1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72

List 2:

  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
  1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57

I don't know which approach I should use.

3 answers:

Answer 0 (score: 2)

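You can use itertools.groupby(). (This assumes the input is sorted by that column: groupby() only groups consecutive lines that share the same key.)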

Example:

import itertools

data = """\
  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
  1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
  1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72
  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
  1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57
"""

data = data.splitlines()
keyfunc = lambda x: x.split(',')[0]  # key = everything before the first comma
#data.sort(key=keyfunc) # if input is not sorted by first column

for k, l in itertools.groupby(data, key=keyfunc):
    print("group:", k)
    for x in l:
        print(x)

Output:

group:   1405684432
  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
  1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
  1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72
group:   1405684433
  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
  1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57
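To make the "sorted input" caveat concrete, here is a small Python 3 sketch using a hypothetical unsorted sample (a subset of the rows above, with extra spaces removed). It shows that groupby() starts a new group every time the key changes, and that sorting with the same key function first fixes this. The helper names first_column and group_lines are illustrative, not from the answer.

```python
import itertools

# Hypothetical sample: rows from the question, but NOT sorted by the first column
rows = [
    "1405684432, d8:c7:c8:5e:7c:2d, SUTD_GLAB, 72",
    "1405684433, d8:c7:c8:5e:7c:29, SUTD_ILP2, 71",
    "1405684432, d8:c7:c8:5e:7c:2c, SUTD_BOT, 72",
    "1405684433, d8:c7:c8:5e:7d:eb, SUTD_Student, 57",
]

def first_column(line):
    # Strip surrounding spaces so "  1405684432" and "1405684432" compare equal
    return line.split(",")[0].strip()

def group_lines(lines):
    """Return a list of (key, [lines]) pairs, one pair per distinct key."""
    # groupby only merges *consecutive* items with equal keys, so sort first.
    # sorted() is stable, so lines keep their original order within a group.
    ordered = sorted(lines, key=first_column)
    return [(key, list(group))
            for key, group in itertools.groupby(ordered, key=first_column)]

# Without sorting, each change of key starts a new group, so keys repeat
unsorted_keys = [key for key, _ in itertools.groupby(rows, key=first_column)]
print(unsorted_keys)  # ['1405684432', '1405684433', '1405684432', '1405684433']

for key, group in group_lines(rows):
    print("group:", key, "->", len(group), "lines")
```

Sorting costs O(n log n); if the file is already sorted by the first column, as in the question, the sort can be skipped.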


Answer 1 (score: 0)

I would use a dictionary keyed on the first column. The solution could look something like this:

def split_on_first_column(data):
    result = dict()
    for line in data:
        l = line.split(',')
        if l[0] not in result:
            result[l[0]] = [line]      # first line seen for this key
        else:
            result[l[0]].append(line)  # further lines with the same key

    return result.values()

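In Python 2 this gives you a list of lists. In Python 3, values() returns a dictionary view rather than a list, so wrap it in list(...) if you need an actual list.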

Note that the lines are stored as complete strings; they are not split further into lists of fields.
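If you do want each row split into fields, a variation on the same dictionary idea (swapping in collections.defaultdict, which this answer does not use) might look like the Python 3 sketch below. It strips the padding around each field, skips blank lines, and returns a real list. The function name group_rows_by_first_column is hypothetical, and the "groups come out in first-seen order" behavior relies on dicts keeping insertion order, which Python guarantees from 3.7 on.

```python
from collections import defaultdict

def group_rows_by_first_column(lines):
    """Group lines by their first column, splitting each row into stripped fields.

    Returns a list of groups in the order each key first appears
    (dicts keep insertion order in Python 3.7+).
    """
    groups = defaultdict(list)  # a missing key starts out as an empty list
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if not fields[0]:
            continue  # skip blank or whitespace-only lines
        groups[fields[0]].append(fields)
    # list() turns the dict view into an actual list of groups
    return list(groups.values())

data = """\
  1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
  1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
  1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
""".splitlines()

for group in group_rows_by_first_column(data):
    print(group)
```

Unlike groupby(), this approach does not need the input to be sorted, at the cost of holding all rows in memory.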

Answer 2 (score: 0)

  1. Read the input as a CSV file
  2. Use the first column as the dictionary key
  3. Print the dictionary
  4. Python code:

    import csv
    
    groups = {}
    
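    with open("data.csv", newline="") as data:  # newline="" as the csv docs recommend (Python 3)
        reader = csv.reader(data)
        for row in reader:
            if row:  # skip blank lines
                col1 = row[0].strip()
                # create the group list the first time a key is seen, then append
                groups.setdefault(col1, []).append(row)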
    
    for key in groups:
        print("=== {0} ===".format(key))
        print("\n".join(",".join(row) for row in groups[key]))
    

    Output:

    === 1405684433 ===
    1405684433,        d8:c7:c8:5e:7c:29,           SUTD_ILP2,   71
    1405684433,        d8:c7:c8:5e:7d:eb,        SUTD_Student,   57
    1405684433,        d8:c7:c8:5e:7d:ea,          SUTD_Staff,   57
    === 1405684432 ===
    1405684432,        d8:c7:c8:5e:7c:2d,           SUTD_GLAB,   72
    1405684432,        d8:c7:c8:5e:7c:2c,            SUTD_BOT,   72
    1405684432,        d8:c7:c8:5e:7c:2b,        SUTD_Student,   72
    1405684432,        d8:c7:c8:5e:7c:2a,          SUTD_Staff,   72
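The question also mentions producing files instead of lists. Starting from a dictionary like groups above, a minimal Python 3 sketch that writes each group to its own CSV file could look like this. The function name write_groups_to_files and the group_<key>.csv naming scheme are hypothetical choices for illustration, and the sample groups (with the padding spaces removed) are written to a temporary directory so the example does not touch existing files.

```python
import csv
import os
import tempfile

def write_groups_to_files(groups, out_dir):
    """Write each group of rows to its own CSV file named group_<key>.csv.

    `groups` maps a first-column value to a list of rows (lists of strings).
    The file-naming scheme is a hypothetical choice for illustration.
    Returns the written paths in the dict's iteration order.
    """
    paths = []
    for key, rows in groups.items():
        path = os.path.join(out_dir, "group_{0}.csv".format(key))
        # newline="" lets the csv module control line endings itself
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths

# Sample groups with already-stripped fields
groups = {
    "1405684432": [["1405684432", "d8:c7:c8:5e:7c:2d", "SUTD_GLAB", "72"]],
    "1405684433": [["1405684433", "d8:c7:c8:5e:7c:29", "SUTD_ILP2", "71"],
                   ["1405684433", "d8:c7:c8:5e:7d:eb", "SUTD_Student", "57"]],
}

out_dir = tempfile.mkdtemp()  # temporary directory for the demo output
for path in write_groups_to_files(groups, out_dir):
    print(path)
```

Opening each file once per group keeps the code simple; for very many groups, writing while reading would avoid holding everything in memory.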