Group a column by unique values in Python

Date: 2017-06-17 15:45:04

Tags: python python-2.7 pandas dataframe

I have a dataset with two columns, and I need to change it from this format:

10  1 
10  5
10  3
11  5
11  4
12  6
12  2

to this:

10  1  5  3
11  5  4
12  6  2

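I need every unique value in the first column to be on its own row, followed by all of the second-column values that belong to it.

I am a beginner with Python, and beyond reading in my text file I don't know how to proceed.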

5 answers:

Answer 0 (score: 3)

You can use a pandas DataFrame.

import pandas as pd

df = pd.DataFrame({'A':[10,10,10,11,11,12,12],'B':[1,5,3,5,4,6,2]})
print(df)

Output:

    A  B
0  10  1
1  10  5
2  10  3
3  11  5
4  11  4
5  12  6
6  12  2

Let's use groupby and join:

df.groupby('A')['B'].apply(lambda x:' '.join(x.astype(str)))

Output:

A
10    1 5 3
11      5 4
12      6 2
Name: B, dtype: object
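
The result above is a Series indexed by A. As a minimal sketch of how it could be turned into the plain-text lines the question asks for, the snippet below joins with two spaces and formats one line per key. The variable names `grouped` and `lines` are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'A': [10, 10, 10, 11, 11, 12, 12],
                   'B': [1, 5, 3, 5, 4, 6, 2]})

# Join the B values of each group into one string, separated by two spaces.
grouped = df.groupby('A')['B'].apply(lambda x: '  '.join(x.astype(str)))

# Build one output line per unique key: "<key>  <values>".
lines = ['{}  {}'.format(key, values) for key, values in grouped.items()]
print('\n'.join(lines))
```

groupby sorts the group keys by default, so the lines come out in ascending order of A.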

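Answer 1 (score: 1)

An example using only itertools.groupby; this is all in the Python standard library (although the pandas version is much more concise!).

Assuming the keys you want to group by are adjacent, this can all be done lazily (without ever needing to hold all of the data in memory):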

from io import StringIO
from itertools import groupby

text = '''10  1
10  5
10  3
11  5
11  4
12  6
12  2'''

# read and group data:
with StringIO(text) as file:
    keys = []
    res = {}

    data = (line.strip().split() for line in file)

    for k, g in groupby(data, key=lambda x: x[0]):
        keys.append(k)
        res[k] = [item[1] for item in g]

print(keys)  # ['10', '11', '12']
print(res)   # {'10': ['1', '5', '3'], '11': ['5', '4'], '12': ['6', '2']}  (insertion order on Python 3.7+)

# write grouped data:
with StringIO() as out_file:
    for key in keys:
        out_file.write('{:3s}'.format(key))
        out_file.write(' '.join(['{:3s}'.format(item) for item in res[key]]))
        out_file.write('\n')
    print(out_file.getvalue())
    # 10 1   5   3
    # 11 5   4
    # 12 6   2

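You can then replace with StringIO(text) as file: with something like with open('infile.txt', 'r') as file: so that the program reads your actual file (and similarly use open('outfile.txt', 'w') for the output file).

Then again, you could of course write directly to the output file each time a key is found; that way you never need to hold all of the data in memory: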

with StringIO(text) as file, StringIO() as out_file:

    data = (line.strip().split() for line in file)

    for k, g in groupby(data, key=lambda x: x[0]):
        out_file.write('{:3s}'.format(k))
        out_file.write(' '.join(['{:3s}'.format(item[1]) for item in g]))
        out_file.write('\n')

    print(out_file.getvalue())
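
The adjacency assumption mentioned in this answer matters because itertools.groupby only merges consecutive items with equal keys. The sketch below uses made-up, unsorted rows (not the asker's data) to show the effect, and one possible workaround: sorting by key first, at the cost of loading every row into memory.

```python
from itertools import groupby

# Hypothetical unsorted input: key '10' appears in several separate runs.
rows = [('10', '1'), ('11', '5'), ('10', '5'), ('11', '4'), ('10', '3')]

# groupby only merges consecutive equal keys, so '10' and '11' repeat.
unsorted_groups = [(k, [v for _, v in g])
                   for k, g in groupby(rows, key=lambda r: r[0])]

# Sorting by key makes equal keys adjacent. sorted() is stable, so the
# original order of the values within each key is preserved.
sorted_rows = sorted(rows, key=lambda r: r[0])
sorted_groups = [(k, [v for _, v in g])
                 for k, g in groupby(sorted_rows, key=lambda r: r[0])]

print(unsorted_groups)
print(sorted_groups)
```

If the file is not already ordered by its first column, a dictionary-based approach avoids the sort but also keeps all groups in memory.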

Answer 2 (score: 1)

Using the collections.defaultdict subclass:

import collections
with open('yourfile.txt', 'r') as f:
    d = collections.defaultdict(list)
    for k,v in (l.split() for l in f.read().splitlines()):  # processing each line
        d[k].append(v)             # accumulating values for the same 1st column
    for k,v in sorted(d.items()):  # outputting grouped sequences
        print('%s  %s' % (k,'  '.join(v)))

Output:

10  1  5  3
11  5  4
12  6  2
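
One detail worth noting: the keys here are strings, so sorted(d.items()) orders them lexicographically. That is fine for 10, 11, 12, but it would place '9' after '10'. The sketch below uses made-up input with keys of different lengths and assumes the first column always holds integers. It reads from a StringIO instead of a real file so that it runs as-is.

```python
import collections
from io import StringIO

# Hypothetical input with keys of different lengths.
text = '''9  7
10  1
10  5
9  8
100  2'''

groups = collections.defaultdict(list)
with StringIO(text) as f:
    for line in f:
        parts = line.split()
        if len(parts) != 2:   # skip blank or malformed lines
            continue
        key, value = parts
        groups[key].append(value)

string_order = sorted(groups)            # lexicographic order of the string keys
numeric_order = sorted(groups, key=int)  # order by integer value

for key in numeric_order:
    print('%s  %s' % (key, '  '.join(groups[key])))
```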

Answer 3 (score: 0)

Using pandas may be easier. You can use the pandas read_csv function to read a txt file in which the data is separated by one or more spaces.

import pandas as pd
df = pd.read_csv("input.txt", header=None, delimiter=r"\s+")
# setting column names
df.columns = ['col1', 'col2']
df

This gives the following dataframe as output:

   col1  col2
0    10     1
1    10     5
2    10     3
3    11     5
4    11     4
5    12     6
6    12     2

After the txt file has been read into a dataframe, you can, similarly to the apply used in the other answer above, also use aggregate together with join:

df_combine = df.groupby('col1')['col2'].agg(lambda col: ' '.join(col.astype('str'))).reset_index()
df_combine
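
For anyone who wants to try the read_csv approach without creating input.txt first, here is a self-contained sketch that feeds the same sample data through a StringIO. Passing names= to read_csv and the final list of formatted lines are illustrative choices, not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, held in memory instead of input.txt.
text = """10  1
10  5
10  3
11  5
11  4
12  6
12  2"""

# sep=r"\s+" splits on any run of whitespace; names= sets the column labels.
df = pd.read_csv(StringIO(text), header=None, sep=r"\s+", names=['col1', 'col2'])

# One row per unique col1 value, with its col2 values joined by spaces.
df_combine = (df.groupby('col1')['col2']
                .agg(lambda col: ' '.join(col.astype(str)))
                .reset_index())

# Format each row as a plain text line.
lines = ['{}  {}'.format(row.col1, row.col2)
         for row in df_combine.itertuples(index=False)]
print('\n'.join(lines))
```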

Answer 4 (score: 0)

I found this solution using dictionaries:

with open("data.txt", encoding='utf-8') as data:
    file = data.readlines()

    dic = {}
    for line in file:
        list1 = line.split()
        try:
            dic[list1[0]] += list1[1] + ' '
        except KeyError:
            dic[list1[0]] = list1[1] + ' '

    for k,v in dic.items():
        print(k,v)

Output:

10 1 5 3
11 5 4
12 6 2

A more practical version, as a function:

def getdata(datafile):
    with open(datafile, encoding='utf-8') as data:
        file = data.readlines()

    dic = {}
    for line in file:
        list1 = line.split()
        try:
            dic[list1[0]] += list1[1] + ' '
        except KeyError:
            dic[list1[0]] = list1[1] + ' '

    for k,v in dic.items():
        v = v.split()
        print(k, ':',v)

getdata("data.txt")

Output:

11 : ['5', '4']
12 : ['6', '2']
10 : ['1', '5', '3']
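
As a possible variation on the function above, the sketch below stores a list per key from the start instead of concatenating strings and splitting them again, and it returns the dictionary rather than printing it. The name `group_lines` and the use of dict.setdefault are illustrative choices, not the answer author's code. It reads from a StringIO so that it runs without data.txt.

```python
from io import StringIO


def group_lines(lines):
    """Return {first column: [second-column values]} for an iterable of text lines."""
    groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:      # skip blank or incomplete lines
            continue
        # setdefault creates the list the first time a key is seen,
        # which replaces the try/except KeyError pattern.
        groups.setdefault(parts[0], []).append(parts[1])
    return groups


sample = StringIO('10  1\n10  5\n10  3\n11  5\n11  4\n12  6\n12  2\n')
grouped = group_lines(sample)
for key, values in grouped.items():
    print(key, ' '.join(values))
```

Because an open file is also an iterable of lines, the same function could be called as group_lines(open_file) inside a with open(...) block.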