How to create multiple lists of tuples from a pandas DataFrame

Time: 2019-11-22 11:08:17

Tags: python pandas list tuples

I have a pandas DataFrame with three columns. I want to create multiple lists of tuples, one list per distinct value in the Project column.

print (df)
   Project  Resource  Time
0       P1         0     4
1       P1         2     4
2       P1         1    10
3       P1         3     3
4       P2         1     3
5       P2         3    10
6       P2         0    11
7       P2         2     3
8       P2         0    12
9       P2         3    11
10      P2         1     3
11      P2         2     3
12      P3         0    12

The lists of tuples I want to create look like this: [[(0,4),(2,4),(1,10),(3,3)],[(1,3),(3,10),(0,11),(2,3),(0,12),(3,11),(1,3),(2,3)],[(0,12)]]

I used the following code:

tuples = [tuple(x) for x in data.values]

5 Answers:

Answer 0: (score: 2)

Use DataFrame.groupby with a lambda function and zip, then convert the resulting Series to a list:

t  = df.groupby('Project').apply(lambda x: list(zip(x['Resource'], x['Time']))).tolist()
print (t)
[[(0, 4), (2, 4), (1, 10), (3, 3)], 
 [(1, 3), (3, 10), (0, 11), (2, 3), (0, 12), (3, 11), (1, 3), (2, 3)],
 [(0, 12)]]

Another solution:

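# Select several columns with a list (double brackets); the bare
# ['Resource','Time'] form is rejected by recent pandas versions.
t  = (df.groupby('Project')[['Resource','Time']]
        .apply(lambda x: [tuple(y) for y in x.values])
        .tolist())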
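
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample frame from the question and iterates over the groups directly instead of calling apply. Passing sort=False is my own addition, not part of the answer above: it keeps the groups in order of first appearance, which only makes a difference when the Project values are not already sorted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Project": ["P1"] * 4 + ["P2"] * 8 + ["P3"],
    "Resource": [0, 2, 1, 3, 1, 3, 0, 2, 0, 3, 1, 2, 0],
    "Time": [4, 4, 10, 3, 3, 10, 11, 3, 12, 11, 3, 3, 12],
})

# One list of (Resource, Time) tuples per project.
# sort=False keeps the groups in order of first appearance.
t = [
    list(zip(group["Resource"], group["Time"]))
    for _, group in df.groupby("Project", sort=False)
]
print(t)
```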

Answer 1: (score: 1)

You can use the zip function to iterate over several columns of a pandas DataFrame:

import pandas as pd

df = pd.DataFrame({"ressource":[0,2, 1,3], "time":[4,4, 10, 3]})

tuples = [(x,y) for x,y in zip(df['ressource'], df['time'])]

Output:

[(0, 4), (2, 4), (1, 10), (3, 3)]
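
As a possible alternative to zip, DataFrame.itertuples can produce the same plain tuples when called with index=False and name=None. This is only a sketch on the same four-row frame, with the column names as spelled above.

```python
import pandas as pd

df = pd.DataFrame({"ressource": [0, 2, 1, 3], "time": [4, 4, 10, 3]})

# index=False leaves out the row index; name=None yields plain tuples
# instead of namedtuples.
tuples = list(df[["ressource", "time"]].itertuples(index=False, name=None))
print(tuples)
```

Either way this gives one flat list for the whole frame. To get one list per project, as the question asks, it still has to be combined with a groupby on the Project column.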

Answer 2: (score: 1)

Try this:

>>> df['zip'] = tuple(zip(df.Resource, df.Time))
>>> df.groupby('Project').agg(lambda x:list(x))['zip'].tolist()
[[(0, 4), (2, 4), (1, 10), (3, 3)],
 [(1, 3), (3, 10), (0, 11), (2, 3), (0, 12), (3, 11), (1, 3), (2, 3)],
 [(0, 12)]]
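
If you would rather not add a helper column to the original frame, the same idea can be sketched with DataFrame.assign, which returns a copy. The column name pair is a hypothetical name chosen for this example, and the data is a shortened version of the question's frame.

```python
import pandas as pd

# Shortened sample in the shape of the question's data.
df = pd.DataFrame({
    "Project": ["P1", "P1", "P2", "P2", "P3"],
    "Resource": [0, 2, 1, 3, 0],
    "Time": [4, 4, 3, 10, 12],
})

# assign() works on a copy, so df itself keeps its three columns.
# 'pair' is a made-up column name for this sketch.
result = (
    df.assign(pair=list(zip(df["Resource"], df["Time"])))
      .groupby("Project")["pair"]
      .agg(list)
      .tolist()
)
print(result)
```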

Answer 3: (score: 0)

How about this (where tmpa is your DataFrame)?

listExample=[]
for code in tmpa.loc[:, 'Project'].unique():
   listExample.append([(a, b) for a, b in tmpa[tmpa.loc[:, 'Project']==code].loc[:, ['Resource', 'Time']].values])

It's not very pretty, but I think it should work.
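
The loop above filters the frame once per project. A single-pass variant, offered only as a sketch, collects the tuples in a plain dict keyed by project. It relies on dicts preserving insertion order (Python 3.7+), and the frame here is a shortened stand-in for the question's data.

```python
import pandas as pd

# Shortened sample in the shape of the question's data.
tmpa = pd.DataFrame({
    "Project": ["P1", "P1", "P2", "P2", "P3"],
    "Resource": [0, 2, 1, 3, 0],
    "Time": [4, 4, 3, 10, 12],
})

# Walk the rows once, appending each (Resource, Time) pair to its project.
grouped = {}
for project, resource, time in zip(tmpa["Project"], tmpa["Resource"], tmpa["Time"]):
    grouped.setdefault(project, []).append((resource, time))

# Dicts keep insertion order, so projects come out in order of first appearance.
listExample = list(grouped.values())
print(listExample)
```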

Answer 4: (score: 0)

If you want the tuples grouped by project, do the following:

# Create tuples column
df['Tuples'] = df.apply(lambda r: (r['Resource'], r['Time']), axis=1)

# Concatenate tuples grouped by 'Project'
result_df = df[['Project', 'Tuples']].groupby('Project').agg(list)

The result is:

                                            Tuples
Project                                                   
P1                       [(0, 4), (2, 4), (1, 10), (3, 3)]
P2       [(1, 3), (3, 10), (0, 11), (2, 3), (0, 12), (3...
P3                                               [(0, 12)]

Then you can reset the index to get the 'Project' column back:

result_df.reset_index(inplace=True)
result_df

 Project                                             Tuples
0      P1                  [(0, 4), (2, 4), (1, 10), (3, 3)]
1      P2  [(1, 3), (3, 10), (0, 11), (2, 3), (0, 12), (3...
2      P3                                          [(0, 12)]
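
To get from this grouped frame to the nested list the question asks for, the Tuples column can be converted with tolist(), or with to_dict() if a project-to-tuples mapping is more convenient. This is a sketch on a shortened sample. Note that it builds the Tuples column with zip rather than the row-wise apply used above, which is a swap on my part, and that it uses the frame before reset_index so that Project is still the index.

```python
import pandas as pd

# Shortened sample in the shape of the question's data.
df = pd.DataFrame({
    "Project": ["P1", "P1", "P2", "P2", "P3"],
    "Resource": [0, 2, 1, 3, 0],
    "Time": [4, 4, 3, 10, 12],
})

# Build the tuples column (zip is used here instead of a row-wise apply).
df["Tuples"] = list(zip(df["Resource"], df["Time"]))
result_df = df[["Project", "Tuples"]].groupby("Project").agg(list)

# Nested list, one inner list per project.
nested = result_df["Tuples"].tolist()

# Mapping from project name to its list of tuples.
by_project = result_df["Tuples"].to_dict()

print(nested)
print(by_project)
```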