Question

Hi, I have this dataframe in which SongIds are repeated. I'm trying to get the associated ArtistIds and create a dataframe with 2 columns: Artist1 and Artist2.

For example, for SongId 0:

Artist1   Artist2

6169      2576

For song 16, and any other song that is repeated over more than 2 rows, I want all the permutations:

Artist1   Artist2

12992     2948

12992     9895

12992     5599

2948      9895

2948      5599

9895      5599

A sample of my dataframe's contents was originally attached as an image.

Answer 1

Try this:

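import itertools
import pandas as pd

# Pair up the unique ArtistIds within each SongId, one row per pair
temp = df.groupby(['SongId']).apply(lambda x: pd.Series(list(itertools.combinations(x['ArtistId'].unique(), 2)))).reset_index().rename(columns={0: 'combinations'}).drop('level_1', axis=1)

# Split each pair tuple into two separate columns
temp[['ArtistId 1', 'ArtistId 2']] = pd.DataFrame(temp['combinations'].values.tolist(), index=temp.index)
print(temp)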

Input:

    SongId  ArtistId
0        0      6169
1        0      2576
2       10      9161
3       10      2022
4       16     12992
5       16      2948
6       16      9895
7       16      5599
8       18      2286
9       18      1299
10      34      4844
11      34      5590
12      46      3530
13      46     10227
14      61      1471
15      61      1579

Output:

    SongId   combinations  ArtistId 1  ArtistId 2
0        0   (6169, 2576)        6169        2576
1       10   (9161, 2022)        9161        2022
2       16  (12992, 2948)       12992        2948
3       16  (12992, 9895)       12992        9895
4       16  (12992, 5599)       12992        5599
5       16   (2948, 9895)        2948        9895
6       16   (2948, 5599)        2948        5599
7       16   (9895, 5599)        9895        5599
8       18   (2286, 1299)        2286        1299
9       34   (4844, 5590)        4844        5590
10      46  (3530, 10227)        3530       10227
11      61   (1471, 1579)        1471        1579
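
Note that the output lists each pair of artists once, which is what itertools.combinations produces; itertools.permutations would also emit every pair in reversed order. A minimal standard-library sketch of the difference, using the four artists of song 16 (the variable names here are only illustrative):

```python
import itertools

# The four ArtistIds that appear for SongId 16 in the sample input.
artists = [12992, 2948, 9895, 5599]

# Unordered pairs: (12992, 2948) appears, (2948, 12992) does not.
combos = list(itertools.combinations(artists, 2))

# Ordered pairs: both (12992, 2948) and (2948, 12992) appear.
perms = list(itertools.permutations(artists, 2))

print(len(combos))  # 6
print(len(perms))   # 12
```

The six unordered pairs match the six rows the question expects for song 16.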

Explanation:

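Find the combinations of the unique artist IDs within each SongId group.
Convert those combinations to a Series, so that each pair becomes its own row.
Expand the tuple values into 2 separate columns.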
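
The same steps can also be sketched without groupby.apply, by looping over the groups and building the rows directly. This is an alternative to the one-liner above, not a replacement for it; the column names Artist1 and Artist2 follow the question, and the small df below is a subset of the sample input so the snippet runs on its own:

```python
import itertools
import pandas as pd

# A subset of the sample input from this answer.
df = pd.DataFrame({
    "SongId":   [0, 0, 10, 10, 16, 16, 16, 16],
    "ArtistId": [6169, 2576, 9161, 2022, 12992, 2948, 9895, 5599],
})

# One (SongId, Artist1, Artist2) row per pair of unique artists in a song.
rows = [
    (song_id, a1, a2)
    for song_id, artists in df.groupby("SongId")["ArtistId"]
    for a1, a2 in itertools.combinations(artists.unique(), 2)
]

pairs = pd.DataFrame(rows, columns=["SongId", "Artist1", "Artist2"])
print(pairs)
```

Building plain tuples first avoids the intermediate combinations column and the level_1 index that the apply-based version has to drop afterwards.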
