Merging elements based on index columns in pandas

Posted: 2019-11-13 09:42:46

Tags: pandas dataframe pandas-groupby

I have the following DataFrame:

index  element    relation_index
1      dog        0
2      cat        0
3      crow       1
4      snake      3
5      pig        1
6      porcupine  0
7      weasel     2
8      bear       3

I want to get:

index  element                      relation_index
1      dog, crow, pig, snake, bear  0
2      cat, weasel                  0
3      dog, crow, pig, snake, bear  1
4      dog, crow, pig, snake, bear  3
5      dog, crow, pig, snake, bear  1
6      porcupine                    0
7      cat, weasel                  2
8      dog, crow, pig, snake, bear  3

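So the rules are:

  • Merge together all elements that are linked through a common index/relation_index
  • Ignore a relation_index of 0: there is no row with index 0, so 0 means the row links to nothing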
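One way to read these rules: each nonzero relation_index is an undirected link between two rows, and all rows connected directly or through other rows form one group (a connected component). Below is a minimal pure-Python sketch of that grouping with union-find (disjoint-set). component_labels is a hypothetical helper, and the sketch assumes every nonzero relation_index matches an existing index. Because links can be processed in any order, a row may point to a row that appears later.

```python
def component_labels(ids, relations):
    """Return one component label (a root id) per id.

    Assumptions: a relation of 0 means "no link", and every other
    relation value is one of the ids.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, rel in zip(ids, relations):
        if rel != 0:
            root_i, root_rel = find(i), find(rel)
            if root_i != root_rel:
                parent[root_i] = root_rel  # merge the two components

    return [find(i) for i in ids]


ids = [1, 2, 3, 4, 5, 6, 7, 8]
relations = [0, 0, 1, 3, 1, 0, 2, 3]
print(component_labels(ids, relations))  # [1, 2, 1, 1, 1, 6, 2, 1]
```

The labels are just root ids, so only equality between them matters; joining the element values that share a label with ", " gives the requested string.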

How can I do this efficiently for a large DataFrame?

Edit: I forgot one thing: the merged element should be a single string, e.g.:

"dog, crow, pig, snake, bear"

Answer (score: 1):

I solved this using iterrows together with a for loop:

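import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5, 6, 7, 8],
    'element': ['dog', 'cat', 'crow', 'snake', 'pig', 'porcupine', 'weasel', 'bear'],
    'relation_index': [0, 0, 1, 3, 1, 0, 2, 3],
})

# Rename the 'index' column to 'id': .index is already the row-index
# attribute of DataFrames and Series, so df.index / row.index would not
# return this column
df.rename(columns={'index': 'id'}, inplace=True)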

# Create a parent group
parent = df[df.relation_index == 0].copy()
search_df = df[df.relation_index != 0].copy()

group_index = [[i] for i in parent.id.tolist()]
group_name = [[i] for i in parent.element.tolist()]

print(group_index)  # [[1], [2], [6]]
print(group_name)   # [['dog'], ['cat'], ['porcupine']]

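# Assign each non-root row to the group that already contains its
# relation_index. This relies on row order: if the referenced row has
# not been placed in a group yet, a new group is started, and groups
# are never merged afterwards.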
for _, row in search_df.iterrows():
    new_group = True
    for i in range(len(group_index)):
        if row.relation_index in group_index[i]:
            group_index[i].append(row.id)
            group_name[i].append(row.element)
            new_group = False
            break

    if new_group:
        group_index.append([row.id])
        group_name.append([row.element]) 

print(group_index)  # [[1, 3, 4, 5, 8], [2, 7], [6]]
print(group_name)   # [['dog', 'crow', 'snake', 'pig', 'bear'], ['cat', 'weasel'], ['porcupine']]

# Write each row's joined group names back to the main df
result = []
for _, row in df.iterrows():
    has_group = False
    for i in range(len(group_index)):
        if row.id in group_index[i]:
            result.append(", ".join(group_name[i]))
            has_group = True
            break  # each id belongs to exactly one group
    if not has_group:
        result.append(None)

df['result'] = result
print(df)

   id    element  relation_index                       result
0   1        dog               0  dog, crow, snake, pig, bear
1   2        cat               0                  cat, weasel
2   3       crow               1  dog, crow, snake, pig, bear
3   4      snake               3  dog, crow, snake, pig, bear
4   5        pig               1  dog, crow, snake, pig, bear
5   6  porcupine               0                    porcupine
6   7     weasel               2                  cat, weasel
7   8       bear               3  dog, crow, snake, pig, bear
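The nested loops above compare every row with every group, and a row whose relation_index has not yet been assigned to a group starts a new group that is never merged later. Below is a sketch of an order-independent alternative for larger frames. It assumes the question's original column names (index, element, relation_index), that 0 means "no link", and that every other relation_index value exists in the index column. It runs union-find over the ids, then groupby(...).transform(", ".join) so every row gets a plain string. merge_related is a hypothetical name, not a pandas API.

```python
import pandas as pd


def merge_related(df):
    """Return a copy of df whose 'element' holds each group's joined names.

    Assumptions: columns 'index', 'element', 'relation_index'; a
    relation_index of 0 means "no link"; every other relation_index
    value appears in the 'index' column.
    """
    ids = df["index"].tolist()
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union each row with the row its relation_index points to.
    for i, rel in zip(ids, df["relation_index"].tolist()):
        if rel != 0:
            root_i, root_rel = find(i), find(rel)
            if root_i != root_rel:
                parent[root_i] = root_rel

    out = df.copy()
    # One component label per row, aligned with the DataFrame's row index.
    labels = pd.Series([find(i) for i in ids], index=out.index)
    # transform broadcasts each group's joined string back to all its rows.
    out["element"] = out.groupby(labels)["element"].transform(", ".join)
    return out


df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5, 6, 7, 8],
    "element": ["dog", "cat", "crow", "snake", "pig", "porcupine", "weasel", "bear"],
    "relation_index": [0, 0, 1, 3, 1, 0, 2, 3],
})
out = merge_related(df)
print(out)
```

Within each group the names are joined in row order, so the first group comes out as "dog, crow, snake, pig, bear", the same order as the answer's output rather than the question's.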