I have the following DataFrame:
index | element | relation_index
1 | dog | 0
2 | cat | 0
3 | crow | 1
4 | snake | 3
5 | pig | 1
6 | porcupine | 0
7 | weasel | 2
8 | bear | 3
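For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: it assumes index is an ordinary column rather than the pandas index, and that both numeric columns are integers.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: "index" is a regular column, not the DataFrame's own index.
df = pd.DataFrame({
    "index": [1, 2, 3, 4, 5, 6, 7, 8],
    "element": ["dog", "cat", "crow", "snake", "pig",
                "porcupine", "weasel", "bear"],
    "relation_index": [0, 0, 1, 3, 1, 0, 2, 3],
})
print(df)
```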
I would like to get:
index | element | relation_index
1 | dog, crow, pig, snake, bear | 0
2 | cat, weasel | 0
3 | dog, crow, pig, snake, bear | 1
4 | dog, crow, pig, snake, bear | 3
5 | dog, crow, pig, snake, bear | 1
6 | porcupine | 0
7 | cat, weasel | 2
8 | dog, crow, pig, snake, bear | 3
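So the rule is: rows are grouped together when they are linked through index or relation_index, and a relation_index of 0 means the row does not refer to any other row.
How can I do this efficiently for a large DataFrame?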
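To make the rule concrete, here is a small plain-Python sketch of how the grouping could be read. It is not meant as the efficient solution being asked for. The names rows, parent, find_root and groups are hypothetical, and the sketch assumes that every non-zero relation_index refers to an existing index and that there are no cycles. Names come out in row order, which is not exactly the order shown in the desired output.

```python
# (index, element, relation_index) tuples copied from the sample data.
rows = [
    (1, "dog", 0), (2, "cat", 0), (3, "crow", 1), (4, "snake", 3),
    (5, "pig", 1), (6, "porcupine", 0), (7, "weasel", 2), (8, "bear", 3),
]

# index -> relation_index lookup.
parent = {idx: rel for idx, _, rel in rows}

def find_root(idx):
    """Follow relation_index upward until a row with relation_index 0 is reached."""
    while parent[idx] != 0:
        idx = parent[idx]
    return idx

# Collect the element names of every group, keyed by the group's root index.
groups = {}
for idx, name, _ in rows:
    groups.setdefault(find_root(idx), []).append(name)

# Every row receives its whole group joined into one plain string.
result = {idx: ", ".join(groups[find_root(idx)]) for idx, _, _ in rows}
print(result[4])
```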
Edit: I forgot one thing. The data type of element should be just a string, e.g.
"dog, crow, pig, snake, bear"
Answer 0 (score: 1)
I used iterrows together with a for loop to solve this problem.
# Rename index to id, prevent pandas error
df.rename(columns={'index': 'id'}, inplace=True)
# Create a parent group
parent = df[df.relation_index == 0].copy()
search_df = df[df.relation_index != 0].copy()
group_index = [[i] for i in parent.id.tolist()]
group_name = [[i] for i in parent.element.tolist()]
print(group_index)
print(group_name)
[[1], [2], [6]]
[['dog'], ['cat'], ['porcupine']]
# Assign group to each id
for _, row in search_df.iterrows():
    new_group = True
    for i in range(len(group_index)):
        if row.relation_index in group_index[i]:
            group_index[i].append(row.id)
            group_name[i].append(row.element)
            new_group = False
            break
    if new_group:
        group_index.append([row.id])
        group_name.append([row.element])
print(group_index)
print(group_name)
[[1, 3, 4, 5, 8], [2, 7], [6]]
[['dog', 'crow', 'snake', 'pig', 'bear'], ['cat', 'weasel'], ['porcupine']]
# Assign result back to main df
result = []
for _, row in df.iterrows():
    has_group = False
    for i in range(len(group_index)):
        if row.id in group_index[i]:
            result.append(", ".join(group_name[i]))
            has_group = True
    if not has_group:
        result.append(None)
df['result'] = result
df
   id    element  relation_index                       result
0   1        dog               0  dog, crow, snake, pig, bear
1   2        cat               0                  cat, weasel
2   3       crow               1  dog, crow, snake, pig, bear
3   4      snake               3  dog, crow, snake, pig, bear
4   5        pig               1  dog, crow, snake, pig, bear
5   6  porcupine               0                    porcupine
6   7     weasel               2                  cat, weasel
7   8       bear               3  dog, crow, snake, pig, bear
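The loops above depend on each parent row appearing before its children, and iterrows can be slow on large frames. As a possible alternative (a different technique from the loop above, not a tuned implementation), the root of every row can be resolved with repeated Series.map calls and the names joined with groupby(...).transform. The sketch assumes the column has already been renamed to id, that ids are unique, that every non-zero relation_index refers to an existing id, and that there are no cycles. The names parent, root and nxt are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "element": ["dog", "cat", "crow", "snake", "pig",
                "porcupine", "weasel", "bear"],
    "relation_index": [0, 0, 1, 3, 1, 0, 2, 3],
})

# id -> parent id; rows with relation_index 0 are made to point at themselves.
parent = df.set_index("id")["relation_index"]
parent = parent.where(parent != 0, parent.index.to_series())

# Pointer jumping: each pass replaces a pointer by the pointer of its target,
# so every id ends up pointing at the root of its group.
root = parent
while True:
    nxt = root.map(root)
    if nxt.equals(root):
        break
    root = nxt

# Join the names of each group and broadcast the string back to every row.
df["root"] = df["id"].map(root)
df["result"] = df.groupby("root")["element"].transform(lambda s: ", ".join(s))
print(df[["id", "element", "relation_index", "result"]])
```

Because the roots are resolved for all rows at once, the result does not depend on row order. The number of passes grows with the depth of the deepest chain rather than with the number of rows.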