I created the following DataFrame:
import pandas as pd
df = pd.DataFrame({'parent': ['AC1', 'AC2', 'AC3', 'AC1', 'AC11', 'AC5', 'AC5', 'AC6', 'AC8', 'AC9'],
                   'child': ['AC2', 'AC3', 'AC4', 'AC11', 'AC12', 'AC2', 'AC6', 'AC7', 'AC9', 'AC10']})
It prints the following:
  parent child
0    AC1   AC2
1    AC2   AC3
2    AC3   AC4
3    AC1  AC11
4   AC11  AC12
5    AC5   AC2
6    AC5   AC6
7    AC6   AC7
8    AC8   AC9
9    AC9  AC10
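I want to create a result DataFrame in which each top-level parent (meaning a value that does not appear in the child column) is listed with its final children, i.e. the descendants that have no children of their own.
df_result = pd.DataFrame({'parent': ['AC1', 'AC1', 'AC5', 'AC5', 'AC8', 'AC2'],
                          'child': ['AC4', 'AC12', 'AC4', 'AC7', 'AC10', 'AC4']})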
  parent child
0    AC1   AC4
1    AC1  AC12
2    AC5   AC4
3    AC5   AC7
4    AC8  AC10
5    AC2   AC4
I have started the following function, but I am not sure how to finish it.
def get_child(df):
    result = {}
    if df['parent'] not in df['child']:
        return result[df['parent']]
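Answer 0 (score: 1)
This is a tree structure, a special kind of graph. A DataFrame is not a particularly convenient way to represent a tree. I suggest you switch to networkx
or another graph-based package. Then look up how to do a simple path traversal; you will find direct support for it in the graph package's documentation.
If you insist on doing this yourself (which is a reasonable programming exercise), you only need something like this pseudocode: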
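As a rough illustration of the networkx route, here is a minimal sketch. It assumes networkx is installed and rebuilds the question's edges as a plain list of tuples; the names `edges`, `roots`, `leaves` and `pairs` are only illustrative.

```python
import networkx as nx

# Edge list copied from the question: (parent, child) pairs.
edges = [("AC1", "AC2"), ("AC2", "AC3"), ("AC3", "AC4"), ("AC1", "AC11"),
         ("AC11", "AC12"), ("AC5", "AC2"), ("AC5", "AC6"), ("AC6", "AC7"),
         ("AC8", "AC9"), ("AC9", "AC10")]

G = nx.DiGraph()
G.add_edges_from(edges)

# Roots have no incoming edges; leaves have no outgoing edges.
roots = [n for n, deg in G.in_degree() if deg == 0]
leaves = {n for n, deg in G.out_degree() if deg == 0}

# For each root, keep only the reachable nodes that are leaves.
pairs = sorted((root, leaf)
               for root in roots
               for leaf in nx.descendants(G, root) & leaves)
print(pairs)
```

Starting from the DataFrame in the question, `nx.from_pandas_edgelist(df, "parent", "child", create_using=nx.DiGraph)` should build the same graph without the hand-written edge list.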
for each parent not in "child" column:
    here = parent
    while here in parent column:
        here = here["child"]
    record (parent, here) pair
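The pseudocode above follows a single chain of children. Since some parents in the question have more than one child (AC1 and AC5), a runnable version needs to explore every branch. The following is one possible sketch in plain Python, assuming the graph has no cycles; `roots_to_leaves` and its internals are hypothetical names, not part of any library.

```python
from collections import defaultdict

def roots_to_leaves(edges):
    """Return sorted (root, leaf) pairs for an acyclic (parent, child) edge list."""
    children = defaultdict(list)
    has_parent = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)

    pairs = set()
    for root in children:
        if root in has_parent:
            continue  # appears in the child column, so not a top-level parent
        stack = [root]
        while stack:
            here = stack.pop()
            if here in children:
                stack.extend(children[here])  # keep descending along every branch
            else:
                pairs.add((root, here))       # no children: this is a leaf
    return sorted(pairs)

edges = [("AC1", "AC2"), ("AC2", "AC3"), ("AC3", "AC4"), ("AC1", "AC11"),
         ("AC11", "AC12"), ("AC5", "AC2"), ("AC5", "AC6"), ("AC6", "AC7"),
         ("AC8", "AC9"), ("AC9", "AC10")]
print(roots_to_leaves(edges))
# [('AC1', 'AC12'), ('AC1', 'AC4'), ('AC5', 'AC4'), ('AC5', 'AC7'), ('AC8', 'AC10')]
```

With the DataFrame from the question, the edges could be passed as `zip(df["parent"], df["child"])`.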
Answer 1 (score: 0)
Although your expected output does not seem consistent with your description (AC2 should not count as a parent, since it is not a source node), I am fairly confident that you want to run a graph traversal from each source node to locate all of its leaves. Doing this inside a DataFrame is not convenient, so we can iterate over df.values and build a dictionary that maps each parent to its list of children to represent the graph. I assume the graph has no cycles.
import pandas as pd
from collections import defaultdict

def find_leaves(graph, src):
    # Recurse through the children of src; a node with no entry in graph is a leaf.
    if src in graph:
        for neighbor in graph[src]:
            yield from find_leaves(graph, neighbor)
    else:
        yield src

def pair_sources_to_leaves(df):
    graph = defaultdict(list)
    children = set()

    # Build the parent -> children mapping and remember every node that has a parent.
    for parent, child in df.values:
        graph[parent].append(child)
        children.add(child)

    # Source nodes are parents that never appear as a child.
    leaves = [[x, list(find_leaves(graph, x))]
              for x in graph if x not in children]

    # One row per (source, leaf) pair.
    return (pd.DataFrame(leaves, columns=df.columns)
            .explode(df.columns[-1])
            .reset_index(drop=True))

if __name__ == "__main__":
    df = pd.DataFrame({
        "parent": ["AC1", "AC2", "AC3", "AC1", "AC11",
                   "AC5", "AC5", "AC6", "AC8", "AC9"],
        "child": ["AC2", "AC3", "AC4", "AC11", "AC12",
                  "AC2", "AC6", "AC7", "AC9", "AC10"]
    })
    print(pair_sources_to_leaves(df))
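The recursive `find_leaves` relies on the no-cycles assumption; with a cycle in the data it would recurse until Python raises a RecursionError. If that assumption might not hold, one option is to track visited nodes. The variant below is only a sketch: `find_leaves_safe` and its `seen` parameter are hypothetical names, and unlike the original it also reports each leaf at most once per source, even when several paths reach it.

```python
def find_leaves_safe(graph, src, seen=None):
    """Yield each leaf reachable from src once, skipping already visited nodes."""
    if seen is None:
        seen = set()
    if src in seen:
        return  # already explored: avoids infinite recursion on cycles
    seen.add(src)
    if src in graph:
        for neighbor in graph[src]:
            yield from find_leaves_safe(graph, neighbor, seen)
    else:
        yield src

# A small graph with a cycle A -> B -> A and one leaf C.
cyclic = {"A": ["B"], "B": ["A", "C"]}
print(list(find_leaves_safe(cyclic, "A")))  # ['C']
```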