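I have a DataFrame with 2 columns: "emp" is the child column and "man" is the parent column. I need to count the total number of children (direct and indirect) for any given parent.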
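Here is the data:
emp          man
23ank(5*)    213raj(11*)
55man(5*)    213raj(11*)
2shu(1*)     23ank(5*)
7am(3*)      55man(5*)
9shi(0*)     55man(5*)
213raj(11*)  66sam(13*)
The solution I am looking for: if, for example, I want the details for 213raj(11*), the result should be:
213raj(11*),23ank(5*),2shu(1*),55man(5*),7am(3*),9shi(0*)
with a total count for 213raj(11*) of 5.
If I consider 66sam(13*) instead, the result should be:
66sam(13*),213raj(11*),23ank(5*),2shu(1*),55man(5*),7am(3*),9shi(0*)
with a total count for 66sam(13*) of 6.
I tried writing code for this but did not get the required results.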
Answer 0 (score: 2)
In graph-theory terms, you have an edge list (one emp -> man pair per row) that forms a directed acyclic graph.
Here is a solution using the NetworkX graph library.
import networkx as nx

emp_to_man = [
    ('23ank(5*)', '213raj(11*)'),
    ('55man(5*)', '213raj(11*)'),
    ('2shu(1*)', '23ank(5*)'),
    ('7am(3*)', '55man(5*)'),
    ('9shi(0*)', '55man(5*)'),
    ('213raj(11*)', '66sam(13*)'),
]

# Create a directed graph from the edge list (edges point emp -> man).
# Converting a 2-column DF into a digraph is as easy as
# `nx.DiGraph(list(df.values))`.
g = nx.DiGraph(emp_to_man)

for emp in sorted(g):  # For every employee (in sorted order for tidiness),
    # ... print the set of ancestors (in no particular order).
    # Should the edge list be `man_to_emp` instead, you'd use `
    print(emp, nx.ancestors(g, emp))
This prints:
213raj(11*) {'55man(5*)', '7am(3*)', '2shu(1*)', '9shi(0*)', '23ank(5*)'}
23ank(5*) {'2shu(1*)'}
2shu(1*) set()
55man(5*) {'9shi(0*)', '7am(3*)'}
66sam(13*) {'213raj(11*)', '55man(5*)', '7am(3*)', '9shi(0*)', '2shu(1*)', '23ank(5*)'}
7am(3*) set()
9shi(0*) set()
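Since the question asks for counts rather than the sets themselves, a minimal sketch of that last step might look like the following. It reuses the same emp -> man pairs; the names `counts`, `g_rev` and `counts_rev` are mine and not part of the original answer. It also shows the reversed orientation (edges pointing man -> emp), where `nx.descendants` plays the role that `nx.ancestors` plays here.

```python
import networkx as nx

emp_to_man = [
    ('23ank(5*)', '213raj(11*)'),
    ('55man(5*)', '213raj(11*)'),
    ('2shu(1*)', '23ank(5*)'),
    ('7am(3*)', '55man(5*)'),
    ('9shi(0*)', '55man(5*)'),
    ('213raj(11*)', '66sam(13*)'),
]

g = nx.DiGraph(emp_to_man)  # edges point emp -> man

# With emp -> man edges, everyone who reports (directly or indirectly)
# to a node is an ancestor of that node, so the count is the set size.
counts = {node: len(nx.ancestors(g, node)) for node in g}

# Same computation on the reversed graph (man -> emp edges):
# there the reports of a node are its descendants.
g_rev = g.reverse()
counts_rev = {node: len(nx.descendants(g_rev, node)) for node in g_rev}

print(counts['213raj(11*)'], counts['66sam(13*)'])  # 5 6
```

Both dictionaries hold the same numbers; which function to call only depends on the direction in which the edges were added.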
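EDIT: If performance is critical, I would heartily recommend the NetworkX approach. According to a quick timeit test, finding the results for all employees is about 62x faster than the Pandas-based code, and that includes converting the DF into an NX network on every invocation.
EDIT 2: To my considerable surprise, a naive set/defaultdict graph traversal is faster still: 387x faster than the Pandas code and 5x faster than the NX code above.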
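import collections

def dag_count_all_children():
    # Assumes a global `df` with the columns `emp` and `man`, in that order.
    # Map each manager to the set of their direct reports.
    dag = collections.defaultdict(set)
    for emp, man in df.values:
        dag[man].add(emp)
    out = {}
    for man in set(dag):  # iterate over a copy: `dag[emp]` below may add keys
        found = set()
        frontier = {man}
        while frontier:
            emp = frontier.pop()
            frontier.update(dag[emp] - found)
            found.update(dag[emp])
        out[man] = found  # all direct and indirect reports of `man`
    return out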
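To get the counts the question asks for out of a traversal like this, one option is to take the size of each found set. The sketch below is a self-contained variant that works on plain (emp, man) pairs instead of a global `df`; `count_all_reports` and `pairs` are hypothetical names, and it assumes the data contains no cycles.

```python
from collections import defaultdict


def count_all_reports(pairs):
    """Return {manager: number of direct and indirect reports}."""
    # Map each manager to the set of their direct reports.
    children = defaultdict(set)
    for emp, man in pairs:
        children[man].add(emp)

    counts = {}
    for man in list(children):
        found = set()
        frontier = [man]
        while frontier:
            node = frontier.pop()
            # .get() avoids creating empty entries for leaf employees.
            for child in children.get(node, ()):
                if child not in found:
                    found.add(child)
                    frontier.append(child)
        counts[man] = len(found)
    return counts


pairs = [
    ('23ank(5*)', '213raj(11*)'),
    ('55man(5*)', '213raj(11*)'),
    ('2shu(1*)', '23ank(5*)'),
    ('7am(3*)', '55man(5*)'),
    ('9shi(0*)', '55man(5*)'),
    ('213raj(11*)', '66sam(13*)'),
]
counts = count_all_reports(pairs)
print(counts['213raj(11*)'], counts['66sam(13*)'])  # 5 6
```

With a DataFrame, the pairs could be passed as `df[['emp', 'man']].values`. Employees who manage nobody do not appear in the result, so `counts.get(name, 0)` is the safer lookup.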
Answer 1 (score: 0)
If I have understood your question correctly, this function should give you the right answer:
import pandas as pd

df = pd.DataFrame({'emp': ['23ank(5*)', '55man(5*)', '2shu(1*)', '7am(3*)', '9shi(0*)', '213raj(11*)'],
                   'man': ['213raj(11*)', '213raj(11*)', '23ank(5*)', '55man(5*)', '55man(5*)', '66sam(13*)']})

def count_children(parent):
    total_children = []  # initialise list of children to append to
    direct = df[df['man'] == parent]['emp'].to_list()
    total_children += direct  # add direct children
    indirect = df[df['man'].isin(direct)]['emp'].to_list()
    total_children += indirect  # add indirect children
    # next, add children of indirect children in a loop
    next_indirect = indirect
    while True:
        next_indirect = df[df['man'].isin(next_indirect)]['emp'].to_list()
        if not next_indirect or all(i in total_children for i in next_indirect):
            break
        else:
            total_children = list(set(next_indirect).union(set(total_children)))
    count = len(total_children)
    return pd.DataFrame({'count': count,
                         'children': ','.join(total_children)},
                        index=[parent])

count_children('213raj(11*)')  # 'count' column: 5
count_children('66sam(13*)')   # 'count' column: 6
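If only the number is needed, the same level-by-level idea can be written more compactly and return a plain integer. This is a sketch rather than a drop-in replacement for the function above: `count_reports` and its `frame` parameter are hypothetical names, and it keeps track of the names already seen so that the loop also terminates if the data happens to contain a cycle.

```python
import pandas as pd

df = pd.DataFrame({
    'emp': ['23ank(5*)', '55man(5*)', '2shu(1*)', '7am(3*)', '9shi(0*)', '213raj(11*)'],
    'man': ['213raj(11*)', '213raj(11*)', '23ank(5*)', '55man(5*)', '55man(5*)', '66sam(13*)'],
})


def count_reports(frame, parent):
    """Count the direct and indirect children of `parent`, one level at a time."""
    seen = set()
    frontier = {parent}
    while frontier:
        # All employees whose manager is in the current level.
        level = set(frame.loc[frame['man'].isin(frontier), 'emp'])
        # Only expand names that have not been visited yet.
        frontier = level - seen
        seen |= frontier
    return len(seen)


print(count_reports(df, '213raj(11*)'))  # 5
print(count_reports(df, '66sam(13*)'))   # 6
```

Each pass of the loop scans the whole frame once, so the cost grows with the depth of the hierarchy; for large data the graph-based answers are likely the better fit.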