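Getting counts and reporting people's details

Time: 2019-08-20 07:56:48

Tags: python

I have a DataFrame with 2 columns: "emp" is the child column and "man" is the parent column. I need to count the total number of children (direct and indirect) for any given parent.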

For example, given the data below, I want the details for 213raj(11*):

emp         man
23ank(5*)   213raj(11*)
55man(5*)   213raj(11*)
2shu(1*)    23ank(5*)
7am(3*)     55man(5*)
9shi(0*)    55man(5*)
213raj(11*) 66sam(13*)

with a total count of 5 for 213raj(11*).

If I consider 66sam(13*) instead, then:

213raj(11*),23ank(5*),2shu(1*),55man(5*),7am(3*),9shi(0*)

with a total count of 6 for 66sam(13*).

I tried the following code, but did not get the desired result:

66sam(13*),213raj(11*),23ank(5*),2shu(1*),55man(5*),7am(3*),9shi(0*)

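2 Answers:

Answer 0 (score: 2)

In graph-theory terms, you have an adjacency structure (strictly, an edge list of emp -> man pairs) that forms a directed acyclic graph.

Here is a solution using the NetworkX graph library.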

import networkx as nx

emp_to_man = [
 ('23ank(5*)', '213raj(11*)'),
 ('55man(5*)', '213raj(11*)'),
 ('2shu(1*)', '23ank(5*)'),
 ('7am(3*)', '55man(5*)'),
 ('9shi(0*)', '55man(5*)'),
 ('213raj(11*)', '66sam(13*)'),
]

# Create a directed graph from the (emp, man) edge list.
# Converting a 2-column DF into a digraph is as easy as
# `nx.DiGraph(list(df.values))`.
g = nx.DiGraph(emp_to_man)

for emp in sorted(g):  # For every employee (in sorted order for tidiness),
    # ... print the set of ancestors (in no particular order).
    # Edges point emp -> man, so a node's ancestors are everyone below it.
    # Should the edges run man -> emp instead, you would need the opposite
    # lookup (NetworkX also offers `nx.descendants`).
    print(emp, nx.ancestors(g, emp))

This prints:

213raj(11*) {'55man(5*)', '7am(3*)', '2shu(1*)', '9shi(0*)', '23ank(5*)'}
23ank(5*) {'2shu(1*)'}
2shu(1*) set()
55man(5*) {'9shi(0*)', '7am(3*)'}
66sam(13*) {'213raj(11*)', '55man(5*)', '7am(3*)', '9shi(0*)', '2shu(1*)', '23ank(5*)'}
7am(3*) set()
9shi(0*) set()
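Since the question asks for a count plus a comma-separated list, the sets above can be turned into that shape with `len` and `join`. This is only a minimal sketch: `report` is a hypothetical helper name introduced here, not part of NetworkX, and sorting the names is an assumption made to keep the output stable.

```python
import networkx as nx

# Edge list copied from the answer above: (employee, manager) pairs.
emp_to_man = [
    ('23ank(5*)', '213raj(11*)'),
    ('55man(5*)', '213raj(11*)'),
    ('2shu(1*)', '23ank(5*)'),
    ('7am(3*)', '55man(5*)'),
    ('9shi(0*)', '55man(5*)'),
    ('213raj(11*)', '66sam(13*)'),
]
g = nx.DiGraph(emp_to_man)


def report(graph, manager):
    """Return (count, sorted names) of everyone below `manager`.

    Hypothetical helper for illustration; not a NetworkX function.
    """
    # Edges point emp -> man, so the people below a manager are its ancestors.
    below = sorted(nx.ancestors(graph, manager))
    return len(below), below


count, names = report(g, '213raj(11*)')
print(count, ','.join(names))
# 5 23ank(5*),2shu(1*),55man(5*),7am(3*),9shi(0*)
```

Calling `report(g, '66sam(13*)')` in the same way gives a count of 6, matching the totals stated in the question.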

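Edit: If performance matters, I heartily recommend the NetworkX approach. In a quick timeit test, finding the children for all employees was about 62 times faster than the Pandas-based code, and that includes converting the DF into an NX graph on every call.

Edit 2: Much to my surprise, a naive set/defaultdict graph traversal is faster still: 387 times faster than the Pandas code, and 5 times faster than the NX code above.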

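import collections


def dag_count_all_children():
    # Uses the DataFrame `df` (columns: emp, man) from the enclosing scope.
    dag = collections.defaultdict(set)
    for emp, man in df.values:  # each row is (emp, man)
        dag[man].add(emp)  # manager -> direct reports
    out = {}

    for man in set(dag):  # snapshot of the managers; lookups below add leaf keys
        found = set()
        frontier = {man}  # nodes still to expand (avoids shadowing built-in `open`)
        while frontier:
            emp = frontier.pop()
            frontier.update(dag[emp] - found)
            found.update(dag[emp])

        out[man] = found
    return out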
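For readers without pandas at hand, the same traversal can be sketched on a plain list of pairs using only the standard library. The names `pairs` and `count_all_reports` are hypothetical and chosen for this illustration; the data is the question's sample table.

```python
from collections import defaultdict

# (employee, manager) pairs from the question's sample data.
pairs = [
    ('23ank(5*)', '213raj(11*)'),
    ('55man(5*)', '213raj(11*)'),
    ('2shu(1*)', '23ank(5*)'),
    ('7am(3*)', '55man(5*)'),
    ('9shi(0*)', '55man(5*)'),
    ('213raj(11*)', '66sam(13*)'),
]


def count_all_reports(pairs):
    """Map each manager to the set of its direct and indirect reports."""
    children = defaultdict(set)
    for emp, man in pairs:
        children[man].add(emp)

    result = {}
    for man in children:
        found = set()
        frontier = [man]
        while frontier:
            current = frontier.pop()
            # .get() avoids inserting keys for leaf nodes during iteration.
            new = children.get(current, set()) - found
            found |= new
            frontier.extend(new)
        result[man] = found
    return result


reports = count_all_reports(pairs)
for man in sorted(reports):
    print(man, len(reports[man]), ','.join(sorted(reports[man])))
```

Only people who manage someone appear as keys in `reports`. Because each node is added to `found` at most once, the loop also terminates if the input accidentally contains a cycle.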

Answer 1 (score: 0)

If I understood your question correctly, this function should give you the right answer:

import pandas as pd

df = pd.DataFrame({'emp':['23ank(5*)', '55man(5*)', '2shu(1*)', '7am(3*)', '9shi(0*)', '213raj(11*)'],
                   'man':['213raj(11*)', '213raj(11*)', '23ank(5*)', '55man(5*)', '55man(5*)', '66sam(13*)']})


def count_children(parent):
    total_children = []  # initialise list of children to append to
    direct = df[df['man'] == parent]['emp'].to_list()
    total_children += direct  # add direct children

    indirect = df[df['man'].isin(direct)]['emp'].to_list()
    total_children += indirect  # add indirect children

    # next, add children of indirect children in a loop
    next_indirect = indirect
    while True:
        next_indirect = df[df['man'].isin(next_indirect)]['emp'].to_list()
        if not next_indirect or all(i in total_children for i in next_indirect):
            break
        else:
            total_children = list(set(next_indirect).union(set(total_children)))

    count = len(total_children)
    return pd.DataFrame({'count':count,
                     'children':','.join(total_children)},
                     index=[parent])

count_children('213raj(11*)')  # DataFrame whose 'count' is 5

count_children('66sam(13*)')  # DataFrame whose 'count' is 6