I'm having trouble building a tree data structure from a CSV file. I have a CSV file, Graph.csv, which contains 5 columns:
SourceCompetitor,Name,ChoppedHostname,GeoRegion,GeoCountry
CISCO,IBM INDIA PVT. LTD.(BANGLADESH) ,ibm,APAC,BANGLADESH
CISCO,Square InformatiX Limited(BANGLADESH) ,e-home2u,APAC,BANGLADESH
CISCO,Thakral Information Systems Private Limited(BANGLADESH) ,thakral,APAC,BANGLADESH
CISCO,Sysnet Bangladesh Limited(BANGLADESH) ,sysnetgroup,APAC,BANGLADESH
CISCO,Onepower Infotech Ltd(BANGLADESH) ,onepowerbd,APAC,BANGLADESH
CISCO,Firoza Technology Limited(BANGLADESH) ,firozatech,APAC,BANGLADESH
CISCO,South Bangla Computers(BANGLADESH) ,southbanglabd,APAC,BANGLADESH
CISCO,Endeavour Technologies Ltd(BANGLADESH) ,etlbd,APAC,BANGLADESH
CISCO,Aftab IT Limited(BANGLADESH) ,aitlbd,APAC,BANGLADESH
CISCO,Eastern Technologies(BANGLADESH) ,easterntec,APAC,BANGLADESH
CISCO,BRACNet Limited(BANGLADESH) ,bracnet,APAC,BANGLADESH
CISCO,Ranks ITT Ltd(BANGLADESH) ,ranksitt,APAC,BANGLADESH
CISCO,Businessland Limited(BANGLADESH) ,businesslandbd,APAC,BANGLADESH
CISCO,AGNI SYSTEMS LIMITED(BANGLADESH) ,agni,APAC,BANGLADESH
CISCO,EMEM SYSTEMS LIMITED(BANGLADESH) ,ememsystems,APAC,BANGLADESH
CISCO,Biometrics.BD(BANGLADESH) ,biometrics-bd,APAC,BANGLADESH
CISCO,Speed Technology & Engineering Ltd(BANGLADESH) ,speedtechbd,APAC,BANGLADESH
CISCO,Smarttel Communications(BANGLADESH) ,smarttel,APAC,BANGLADESH
CISCO,RM Systems Ltd.(BANGLADESH) ,rmsystemsltd,APAC,BANGLADESH
CISCO,METEOR TECHNOLOGIES(BANGLADESH) ,meteortechbd,APAC,BANGLADESH
CISCO,PERISCOPE TECHNOLOGY SOLUTIONS LTD(BANGLADESH) ,periscope-inc,APAC,BANGLADESH
CISCO,IT Connect Ltd.(BANGLADESH) ,itconnectbd,APAC,BANGLADESH
CISCO,NetLink Ltd(BANGLADESH) ,bbs,APAC,BANGLADESH
CISCO,PROBIL NEPAL(BANGLADESH) ,netas,APAC,BANGLADESH
CISCO,Peerless Corporation Limited(BANGLADESH) ,peerlesscorp,APAC,BANGLADESH
CISCO,SiS Inflexionpoint (BD) Ltd.(BANGLADESH) ,inflexionpointtech,APAC,BANGLADESH
CISCO,Netware Systems Ltd.(BANGLADESH) ,netware-bd,APAC,BANGLADESH
CISCO,Bangla Phone Limited(BANGLADESH) ,banglaphone,APAC,BANGLADESH
ion Service Inc.,datasitter,APAC,TAIWAN
Citrix,Digicentre,digicentre,APAC,TAIWAN
Citrix,Dinghan International Corp.,dinghang,APAC,TAIWAN
Citrix,Dynasafe Technologies Inc.,dynasafe,APAC,TAIWAN
Citrix,"eRaySecure Co., Ltd.",8066,APAC,TAIWAN
Citrix,"Great Target Technology Co., Ltd.",gtarget,APAC,TAIWAN
Citrix,Infinity InfoTec Co. Ltd.,innitec,APAC,TAIWAN
Citrix,InfoFab Inc.,infofab,APAC,TAIWAN
Citrix,Information Security Service Digital United,issdu,APAC,TAIWAN
Citrix,Interactive Digital Technologies Inc.,idtech,APAC,TAIWAN
Citrix,Kinmax Technology Ltd,kx,APAC,TAIWAN
Citrix,"Leagen Information Technology, Inc.",leagen,APAC,TAIWAN
Citrix,MIRLE AUTOMATION CORPORATION,mirle,APAC,TAIWAN
Citrix,Netchain Company Limited,net-chain,APAC,TAIWAN
Citrix,NewInfo System Consultant,newinfo,APAC,TAIWAN
Citrix,"Odysseus Digital Co., Ltd.",ody,APAC,TAIWAN
Citrix,Pilot Information Co Ltd,tysys,APAC,TAIWAN
Citrix,Ring Line Corporation,ringline,APAC,TAIWAN
Citrix,Scientific Formosa Inc,sciformosa,APAC,TAIWAN
Citrix,To Cloud Technology,tocloud,APAC,TAIWAN
Citrix,"Xin Chio GLobal Co., Ltd",xinchio,APAC,TAIWAN
Citrix,Sysage Technology Company Ltd.,sysage,APAC,TAIWAN
Citrix,"Dimension Data Philippines, Inc.",dimensiondata,APAC,PHILIPPINES
Citrix,Dimension Data India Limited,dimensiondata,APAC,INDIA
Citrix,Genpact India,genpact,APAC,INDIA
Citrix,IBM India Private Limited,ibm,APAC,INDIA
Citrix,Tata Consultancy Services Ltd,tcs,APAC,INDIA
Citrix,Embee Software Pvt. Ltd.,Embee Software Pvt. Ltd.,APAC,INDIA
Citrix,Emicro Data Technologies Pvt. Ltd,emicrotechnologies,APAC,INDIA
Citrix,Frontier Business Systems Pvt Ltd,frontier,APAC,INDIA
Citrix,Locuz Enterprise Solutions Limited,locuz,APAC,INDIA
Citrix,Magnamious Systems Pvt Ltd,magnamious,APAC,INDIA
Citrix,Orient Technologies Pvt. Ltd.,orientindia,APAC,INDIA
Citrix,PC Solutions Pvt Ltd,e-pspl,APAC,INDIA
Citrix,Ricoh India Limited,ricoh,APAC,INDIA
Citrix,S.K. International,skinternational,APAC,INDIA
Citrix,Veeras Infotek Pvt Ltd,veeras,APAC,INDIA
Citrix,(n)Code Solution - A division of GNFC Ltd.,ncode,APAC,INDIA
Citrix,Accel Frontline Ltd,accelicim,APAC,INDIA
Citrix,Central Data Systems Pvt Ltd,cdspl,APAC,INDIA
Citrix,D M Systems Pvt. Ltd.,dmsystems,APAC,INDIA
Citrix,Dotcad Pvt. Ltd.,dotcad,APAC,INDIA
Citrix,FIS Payment Solutions and Services,fisglobal,APAC,INDIA
Citrix,Futurenet Technologies (India) Pvt.Ltd.,futurenet,APAC,INDIA
Citrix,Futuresoft Solutions Pvt Ltd,Futuresoft Solutions Pvt Ltd,APAC,INDIA
Citrix,Gemini Communication Ltd,gcl,APAC,INDIA
Citrix,GT Enterprises,gte-india,APAC,INDIA
Citrix,Happiest Minds Technologies Pvt. Ltd,happiestminds,APAC,INDIA
Citrix,Hitachi Systems Micro Clinic Pvt. Ltd.,hitachi-systems-mc,APAC,INDIA
Citrix,IndiQus Technologies Pvt. Ltd.,indiqus,APAC,INDIA
Citrix,INSIGHT Business Machines Pvt. Ltd.,insightindia,APAC,INDIA
Citrix,Lauren Information Technologies Pvt Ltd,lauren,APAC,INDIA
Citrix,Microland Ltd,microland,APAC,INDIA
Citrix,Netcom Infotech Pvt Ltd,netcominfotech,APAC,INDIA
Citrix,Network Techlab (I) Pvt. Ltd.,netlabindia,APAC,INDIA
Citrix,New Era Informatique Pvt. Ltd.,newera-technologies,APAC,INDIA
Citrix,NewWave Computing (P) Limited,newwavecomputing,APAC,INDIA
Citrix,Pace Business Machines Pvt. Ltd.,pbmpl,APAC,INDIA
Citrix,Quadrasystems.net (India) Pvt Ltd,quadrasystems,APAC,INDIA
Citrix,QUANTM Ltd.,quantm,APAC,INDIA
Citrix,Sify Technologies Limited,sifycorp,APAC,INDIA
Citrix,Silica Infotech Private Limited,silicainfotech,APAC,INDIA
Citrix,Softcell Technologies Limited,softcell,APAC,INDIA
Citrix,Softline Services India Pvt. Ltd.,softlinegroup,APAC,INDIA
Citrix,SOLUTIONS CONSULTANCY SERVICES,solutions-india,APAC,INDIA
Citrix,Sunfire Technologies Pvt.Ltd.,sunfire,APAC,INDIA
Citrix,Information Communication Technology,ict,EMEA,QATAR
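I want to group the data as follows:
Within each GeoRegion, I want to group by SourceCompetitor.
Within each SourceCompetitor, I want to group by GeoCountry.
Within each GeoCountry, I want to list the company names (the Name column).
I want to create this JSON data structure: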
{
"name": "Analytics",
"children": [{
"name": "LMAT",
"children": [{
"name": "CISCO",
"children": [{
"name": "BANGLADESH",
"children": [{
"name": "Techvibes Limited(BANGLADESH)"
},
{
"name": "ATSBD(BANGLADESH)"
},
{
"name": "BDCOM ONLINE LTD.(BANGLADESH)"
},
{
"name": "Itech Unlimited(BANGLADESH)"
}]
},
{
"name": "INDIA",
"children": [{
"name": "MASS INFONET (P) LTD(INDIA)"
},
{
"name": "Smart Systems(INDIA)"
},
{
"name": "Ashbal Computers(INDIA)"
},
{
"name": "N C S COMPUTECH PRIVATE LTD(INDIA)"
}]
}]
},
{
"name": "IBM",
"children": [{
"name": "TAIWAN",
"children": [{
"name": "Tatung Co."
},
{
"name": "Asgard System, Inc."
},
{
"name": "JETWELL COMPUTER CO., LTD."
},
{
"name": "Dimerco Data System Corporation"
}]
},
{
"name": "CHINA",
"children": [{
"name": "Taiyuan JinGuanTongLi Computer Co., Ltd."
},
{
"name": "jiangsu tonbu information technology Ltd. Co"
},
{
"name": "ShenZhen ZhenBangDa Technology Co., Ltd."
},
{
"name": "Guangzhou Kanghui Electronic Technology Co., Ltd."
}]
}]
}]
},
{
"name": "APAC",
"children": [{
"name": "MICROSOFT",
"children": [{
"name": "Karnatak",
"children": [{
"name": "Techvibes Limited(BANGLADESH)"
},
{
"name": "ATSBD(BANGLADESH)"
},
{
"name": "BDCOM ONLINE LTD.(BANGLADESH)"
},
{
"name": "Itech Unlimited(BANGLADESH)"
}]
},
{
"name": "Shri Lanka",
"children": [{
"name": "MASS INFONET (P) LTD(INDIA)"
},
{
"name": "Smart Systems(INDIA)"
},
{
"name": "Ashbal Computers(INDIA)"
},
{
"name": "N C S COMPUTECH PRIVATE LTD(INDIA)"
}]
}]
},
{
"name": "Google",
"children": [{
"name": "Switzerland",
"children": [{
"name": "Tatung Co."
},
{
"name": "Asgard System, Inc."
},
{
"name": "JETWELL COMPUTER CO., LTD."
},
{
"name": "Dimerco Data System Corporation"
}]
},
{
"name": "INDIA",
"children": [{
"name": "Taiyuan JinGuanTongLi Computer Co., Ltd."
},
{
"name": "jiangsu tonbu information technology Ltd. Co"
},
{
"name": "ShenZhen ZhenBangDa Technology Co., Ltd."
},
{
"name": "Guangzhou Kanghui Electronic Technology Co., Ltd."
}]
}]
}]
}]
}
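I want to build this kind of JSON tree structure from the CSV data. Here is my Python code, but it only does the grouping, using a single dictionary; I don't know how to implement the tree structure.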
import csv, json
from collections import defaultdict

class graph():
    def __init__(self, filename):
        self.filename = filename
        jsonFile = open("graph.json", "w")
        with open(self.filename, 'rb') as f:
            cReader = csv.reader(f, delimiter =",")
            cReader.next()
            dict = {}
            for cr in cReader:
                if cr[3]=="":
                    dict.setdefault("NA", set())
                    dict["NA"].add(cr[0])
                else:
                    dict.setdefault(cr[3], set())
                    dict[cr[3]].add(cr[0])
                dict.setdefault(cr[0], set())
                dict[cr[0]].add(cr[4])
                dict.setdefault(cr[4], set())
                dict[cr[4]].add(cr[1])
            print dict

if __name__=="__main__":
    filename = "graph.csv"
    graph(filename)
Please suggest an optimized solution. I am not using any framework.
Answer 0 (score: 1)
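A very straightforward solution is to sort the rows by the keys and then group the tree levels with a recursive function and itertools.groupby(). The sort matters because groupby() only groups consecutive items that share the same key: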
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function
import csv
import json
from itertools import groupby
from operator import itemgetter


def build_tree(rows, keys, name='Analytics'):
    # Every node is {'name': ...}; 'children' is added while keys remain.
    result = {'name': name}
    if keys:
        # rows must already be sorted by keys, because groupby() only
        # groups consecutive rows that share the same key value.
        result['children'] = list(
            build_tree(sub_rows, keys[1:], key_value)
            for key_value, sub_rows in groupby(rows, itemgetter(keys[0]))
        )
    return result


def main():
    with open('test.csv') as lines:
        # Tree levels from the root down: region, competitor, country, company.
        keys = ['GeoRegion', 'SourceCompetitor', 'GeoCountry', 'Name']
        rows = sorted(csv.DictReader(lines), key=itemgetter(*keys))
        tree = build_tree(rows, keys)
        print(json.dumps(tree, indent=2))


if __name__ == '__main__':
    main()
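
If sorting the whole file is undesirable, the same name/children tree can probably also be built in a single pass with nested dictionaries. The following is only a sketch of that alternative, not part of the answer above. It assumes Python 3.7+ (dictionaries keep insertion order), and the names `build_tree_unsorted`, `LEVELS`, `SAMPLE` and the `missing` parameter are made up for illustration. Blank values are mapped to "NA", mirroring what the question's code does for an empty GeoRegion, and surrounding whitespace is stripped from the values.

```python
import csv
import io
import json

# Tree levels from the root down, as in the answer above.
LEVELS = ['GeoRegion', 'SourceCompetitor', 'GeoCountry', 'Name']


def build_tree_unsorted(rows, keys=LEVELS, root='Analytics', missing='NA'):
    """Build the name/children tree in one pass, without sorting the rows."""
    nested = {}
    for row in rows:
        node = nested
        for key in keys:
            # Blank or absent values fall back to `missing` (an assumption
            # borrowed from the "NA" handling in the question's code).
            value = (row.get(key) or '').strip() or missing
            node = node.setdefault(value, {})

    def to_node(name, children):
        # Convert the nested dicts into {'name': ..., 'children': [...]}.
        result = {'name': name}
        if children:
            result['children'] = [to_node(k, v) for k, v in children.items()]
        return result

    return to_node(root, nested)


# A few rows taken from the question's data. The last row has its GeoRegion
# blanked on purpose to exercise the "NA" fallback; that is not how the row
# appears in the original file.
SAMPLE = """\
SourceCompetitor,Name,ChoppedHostname,GeoRegion,GeoCountry
CISCO,AGNI SYSTEMS LIMITED(BANGLADESH) ,agni,APAC,BANGLADESH
Citrix,Digicentre,digicentre,APAC,TAIWAN
Citrix,"eRaySecure Co., Ltd.",8066,APAC,TAIWAN
Citrix,Genpact India,genpact,,INDIA
"""

tree = build_tree_unsorted(csv.DictReader(io.StringIO(SAMPLE)))
print(json.dumps(tree, indent=2))
```

Compared with the groupby() version, this keeps the groups in the order they first appear in the file instead of alphabetical order, at the cost of holding an intermediate nested dictionary in memory.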