Storing a list of strings that contains sublists in a JSON file in Python

Asked: 2013-08-27 07:08:22

Tags: python json python-2.7

I'm using Python, and I have data like this:

RedHat Enterprise Linux ES 2.1 IA64
RedHat Enterprise Linux ES 2.1
Red Hat Enterprise Linux AS 2.1
Linux kernel 2.6.9 
Linux kernel 2.6.8 rc3
Linux kernel 2.6.8 rc1
    + Ubuntu Ubuntu Linux 4.1 ppc
    + Ubuntu Ubuntu Linux 4.1 ia64
Linux kernel 2.6.8 

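I want to store this information in a JSON file, but I don't know how! It's as if I have a list of RedHat and Linux entries, and the Ubuntu entries are a sublist of the Linux kernel 2.6.8 rc1 entry, as in the following list: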

{"RedHat Enterprise Linux ES 2.1 IA64":{} ,"RedHat Enterprise Linux ES 2.1":{} ,"Red Hat Enterprise":{"Linux AS 2.1","Linux kernel 2.6.9","Linux kernel 2.6.8 rc3","Linux kernel 2.6.8 rc1"},"Linux kernel 2.6.8":{}}

Here is my whole string:

'RedHat Enterprise Linux WS  2.1 IA64RedHat Enterprise Linux WS  2.1RedHat Enterprise Linux ES  2.1 IA64RedHat Enterprise Linux ES  2.1Red Hat Enterprise Linux AS  2.1 IA64Red Hat Enterprise Linux AS  2.1Linux kernel 2.6.9 Linux kernel 2.6.8 rc3Linux kernel 2.6.8 rc2Linux kernel 2.6.8 rc1+ Ubuntu Ubuntu Linux 4.1 ppc+ Ubuntu Ubuntu Linux 4.1 ia64+ Ubuntu Ubuntu Linux 4.1 ia32Linux kernel 2.6.8 Linux kernel 2.6.7 rc1Linux kernel 2.6.7 Linux kernel 2.6.6 rc1Linux kernel 2.6.6 Linux kernel 2.6.5 Linux kernel 2.6.4 Linux kernel 2.6.3 Linux kernel 2.6.2 Linux kernel 2.6.1 -rc2Linux kernel 2.6.1 -rc1Linux kernel 2.6.1 Linux kernel 2.6 .10Linux kernel 2.6 -test9-CVSLinux kernel 2.6 -test9Linux kernel 2.6 -test8Linux kernel 2.6 -test7Linux kernel 2.6 -test6Linux kernel 2.6 -test5Linux kernel 2.6 -test4Linux kernel 2.6 -test3Linux kernel 2.6 -test2Linux kernel 2.6 -test11Linux kernel 2.6 -test10Linux kernel 2.6 -test1Linux kernel 2.6 Linux kernel 2.4.28 + Trustix Secure Enterprise Linux 2.0 + Trustix Secure Linux 2.2 + Trustix Secure Linux 2.1 + Trustix Secure Linux 2.0 Linux kernel 2.4.27 -pre5Linux kernel 2.4.27 -pre4Linux kernel 2.4.27 -pre3Linux kernel 2.4.27 -pre2Linux kernel 2.4.27 -pre1Linux kernel 2.4.27 Linux kernel 2.4.26 Linux kernel 2.4.25 Linux kernel 2.4.24 -ow1Linux kernel 2.4.24 Linux kernel 2.4.23 -pre9Linux kernel 2.4.23 -ow2Linux kernel 2.4.23 + Trustix Secure Linux 2.0 Linux kernel 2.4.22 + Devil-Linux Devil-Linux 1.0.5 + Devil-Linux Devil-Linux 1.0.4 + Mandriva Linux Mandrake 9.2  amd64+ Mandriva Linux Mandrake 9.2 + Red Hat Fedora  Core1+ Slackware Linux 9.1 Linux kernel 2.4.21 pre7Linux kernel 2.4.21 pre4Linux kernel 2.4.21 pre1Linux kernel 2.4.21 + Conectiva Linux 9.0 + Mandriva Linux Mandrake 9.1 ppc+ Mandriva Linux Mandrake 9.1 + Red Hat Enterprise Linux AS  3+ RedHat Desktop 3.0 + RedHat Enterprise Linux ES  3+ RedHat Enterprise Linux WS  3+ S.u.S.E. Linux Personal 9.0 x86_64+ S.u.S.E. 
Linux Personal 9.0 + SuSE SUSE Linux Enterprise Server  8Linux kernel 2.4.20 Linux kernel 2.4.19 -pre6Linux kernel 2.4.19 -pre5Linux kernel 2.4.19 -pre4Linux kernel 2.4.19 -pre3Linux kernel 2.4.19 -pre2Linux kernel 2.4.19 -pre1Linux kernel 2.4.19 + Conectiva Linux 8.0 + Conectiva Linux Enterprise Edition 1.0 + MandrakeSoft Corporate Server 2.1  x86_64+ MandrakeSoft Corporate Server 2.1 + MandrakeSoft Multi Network Firewall 2.0 + Mandriva Linux Mandrake 9.0 + S.u.S.E. Linux 8.1 + Slackware Linux  -current+ SuSE SUSE Linux Enterprise Server  8+ SuSE SUSE Linux Enterprise Server  7Linux kernel 2.4.18 pre-8Linux kernel 2.4.18 pre-7Linux kernel 2.4.18 pre-6Linux kernel 2.4.18 pre-5Linux kernel 2.4.18 pre-4Linux kernel 2.4.18 pre-3Linux kernel 2.4.18 pre-2Linux kernel 2.4.18 pre-1Linux kernel 2.4.18  x86Linux kernel 2.4.18 + Astaro Security Linux 2.0 23+ Astaro Security Linux 2.0 16+ Debian Linux 3.0  sparc+ Debian Linux 3.0  s/390+ Debian Linux 3.0  ppc+ Debian Linux 3.0  mipsel+ Debian Linux 3.0  mips+ Debian Linux 3.0  m68k+ Debian Linux 3.0  ia-64+ Debian Linux 3.0  ia-32+ Debian Linux 3.0  hppa+ Debian Linux 3.0  arm+ Debian Linux 3.0  alpha+ Mandriva Linux Mandrake 8.2 + Mandriva Linux Mandrake 8.1 + Mandriva Linux Mandrake 8.0 + Red Hat Enterprise Linux AS  2.1 IA64+ RedHat Advanced Workstation for the Itanium Processor 2.1 IA64+ RedHat Advanced Workstation for the Itanium Processor 2.1 + RedHat Linux 8.0 + RedHat Linux 7.3 + S.u.S.E. Linux 8.1 + S.u.S.E. Linux 8.0 + S.u.S.E. Linux 7.3 + S.u.S.E. Linux 7.2 + S.u.S.E. Linux 7.1 + S.u.S.E. Linux Connectivity Server  + S.u.S.E. Linux Database Server  0+ S.u.S.E. Linux Firewall on CD  + S.u.S.E. Linux Office Server  + S.u.S.E. Linux Openexchange Server  + S.u.S.E. Linux Personal 8.2 + S.u.S.E. SuSE eMail Server 3.1 + S.u.S.E. 
SuSE eMail Server III  + SuSE SUSE Linux Enterprise Server  8+ SuSE SUSE Linux Enterprise Server  7+ Turbolinux Turbolinux Server 8.0 + Turbolinux Turbolinux Server 7.0 + Turbolinux Turbolinux Workstation 8.0 + Turbolinux Turbolinux Workstation 7.0 Linux kernel 2.4.17 Linux kernel 2.4.16 Linux kernel 2.4.15 Linux kernel 2.4.14 Linux kernel 2.4.13 + Caldera OpenLinux Server 3.1.1 + Caldera OpenLinux Workstation 3.1.1 Linux kernel 2.4.12 + Conectiva Linux 7.0 Linux kernel 2.4.11 Linux kernel 2.4.10 Linux kernel 2.4.9 + Red Hat Enterprise Linux AS  2.1 IA64+ Red Hat Enterprise Linux AS  2.1+ RedHat Enterprise Linux ES  2.1 IA64+ RedHat Enterprise Linux ES  2.1+ RedHat Enterprise Linux WS  2.1 IA64+ RedHat Enterprise Linux WS  2.1+ RedHat Linux 7.2  ia64+ RedHat Linux 7.2  i386+ RedHat Linux 7.2  alpha+ RedHat Linux 7.1  ia64+ RedHat Linux 7.1  i386+ RedHat Linux 7.1  alpha+ Sun Linux 5.0.5 + Sun Linux 5.0.3 + Sun Linux 5.0 Linux kernel 2.4.8 + Mandriva Linux Mandrake 8.2 + Mandriva Linux Mandrake 8.1 + Mandriva Linux Mandrake 8.0 Linux kernel 2.4.7 + RedHat Linux 7.2 + S.u.S.E. Linux 7.2 + S.u.S.E. Linux 7.1 Linux kernel 2.4.6 Linux kernel 2.4.5 + Slackware Linux 8.0 Linux kernel 2.4.4 + S.u.S.E. 
Linux 7.2 Linux kernel 2.4.3 + Mandriva Linux Mandrake 8.0  ppc+ Mandriva Linux Mandrake 8.0 Linux kernel 2.4.2 Linux kernel 2.4.1 Linux kernel 2.4 .0-test9Linux kernel 2.4 .0-test8Linux kernel 2.4 .0-test7Linux kernel 2.4 .0-test6Linux kernel 2.4 .0-test5Linux kernel 2.4 .0-test4Linux kernel 2.4 .0-test3Linux kernel 2.4 .0-test2Linux kernel 2.4 .0-test12Linux kernel 2.4 .0-test11Linux kernel 2.4 .0-test10Linux kernel 2.4 .0-test1Linux kernel 2.4 Debian Linux 3.1  sparcDebian Linux 3.1  s/390Debian Linux 3.1  ppcDebian Linux 3.1  mipselDebian Linux 3.1  mipsDebian Linux 3.1  m68kDebian Linux 3.1  ia-64Debian Linux 3.1  ia-32Debian Linux 3.1  hppaDebian Linux 3.1  armDebian Linux 3.1  amd64Debian Linux 3.1  alphaDebian Linux 3.1 Debian Linux 3.0  sparcDebian Linux 3.0  s/390Debian Linux 3.0  ppcDebian Linux 3.0  mipselDebian Linux 3.0  mipsDebian Linux 3.0  m68kDebian Linux 3.0  ia-64Debian Linux 3.0  ia-32Debian Linux 3.0  hppaDebian Linux 3.0  armDebian Linux 3.0  alphaDebian Linux 3.0'

I need to parse this, and every entry that starts with a + is a sub-entry of the entry before it.

1 answer:

Answer 0 (score: 1)

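I worked through the problem you are trying to solve using an example SecurityFocus Bid article (in this case securityfocus.com/bid/20959). The idea is to use a scraper such as BeautifulSoup to extract the text from the web page. That text can then be parsed into a JSON-serializable object and dumped to a file. The information in the TexInfo file on SecurityFocus lists all the vulnerable operating systems inside a single tag. Entries related to a given item appear beneath it, each preceded by a + sign (in the question's data, for example, + Ubuntu Ubuntu Linux 4.1 ppc appears beneath Linux kernel 2.6.8 rc1). The + is not a bare + sign in the page source; it is preceded by a newline and a run of tab characters, something like \n\t\t\t\t\t\t\t\t\t\t+. The string therefore has to be cleaned up before it can be converted to JSON. The following code snippet performs this task for the URL securityfocus.com/bid/20959.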

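from bs4 import BeautifulSoup
import json

try:
    from urllib2 import urlopen          # Python 2.7, as tagged in the question
except ImportError:
    from urllib.request import urlopen   # Python 3

# Fetch the page and parse it; naming the parser avoids bs4's "no parser specified" warning.
response = urlopen('http://www.securityfocus.com/bid/20959')
html = response.read()
soup = BeautifulSoup(html, 'html.parser')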
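# Locate the cell that holds the list: the second <td> of the second
# valign="top" element inside the element with id "vulnerability".
div_element = soup.find(id="vulnerability")
tr_element = div_element.find_all(valign="top")

td_elements = tr_element[1].find_all("td")

# stripped_strings yields each text fragment with surrounding whitespace removed.
os_names_list = list(td_elements[1].stripped_strings)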

# Map each top-level entry to the list of '+' entries that follow it.
# Building the dictionary directly avoids joining names with '-' and splitting
# them again, which breaks on names that contain '-' (e.g. "Devil-Linux"),
# and it keeps the correct parent when there are several '+' groups.
vulnerability_os_mapping = {}
current_parent = None

for os_name in os_names_list:
    if os_name.startswith('+'):
        # Drop the leading '+' and collapse the whitespace around the name.
        related_name = " ".join(os_name[1:].split())
        if current_parent is not None:
            vulnerability_os_mapping[current_parent].append(related_name)
    else:
        current_parent = os_name
        vulnerability_os_mapping.setdefault(current_parent, [])

# Write the mapping to a file named after the bid id: vulnerability_list_<bid_id>.json
# The with-block closes the file, so the JSON is flushed to disk.
with open('vulnerability_list_20959.json', 'w') as vulnerability_list_file:
    json.dump(vulnerability_os_mapping, vulnerability_list_file)
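
The grouping step can also be tried without fetching any page. The following is a minimal sketch, assuming the data is available one entry per line as in the nine-line sample from the question; the helper name group_entries is made up for this illustration and does not come from any library.

```python
import json


def group_entries(lines):
    """Map each top-level line to the list of '+' lines that follow it."""
    mapping = {}
    parent = None
    for raw_line in lines:
        entry = raw_line.strip()
        if not entry:
            continue  # skip blank lines
        if entry.startswith('+'):
            if parent is not None:
                # Drop the '+' and collapse runs of whitespace.
                mapping[parent].append(" ".join(entry[1:].split()))
        else:
            parent = entry
            mapping.setdefault(parent, [])
    return mapping


# Sample data copied from the question.
sample = """RedHat Enterprise Linux ES 2.1 IA64
RedHat Enterprise Linux ES 2.1
Red Hat Enterprise Linux AS 2.1
Linux kernel 2.6.9
Linux kernel 2.6.8 rc3
Linux kernel 2.6.8 rc1
    + Ubuntu Ubuntu Linux 4.1 ppc
    + Ubuntu Ubuntu Linux 4.1 ia64
Linux kernel 2.6.8
"""

grouped = group_entries(sample.splitlines())
print(json.dumps(grouped, indent=2, sort_keys=True))
```

Here the two Ubuntu lines end up in the list stored under "Linux kernel 2.6.8 rc1", and every entry without + children maps to an empty list.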

I hope this gives you an idea of how to carry out the task.
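
One design point that may matter: a JSON object is an unordered collection, and a plain dict in Python 2.7 does not remember insertion order, so a dictionary keyed by name can lose the order in which the entries appeared. If that order is important, a list of small records is one possible alternative. The sketch below assumes that layout; the function name to_ordered_records and the field names "name" and "related" are hypothetical choices for this illustration.

```python
import json


def to_ordered_records(lines):
    """Return [{"name": ..., "related": [...]}, ...] in input order."""
    records = []
    for raw_line in lines:
        entry = raw_line.strip()
        if not entry:
            continue  # skip blank lines
        if entry.startswith('+'):
            if records:
                # Attach the '+' entry to the most recent top-level record.
                records[-1]["related"].append(" ".join(entry[1:].split()))
        else:
            records.append({"name": entry, "related": []})
    return records


# The '+' lines carry leading newline/tab noise, as described in the answer.
lines = [
    "Linux kernel 2.6.8 rc1",
    "\n\t\t\t+ Ubuntu Ubuntu Linux 4.1 ppc",
    "\n\t\t\t+ Ubuntu Ubuntu Linux 4.1 ia64",
    "Linux kernel 2.6.8",
]

records = to_ordered_records(lines)
text = json.dumps(records)    # JSON arrays keep element order
restored = json.loads(text)   # round trip back to Python lists and dicts
```

Because JSON arrays are ordered, reading the file back yields the entries in the same sequence as the source text, at the cost of having to search the list when looking up one name.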