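I'm running into a problem parsing data into a list of lists. I'm trying to scrape information about departments and their subjects. However, since each department has a different number of subjects, I need to build a list of lists so that I can link the data together later. I managed to work around the index errors, and the problem now seems to come from how the subject list is compiled.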
from lxml import html
import requests

page = requests.get('URL')
page_source_code = html.fromstring(page.text)

departments_list = []
subject_list = []

for dep in range(1,3):
    departments = page_source_code.xpath('tag'
                                         +str(dep)+']tag/text()')
    ### print(dep, departments)
    if departments == []:
        pass
    else:
        departments_list.append(departments[0])
    for sub in range(1,20):
        subjects = page_source_code.xpath('tag'
                                          +str(dep)+']tag'
                                          +str(sub)+']tag/text()')
        ### print(sub, subjects)
        if subjects == []:
            pass
        else:
            subject_list.append(subjects[0])

print('Department list ------ ', len(departments_list), departments_list, '\n')
print('Subject list ------ ', len(subject_list), subject_list)
My output is as follows:
Department list ------ 2 ['Department_1', 'Department_2']
Subject list ------ 7 ['Subject_1'(dep_1), 'Subject_2 '(dep_1), 'Subject_3 '(dep_1), 'Subject_4'(dep_1), 'Subject_5'(dep_2), 'Subject_6 '(dep_2), 'Subject_7 '(dep_2)]
This code puts all the subjects into a single flat list. I would like the output to look like this instead:
Subject list ------ 7 [['Subject_1'(dep_1), 'Subject_2 '(dep_1), 'Subject_3 '(dep_1), 'Subject_4'(dep_1)], ['Subject_5'(dep_2), 'Subject_6 '(dep_2), 'Subject_7 '(dep_2)]]
Answer 0 (score: 0)
You need to declare two separate lists for the subjects at the top level, and check whether the subjects[0] string contains 'dep_1' or 'dep_2'.
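# Declare one subject list per department, before the loops
sub_list1 = []
sub_list2 = []

# This code goes inside the second for loop, in place of
# subject_list.append(subjects[0]).
# It assumes the subject text itself contains 'dep_1' or 'dep_2'.
if 'dep_1' in subjects[0]:
    sub_list1.append(subjects[0])
else:
    sub_list2.append(subjects[0])

# Remove the subject_list.append statement from the second for loop
# and build the combined list after both loops have finished, like this:
subject_list = [sub_list1, sub_list2]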
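
If the subject text does not actually contain the department name, or there may be more than two departments, a different approach might work better: start a fresh inner list on each pass of the outer loop and append it once the inner loop is done. The following is only a minimal sketch of that idea, not a tested drop-in for the real page. The function group_subjects, the callables get_department and get_subject, and the fake_page data are all hypothetical stand-ins so the example runs offline; in the real scraper the two callables would wrap the page_source_code.xpath(...) calls from the question. Unlike the original loop, this sketch skips the subject lookup when a department is missing, which keeps both lists the same length.

```python
def group_subjects(get_department, get_subject, max_dep=3, max_sub=20):
    """Return (departments, subjects) where subjects[i] belongs to departments[i]."""
    departments_list = []
    subject_list = []
    for dep in range(1, max_dep):
        departments = get_department(dep)
        if not departments:
            continue  # no such department, so there is nothing to group
        departments_list.append(departments[0])
        dep_subjects = []  # fresh inner list for this department only
        for sub in range(1, max_sub):
            subjects = get_subject(dep, sub)
            if subjects:
                dep_subjects.append(subjects[0])
        subject_list.append(dep_subjects)  # one inner list per department
    return departments_list, subject_list


# Hypothetical stand-in data instead of a live page.
fake_page = {
    1: ("Department_1", ["Subject_1", "Subject_2", "Subject_3", "Subject_4"]),
    2: ("Department_2", ["Subject_5", "Subject_6", "Subject_7"]),
}


def get_department(dep):
    # Mimics xpath(): a one-element list on a match, an empty list otherwise.
    return [fake_page[dep][0]] if dep in fake_page else []


def get_subject(dep, sub):
    # Mimics xpath() for the sub-th subject of the dep-th department.
    names = fake_page.get(dep, ("", []))[1]
    return [names[sub - 1]] if sub <= len(names) else []


departments_list, subject_list = group_subjects(get_department, get_subject)
print('Department list ------ ', len(departments_list), departments_list)
print('Subject list ------ ', len(subject_list), subject_list)
```

Because the two lists stay aligned by index, dict(zip(departments_list, subject_list)) would then link each department to its own subjects.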