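I use an HTTP POST request to fetch information from a website and parse the response, and I created the column header names manually. The main problem is that I cannot fill the values into the columns.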
import requests
from bs4 import BeautifulSoup
import pandas as pd
import xlsxwriter
request_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,'
              'application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en,en-GB;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Length': '39',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'JSESSIONID=CB1FBCAD1E0A6B89CBBA959FF001B7FC; ETAXES=etaxes; JSESSIONID=0CB3B2A0B873D9308A0B5D2C8CB88EA4',
    'Host': 'www.e-taxes.gov.az',
    'Origin': 'https://www.e-taxes.gov.az',
    'Referer': 'https://www.e-taxes.gov.az/ebyn/commersialChecker.jsp',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/75.0.3770.142 Safari/537.36',
}
voens = [
    1302932691,
    1701579951,
]
form_data = {
    'name': voens,
    'tip': '2',
    'submit': 'Yoxla',
}
url = 'https://www.e-taxes.gov.az/ebyn/commersialChecker.jsp'
response = requests.post(url, data=form_data, headers=request_headers)
s = BeautifulSoup(response.content, 'lxml')
sHeader = s.findAll('table', {'class': 'com'})[0].findAll('tr', recursive=False)[1]
headers = [sHeader.get_text().strip()]
print(headers)
for voen in voens:
    form_data['name'] = voen
    response = requests.post(url, data=form_data, headers=request_headers)
    s = BeautifulSoup(response.content, 'lxml')
    sContent = s.find('tr', {'class': 'style1'})
    outcome = [sContent.get_text().strip()]
    print(outcome)
    print(type(outcome))
pandaHeaders = pd.DataFrame({
    'Kommersiya qurumunun adı': headers[0],
    'VÖEN': headers[0],
    'Vergi uçotu orqanının adı': headers[0],
    'Təşkilati-hüquqi forması': headers[0],
    'Hüquqi ünvanı': headers[0],
    'Nizamnamə kapitalı': headers[0],
    'Qanuni təmsilçi': headers[0],
    'Dövlət qeydiyyatına alındığı tarix': headers[0],
    'Reyestr məlumatlarına son dəyişiklik tarixi': headers[0],
})
writer = pd.ExcelWriter("informationTaxIDs.xlsx", engine="xlsxwriter")
pandaHeaders.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': False,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1,
})
for col_num, value in enumerate(pandaHeaders.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() also writes the file
I expect the Excel file to contain the output of the variable outcome, but instead I get an error message like this: 'VÖEN': outcome[1], IndexError: list index out of range
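The error can be reproduced without any network request. This is a minimal sketch, assuming the row text looks roughly like the hypothetical sample below (the real values returned by the site are not shown in this post): wrapping one string in a list gives a list of length 1, so only index 0 exists.

```python
# Hypothetical row text; the real values returned by the site will differ.
row_text = "Company A\n1302932691\nTax office 1"

# Same pattern as in the loop: one string wrapped in a list.
outcome = [row_text.strip()]
print(len(outcome))  # the list holds a single element

try:
    outcome[1]  # there is no second element
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```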
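I think the problem is that outcome is a list with a single element: one string whose values are separated by \n, so only index 0 exists. How can I split that string into separate list elements, so that each column header in the Excel file gets its own value?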
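One possible way to do the split is sketched below, assuming every value sits on its own line in the row text. The helper name `split_row`, the shortened column list, and the sample rows are hypothetical and only serve the illustration; nothing here is fetched from the real site.

```python
# First three column names from the question; the remaining ones work the same way.
columns = ['Kommersiya qurumunun adı', 'VÖEN', 'Vergi uçotu orqanının adı']


def split_row(text):
    """Split newline-separated row text into a list of non-empty, stripped values."""
    return [part.strip() for part in text.splitlines() if part.strip()]


# Hypothetical row texts standing in for sContent.get_text() of each request.
sample_rows = [
    "Company A\n1302932691\nTax office 1",
    "Company B\n\n1701579951\n  Tax office 2  ",
]

# One dict per row, pairing each column name with the value at the same position.
records = [dict(zip(columns, split_row(text))) for text in sample_rows]

for record in records:
    print(record)
```

A list of dicts like `records` can be passed to `pd.DataFrame(records, columns=columns)` and then written with `to_excel`. If a cell itself may contain a line break, collecting the cells directly with `[td.get_text(strip=True) for td in sContent.find_all('td')]` is probably more robust than splitting on `\n`, assuming each value is in its own `td`.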