Question

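A. My goal: Use Python to extract unique OCPO IDs from an Excel spreadsheet, then use those IDs to scrape the web for the corresponding company names and NIN IDs. (Note: both the NIN and the OCPO ID are unique to a company.)

B. Details:
i. Use openpyxl to extract the OCPO IDs from the Excel spreadsheet.
ii. Search for the OCPO IDs one at a time in the business registry (https://focus.kontur.ru/) and use BeautifulSoup4 to find the corresponding company name and company ID (NIN).

Example: searching for OCPO ID "00044428" yields the matching company name ПАО "НК "РОСНЕФТЬ" and the corresponding NIN ID "7706107510".

iii. Save the list of company names and NIN IDs in Excel.

C. My progress:
i. I was able to extract the list of OCPO IDs from Excel into Python.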

# Pull the Packages
import openpyxl
import requests
import sys
from bs4 import BeautifulSoup

# Pull OCPO from the Spreadsheet
wb = openpyxl.load_workbook(r"C:\Users\ksong\Desktop\book1.xlsx")
sheet = wb.active
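# sheet["A"] returns the cells of column A. In current openpyxl releases
# sheet.columns is a generator, so it cannot be indexed with [0].
for cellobjc in sheet["A"]:
    print(cellobjc.value)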

ii. I was able to search for a single OCPO ID and have Python scrape the matching company name and the corresponding company NIN ID.

# Part 1a: Pull the Website 
r = requests.get("https://focus.kontur.ru/search?query=" + "00044428")
r.encoding = "UTF-8"

# Part 1b: Pull the Content
c = r.content
soup = BeautifulSoup(c, "html.parser", from_encoding="UTF-8")

# Part 2a: Pull Company name
name = soup.find("a", attrs={'class':"js-subject-link"})
name_box = name.text.strip()
print(name_box)

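D. Help

i. How do I write the code so that each OCPO ID is searched individually inside a loop, so that I get a list of search results rather than a list of OCPO IDs? In other words, search for each OCPO ID and match it with the corresponding company name and NIN ID. The loop has to substitute each ID for ######## in ("https://focus.kontur.ru/search?query=" + "########").

ii. Also, what code should I use to save all of the search results from Python into an Excel spreadsheet?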

Answer 1

1) Create a blank workbook to write to:

from openpyxl import Workbook

wb2 = Workbook()
ws1 = wb2.active

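2) Put all of the code from the second box inside the for loop from the first box.

3) Change "00044428" to str(cellobjc.value)

4) At the end of each loop iteration, append a row to the new worksheet: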

row = [cellobjc.value, date_box, other_variables]
ws1.append(row)

5) After the loop ends, save the file:

wb2.save("results.xlsx")
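Putting the five steps together might look roughly like the sketch below. It is not a tested scraper for the live site: the helper names (normalize_ocpo, fetch_html, parse_name, scrape_to_workbook) are made up for illustration, the 8-digit zero padding is an assumption based on the example ID "00044428" (Excel drops leading zeros when the column is stored as numbers), and only the company name is extracted because the question does not show the markup that holds the NIN.

```python
import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook, load_workbook

SEARCH_URL = "https://focus.kontur.ru/search"  # URL taken from the question


def normalize_ocpo(value, width=8):
    """Return the cell value as text, restoring leading zeros.

    Assumption: OCPO IDs are 8 digits wide, as in the example "00044428".
    """
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # 44428.0 -> 44428
    return str(value).strip().zfill(width)


def fetch_html(ocpo, session):
    """Download the search page for one OCPO ID."""
    response = session.get(SEARCH_URL, params={"query": ocpo}, timeout=30)
    response.raise_for_status()
    response.encoding = "UTF-8"  # same encoding override as in the question
    return response.text


def parse_name(html):
    """Extract the company name with the selector used in the question."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a", attrs={"class": "js-subject-link"})
    return link.text.strip() if link else None


def scrape_to_workbook(in_path, out_path, fetch):
    """Steps 1-5: read column A, look up each ID, write one row per ID."""
    sheet = load_workbook(in_path).active
    wb2 = Workbook()              # step 1: blank workbook for the results
    ws1 = wb2.active
    for cellobjc in sheet["A"]:   # step 2: the scrape runs inside the loop
        if cellobjc.value is None:
            continue              # skip empty cells
        ocpo = normalize_ocpo(cellobjc.value)  # step 3: ID as a string
        name_box = parse_name(fetch(ocpo))
        ws1.append([ocpo, name_box])           # step 4: one row per ID
    wb2.save(out_path)            # step 5: save after the loop


if __name__ == "__main__":
    # File names are placeholders; adjust them to your own paths.
    with requests.Session() as session:
        scrape_to_workbook("book1.xlsx", "results.xlsx",
                           lambda ocpo: fetch_html(ocpo, session))
```

Passing the download function in as fetch keeps the loop easy to try out with canned HTML before pointing it at the real site. A row gets None in the name column when the selector finds nothing, which makes failed lookups easy to spot in the output file.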
