Question

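I have an Excel file containing a list of more than 400 companies. I want to print each company's name and get a link for each company by scraping Google.com. Can you help me?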

I don't know how to attach my Excel file, but here is a screenshot:

[screenshot of the Excel sheet; image not available]

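I want to get the link for every company by searching Google for all of them in one run, and then create an Excel file containing all the links. This is what I am doing so far, but it only handles one company.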

Answer 1

I'm not sure what you mean by "all the links on each company's home page". Since you didn't provide any sample URLs, I'll demonstrate the idea with a generic URL.

Here is an idea in Python. Is this close to what you want?

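from bs4 import BeautifulSoup
import urllib.request

# Page to scrape; swap in any URL you need
url = "https://realfood.tesco.com/search.html?DietaryOption=Vegetarian"
resp = urllib.request.urlopen(url)

# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(resp, "html.parser", from_encoding=resp.info().get_param('charset'))

# Print the target of every <a> tag that has an href attribute
for link in soup.find_all('a', href=True):
    print(link['href'])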

Result:
/recipes.html
/recipes/recipe-binder.html
/top-10s.html
/helpful-lists.html
/step-by-steps.html
/recipes/collections.html
/healthy-recipes.html
/recipes/collections/family-favourites-recipes.html
/recipes/collections/on-a-budget.html
/recipes/collections/crowd-pleasers.html
/recipes/collections.html#subcategories
/recipes/courses.html
/recipes/courses/breakfast-recipes.html
/recipes/courses/lunch-recipes.html
/recipes/courses/dinner-recipes.html
/recipes/courses/dessert-recipes.html
/recipes/courses.html#subcategories
/recipes/events.html
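
The hrefs printed above are relative to the site root, so they are not yet usable as standalone links in a spreadsheet. A minimal sketch of resolving them against the page URL with the standard library's urljoin follows. The helper name to_absolute and the three sample hrefs (copied from the result listing) are only for illustration.

```python
from urllib.parse import urljoin

# URL of the page the links were scraped from
base_url = "https://realfood.tesco.com/search.html?DietaryOption=Vegetarian"

# A few of the relative hrefs from the result listing above
hrefs = [
    "/recipes.html",
    "/recipes/collections.html#subcategories",
    "/top-10s.html",
]

def to_absolute(base, links):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(base, href) for href in links]

absolute_links = to_absolute(base_url, hrefs)
for url in absolute_links:
    print(url)
```

In the scraping loop itself, the same call could be applied to each link['href'] before printing or storing it.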

Alternatively, here is an idea using VBA. Is this close to what you want?

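Sub HREF_Web()

' Requires references to "Microsoft HTML Object Library"
' and "Microsoft Internet Controls"
Dim IE As InternetExplorer
Dim doc As HTMLDocument
Dim output As Object
Dim link As Object
Dim i As Long

Set IE = New InternetExplorer
IE.Visible = False
IE.navigate Range("L1").Value   ' cell L1 holds the URL to load

' Wait for the page to finish loading while keeping Excel responsive
Do
    DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE

Set doc = IE.document
Set output = doc.getElementsByTagName("a")

i = 5   ' first output row

' Write the URL of every link into column A, one per row
For Each link In output
    Range("A" & i).Value2 = link.href
    i = i + 1
Next

IE.Quit
MsgBox "Done!"

End Sub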

Excel worksheet: [screenshot not available]
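
To cover the whole list of 400+ companies instead of a single one, the per-company step can be wrapped in a loop. The following is a rough sketch under several assumptions: the company names are hard-coded placeholders (in practice they would be read from the Excel file), the output is a CSV file that Excel can open, and the file name company_links.csv and all helper names are hypothetical. It only builds the Google search URL for each company; fetching and parsing the result pages is left out, since Google may block automated requests.

```python
import csv
from urllib.parse import urlencode

def google_search_url(company):
    """Build the Google search URL for one company name."""
    return "https://www.google.com/search?" + urlencode({"q": company})

def build_rows(companies):
    """Pair every company name with its search URL."""
    return [(name, google_search_url(name)) for name in companies]

def write_rows(rows, path):
    """Write the (company, URL) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Company", "Search URL"])
        writer.writerows(rows)

# Placeholder names; the real list would come from the spreadsheet
companies = ["Acme Corp", "Globex & Sons"]

rows = build_rows(companies)
write_rows(rows, "company_links.csv")

for name, url in rows:
    print(name, url)
```

Using urlencode keeps names with spaces or characters such as & from breaking the query string.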
