Question

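I have a .txt file with a different link on each line. I want to iterate over the lines and parse each page with BeautifulSoup(response.text, "html.parser"). I've run into a few problems.

I can see the lines from the text file being iterated, but when I pass them to requests.get(websitelink), the code that previously worked (without iteration) no longer prints any of the data I scrape.

All I get in the results is blank lines.

I'm new to Python and BeautifulSoup, so I'm not sure what I'm doing wrong. I tried converting the line to a string, but that didn't seem to help.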

import requests
from bs4 import BeautifulSoup, CData
filename = 'item_ids.txt'

with open(filename, "r") as fp:
    lines = fp.readlines()
    for line in lines:

        #Test to see if iteration for line to line works
        print(line)

        #Assign single line to websitelink
        websitelink = line

        #Parse websitelink into requests
        response = requests.get(websitelink)
        soup = BeautifulSoup(response.text, "html.parser")

        #initialize and reset vars for cd loop
        count = 0
        weapon = ''
        stats = ''

        #iterate through cdata on page, and parse wanted data
        for cd in soup.findAll(text=True):
            if isinstance(cd, CData):
                #print(cd)
                count += 1
                if count == 1:
                    weapon = cd
                if count == 6:
                    stats = cd

        #concatenate cdata info
        both = weapon + " " + stats
        print(both)

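The code should follow these steps:

1. Read a line (URL) from the text file and assign it to a variable to be used with requests.get(websitelink)
2. BeautifulSoup scrapes that link for CData and prints it
3. Repeat steps 1 and 2 until the last line (last URL) of the text file

Any help would be greatly appreciated.

Thanks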

Answer 1

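I don't know if this will help you, but I added strip() where you assign line to your websitelink variable, i.e. websitelink = line.strip(), and that made your code work for me. readlines() keeps the trailing newline on every line, so without strip() the URL passed to requests.get ends in "\n", which is probably why the request did not return the page you expected. You can give it a try.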
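To see the effect of strip() without touching the network, here is a minimal sketch. It assumes a hypothetical helper, read_urls, which is not part of the original code, and uses an in-memory file with placeholder example.com URLs standing in for item_ids.txt.

```python
import io


def read_urls(fp):
    """Return the non-blank lines of an open text file, stripped of
    surrounding whitespace (including the trailing newline).

    Hypothetical helper for illustration only.
    """
    return [line.strip() for line in fp if line.strip()]


# In-memory stand-in for item_ids.txt; the URLs are placeholders.
sample = io.StringIO(
    "https://example.com/item/1\n"
    "https://example.com/item/2\n"
    "\n"
)

# readlines() keeps the newline at the end of each line.
raw_lines = sample.readlines()
print(repr(raw_lines[0]))

# After stripping, only the clean URLs remain and the blank line is dropped.
sample.seek(0)
urls = read_urls(sample)
print(urls)
```

In the question's loop, the same idea amounts to iterating over the stripped values and passing each one to requests.get(websitelink). Skipping blank lines is an extra precaution added here, not something the original code did.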
