Question

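I am working on a project that uses the Python BeautifulSoup library to fetch data from a web page. Suppose there is an answer on Quora that I want to store locally in a Python variable. The answer may contain both images and text, so how can I store them in a single variable?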

I managed to get the question title, the author name, and so on, but the problem lies with the content of the answer itself.

import requests
from bs4 import BeautifulSoup

print("\nLoading Data..")
result = requests.get("https://qr.ae/TWGJU0")

success = result.status_code
if success==200:
    print("Connection to the webpage was successful..!\n")

src = result.content

soup = BeautifulSoup(src, 'lxml')

question = soup.find("a", attrs={'class': 'question_link'})
print("Question:"+question.text)

author = soup.find("a", attrs={'class': 'user'})
print("Author:"+author.text)

profile = soup.find("a", attrs={'class': 'user'})
print("Author Profile: https://www.quora.com"+profile.attrs['href'])

print("\n")
answer = soup.find("div", attrs={'class':'u-serif-font-main--regular'})
print("Answer:"+answer.text)

The output contains only the text. I know this is because I used answer.text, but how can I make this work for the images as well?

Answer 1

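To store the images in a variable, just pull the <img> tags. There may be more than one image, so you can use a list comprehension to store them in a list: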

import requests
from bs4 import BeautifulSoup

print("\nLoading Data..")
result = requests.get("https://qr.ae/TWGJU0")

success = result.status_code
if success==200:
    print("Connection to the webpage was successful..!\n")

src = result.content

soup = BeautifulSoup(src, 'lxml')

question = soup.find("a", attrs={'class': 'question_link'})
print("Question:"+question.text)

author = soup.find("a", attrs={'class': 'user'})
print("Author:"+author.text)

profile = soup.find("a", attrs={'class': 'user'})
print("Author Profile: https://www.quora.com"+profile.attrs['href'])

print("\n")
answer = soup.find("div", attrs={'class':'u-serif-font-main--regular'})
print("Answer:"+answer.text)

print("\n")
images = [ each['src'] for each in answer.find_all('img') ]
for image in images:
    print ("Images:" + image)

Now your images are stored in a single variable:

print (images)
['https://qph.fs.quoracdn.net/main-qimg-1034d14bf757fcbedc38dfdb186413d3']
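As a minimal offline sketch of the same list comprehension, the snippet below runs against a small inline HTML string instead of the live page. The markup in SAMPLE_HTML is a hypothetical stand-in (the real Quora HTML will differ), and the use of Tag.get() to skip <img> tags without a src attribute is an added assumption, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the answer markup; the real page will differ.
SAMPLE_HTML = """
<div class="u-serif-font-main--regular">
  <p>First paragraph.</p>
  <img src="https://example.com/a.png">
  <p>Second paragraph.</p>
  <img data-src="https://example.com/lazy.png">
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
answer = soup.find("div", attrs={"class": "u-serif-font-main--regular"})

# Tag.get() returns None instead of raising KeyError when "src" is missing,
# so <img> tags without a src attribute are skipped rather than crashing.
images = [img.get("src") for img in answer.find_all("img") if img.get("src")]
print(images)
```

Indexing with each['src'], as in the script above, raises KeyError on an <img> tag that has no src attribute, which is why the sketch filters with get().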

Or:

import requests
from bs4 import BeautifulSoup

print("\nLoading Data..")
result = requests.get("https://qr.ae/TWGJU0")

success = result.status_code
if success==200:
    print("Connection to the webpage was successful..!\n")

src = result.content

soup = BeautifulSoup(src, 'lxml')

question = soup.find("a", attrs={'class': 'question_link'})
print("Question:"+question.text)

author = soup.find("a", attrs={'class': 'user'})
print("Author:"+author.text)

profile = soup.find("a", attrs={'class': 'user'})
print("Author Profile: https://www.quora.com"+profile.attrs['href'])

print("\n")
answer = soup.find("div", attrs={'class':'u-serif-font-main--regular'})


answer_images = []
for sentence in answer.find_all():
    if sentence.name == 'p':
        answer_images.append(sentence.text)
    if sentence.name == 'img':
        answer_images.append(sentence['src'])

answer = ' '.join(answer_images)

print("Answer:"+answer)
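Joining everything into one string makes it hard to tell text from image URLs afterwards. One possible alternative, sketched here under the assumption that a list of (kind, value) tuples is acceptable as the "single variable", keeps the document order and labels each entry. The helper name collect_parts and the SAMPLE_HTML markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the answer markup; the real page will differ.
SAMPLE_HTML = """
<div class="u-serif-font-main--regular">
  <p>First paragraph.</p>
  <img src="https://example.com/a.png">
  <p>Second paragraph.</p>
</div>
"""


def collect_parts(container):
    """Return [(kind, value), ...] for <p> and <img> tags in document order."""
    parts = []
    # Passing a list of names makes find_all() match any of those tags.
    for node in container.find_all(["p", "img"]):
        if node.name == "p":
            text = node.get_text(strip=True)
            if text:
                parts.append(("text", text))
        elif node.get("src"):
            parts.append(("image", node["src"]))
    return parts


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
answer = soup.find("div", attrs={"class": "u-serif-font-main--regular"})
parts = collect_parts(answer)
for kind, value in parts:
    print(kind, value)
```

With this layout, code that consumes the variable can branch on the kind field instead of guessing from the string contents.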

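If you want the answer stored with its images in place, you can iterate over the elements. The images will not be displayed as images, though, unless you do something more with the result, for example saving it as HTML, or using cv2, matplotlib, or another package to display the image from its URL.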
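The "save it as HTML" option could look roughly like the sketch below: str() on a BeautifulSoup tag serializes the tag together with its children, so the <img> tags stay in place and a browser can render them. The SAMPLE_HTML markup and the output file name answer.html are assumptions for illustration.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical stand-in for the answer markup; the real page will differ.
SAMPLE_HTML = """
<div class="u-serif-font-main--regular">
  <p>First paragraph.</p>
  <img src="https://example.com/a.png">
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
answer = soup.find("div", attrs={"class": "u-serif-font-main--regular"})

# str(tag) returns the tag's markup, including nested <p> and <img> tags.
answer_html = str(answer)

# Write the markup to a file (hypothetical name) that a browser can open.
out_path = Path(tempfile.gettempdir()) / "answer.html"
out_path.write_text(answer_html, encoding="utf-8")
print(out_path)
```

Here answer_html is a single string variable holding both the text and the image references, at the cost of keeping the HTML markup around them.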

Another way:

import requests
from bs4 import BeautifulSoup
from matplotlib import pyplot as plt

print("\nLoading Data..")
result = requests.get("https://qr.ae/TWGJU0")

success = result.status_code
if success==200:
    print("Connection to the webpage was successful..!\n")

src = result.content

soup = BeautifulSoup(src, 'lxml')

question = soup.find("a", attrs={'class': 'question_link'})
print("Question:"+question.text)

author = soup.find("a", attrs={'class': 'user'})
print("Author:"+author.text)

profile = soup.find("a", attrs={'class': 'user'})
print("Author Profile: https://www.quora.com"+profile.attrs['href'])

print("\n")
answer = soup.find("div", attrs={'class':'u-serif-font-main--regular'})


answer_images = []
for sentence in answer.find_all():
    if sentence.name == 'p':
        answer_images.append(sentence.text)
    if sentence.name == 'img':
        answer_images.append(sentence['src'])

for each in answer_images:
    if 'https://' in each:
        # Image entry: read it from its own URL and display it.
        # Note: whether plt.imread accepts a URL depends on the matplotlib version.
        a = plt.imread(each)
        plt.axis('off')
        plt.imshow(a)
        plt.show()
    else:
        # Text entry: print the paragraph.
        print(each + ' ')
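The scripts above keep only the image URLs. If the goal is to hold the image data itself in a Python variable, one option is to download the raw bytes and key them by URL. This is a sketch using only the standard library, not the answer author's method; the function name download_images and its fetch parameter are hypothetical, and the demonstration uses a stand-in fetcher so that it runs without network access.

```python
from urllib.request import urlopen


def download_images(urls, fetch=None):
    """Return a dict mapping each URL to the raw bytes fetched for it.

    `fetch` is a hypothetical hook (url -> bytes) that allows the
    network call to be replaced, for example in tests.
    """
    if fetch is None:
        def fetch(url):
            # Default: plain HTTP(S) download with a timeout.
            with urlopen(url, timeout=10) as response:
                return response.read()
    return {url: fetch(url) for url in urls}


def fake_fetch(url):
    # Stand-in fetcher for the offline demonstration below.
    return b"bytes-for-" + url.encode()


stored = download_images(["https://example.com/a.png"], fetch=fake_fetch)
print(stored)
```

Calling download_images(images) without the fetch argument would perform real downloads for the URLs collected earlier.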

Output: each text paragraph is printed, and each image is shown in a matplotlib window.
