I have this code, but I don't know how to read the links from a CSV or a list. I want to read the links, scrape the details from each one, and save the data for each link, column by column, to an output CSV.
Here is the code I built to fetch the specific data.
from bs4 import BeautifulSoup
import requests

url = "http://www.ebay.com/itm/282231178856"
r = requests.get(url)
x = BeautifulSoup(r.content, "html.parser")
# print(x.prettify())

# time to find some tags!
# y = x.find_all("tag")

# title extraction (strip the "Details about " prefix)
z = x.find_all("h1", {"itemprop": "name"})
for item in z:
    try:
        print(item.text.replace('Details about ', ''))
    except AttributeError:
        pass

# category extraction
m = x.find_all("span", {"itemprop": "name"})
for item in m:
    try:
        print(item.text)
    except AttributeError:
        pass

# item condition extraction
n = x.find_all("div", {"itemprop": "itemCondition"})
for item in n:
    try:
        print(item.text)
    except AttributeError:
        pass

# sold-quantity extraction
k = x.find_all("span", {"class": "vi-qtyS vi-bboxrev-dsplblk vi-qty-vert-algn vi-qty-pur-lnk"})
for item in k:
    try:
        print(item.text)
    except AttributeError:
        pass

# watchers extraction
u = x.find_all("span", {"class": "vi-buybox-watchcount"})
for item in u:
    try:
        print(item.text)
    except AttributeError:
        pass

# returns details extraction
t = x.find_all("span", {"id": "vi-ret-accrd-txt"})
for item in t:
    try:
        print(item.text)
    except AttributeError:
        pass

# per-hour/day views
a = x.find_all("div", {"class": "vi-notify-new-bg-dBtm"})
for item in a:
    try:
        print(item.text)
    except AttributeError:
        pass

# "trending at" price
b = x.find_all("span", {"class": "mp-prc-red"})
for item in b:
    try:
        print(item.text)
    except AttributeError:
        pass
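The eight loops above all repeat the same find_all/print pattern with a different selector. A minimal table-driven sketch that folds them into one helper (get_texts and fields are hypothetical names; the selectors are copied from the code above):

# Collect the text of every tag matching one (name, attrs) selector.
def get_texts(soup, name, attrs):
    return [item.text for item in soup.find_all(name, attrs)]

fields = {
    "title":     ("h1",   {"itemprop": "name"}),
    "condition": ("div",  {"itemprop": "itemCondition"}),
    "watchers":  ("span", {"class": "vi-buybox-watchcount"}),
}
# One dict per page, ready to become a CSV row later.
row = {key: get_texts(x, name, attrs) for key, (name, attrs) in fields.items()}
print(row)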
Answer 0 (score: 2)
Your question is a bit vague!
Which links are you talking about? There are a hundred of them on a single eBay page. And which pieces of information do you want? Same thing, there are tons of them. But anyway, here goes:
from bs4 import BeautifulSoup
import requests

# First, create a list of urls you want to iterate on
urls = []
start_url = "http://www.ebay.com/itm/282231178856"  # whatever page contains your links
r = requests.get(start_url)
soup = BeautifulSoup(r.text, "html.parser")
# Assuming your links of interest are values of "href" attributes within <a> tags
for tag in soup.find_all("a"):
    href = tag.get("href")  # .get() returns None instead of raising when the attribute is missing
    if href:
        urls.append(href)

# Second, start to iterate while storing the info
info_1, info_2 = [], []
for link in urls:
    # Fetch and parse each link; maybe it's time to define your existing loops as functions?
    r = requests.get(link)
    soup = BeautifulSoup(r.text, "html.parser")
    info_a, info_b = YourFunctionReturningValues(soup)
    info_1.append(info_a)
    info_2.append(info_b)
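To make the YourFunctionReturningValues placeholder concrete, here is a minimal sketch reusing two selectors from the question (title and condition; anything missing on the page comes back as an empty string):

def YourFunctionReturningValues(soup):
    # Extract (title, condition) from one item page's parsed soup.
    h1 = soup.find("h1", {"itemprop": "name"})
    cond = soup.find("div", {"itemprop": "itemCondition"})
    title = h1.text.replace('Details about ', '') if h1 else ""
    condition = cond.text if cond else ""
    return title, condition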
Then, if you want a nice CSV output:
# Don't forget to import the csv module
import csv

with open(r"path_to_file.csv", "w", newline="") as my_file:
    csv_writer = csv.writer(my_file, delimiter=",")
    csv_writer.writerows(zip(urls, info_1, info_2))
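And since the question also mentions reading the links themselves from a CSV, the input side looks much the same. A minimal sketch, assuming a hypothetical links.csv with one URL in the first column of each row:

import csv

urls = []
with open("links.csv", newline="") as f:  # hypothetical input file
    for row in csv.reader(f):
        if row:                  # skip empty lines
            urls.append(row[0])  # the URL is assumed to sit in the first column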
Hope this helps!
Of course, don't hesitate to give more details so you can get a more precise answer.
Handling attributes with BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes
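For example, the two attribute-access styles described there behave differently when the attribute is absent:

from bs4 import BeautifulSoup

tag = BeautifulSoup('<a>no href here</a>', "html.parser").a
print(tag.get("href"))  # None: dict-style .get(), safe lookup
print(tag["href"])      # raises KeyError, like a dict subscript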