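I wrote a web scraper based on a YouTube video. It only gives me one container out of all 48.
Why doesn't my code loop through all the containers automatically? What am I missing here?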
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.tradera.com/search?itemStatus=Ended&q=iphone+6+-6s+64gb+-plus'
#
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#Container
containers = page_soup.findAll("div",{"class":"item-card-details"})
filename = "ip6.csv"
f = open(filename, "w")
headers = "title, link, price, bids\n"
f.write(headers)
for container in containers:
    title = container.div.div.h3["title"]
    link = container.div.div.h3.a["href"]
    price_container = container.findAll("span",{"class":"item-card-details-price-amount"})
    price = price_container[0].text
    bid_container = container.findAll("span",{"class":"item-card-details-bids"})
    bids = bid_container[0].text
print("title: " + title)
print("link: " + link)
print("price: " + price)
print("bids: " + bids)
f.write(title + "," + link + "," + price + "," + bids + "\n")
f.close
Answer 0 (score: 0)
Because the loop is "empty". In Python, you have to indent the block of code that should run inside the loop, for example:
for i in loop:
    # do something
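To see why only one result comes out, here is a minimal sketch that uses a hypothetical list of dicts in place of the BeautifulSoup containers. A statement indented under `for` runs once per item; a statement back at the outer level runs once, after the loop, and only sees the last value assigned.

```python
# Hypothetical stand-in for the scraped containers
containers = [{"title": "iPhone 6 A"}, {"title": "iPhone 6 B"}, {"title": "iPhone 6 C"}]

inside = []
for container in containers:
    title = container["title"]
    inside.append(title)  # indented: runs once per container

outside = [title]  # not indented: runs once, after the loop has finished

print(inside)   # ['iPhone 6 A', 'iPhone 6 B', 'iPhone 6 C']
print(outside)  # ['iPhone 6 C']
```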
Applied to your code:
for container in containers:
    title = container.div.div.h3["title"]
    link = container.div.div.h3.a["href"]
    price_container = container.findAll("span",{"class":"item-card-details-price-amount"})
    price = price_container[0].text
    bid_container = container.findAll("span",{"class":"item-card-details-bids"})
    bids = bid_container[0].text
    print("title: " + title)
    print("link: " + link)
    print("price: " + price)
    print("bids: " + bids)
    f.write(title + "," + link + "," + price + "," + bids + "\n")
f.close()
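Building each CSV line with `+ "," +` produces a broken row as soon as a title itself contains a comma. A minimal sketch using the standard-library `csv` module instead, which quotes such fields automatically; the rows and links below are hypothetical, and an in-memory buffer stands in for the file so the output can be inspected:

```python
import csv
import io

def write_rows(fileobj, rows):
    """Write a header plus (title, link, price, bids) rows as proper CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["title", "link", "price", "bids"])
    writer.writerows(rows)

# Hypothetical scraped rows; note the comma inside the first title
rows = [
    ("iPhone 6, 64GB", "/item/123", "1 200 kr", "5 bids"),
    ("iPhone 6 64GB", "/item/456", "950 kr", "12 bids"),
]

buffer = io.StringIO()
write_rows(buffer, rows)
print(buffer.getvalue())

# With a real file, a with-block closes it automatically when the block ends:
# with open("ip6.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows(f, rows)
```

The `newline=""` argument is what the `csv` documentation recommends when opening a file for the writer, so that line endings are not translated twice.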
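Answer 1 (score: 0)
You asked what happened and why I got the correct results. The script below is adapted to Py 3.5, because it looked like some errors were occurring on the print lines; I had accidentally fixed part of your script while working on your question.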
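As Ilja pointed out, there was an indentation error before my accidental partial fix, and he was right about the empty list being returned. What I missed in that accidental fix was moving the print statements into the for loop, so I got only one result. Looking at the web page, you want to collect all of the phone listings.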
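The script below fixes everything by putting the print statements inside the for loop. In your PyCharm standard output you should now see many blocks of printed products, and the fixed file-writing line should produce matching rows in the CSV file.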
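Py3.5+ is a bit naive when it comes to printing (`"title: " + title`). IMO the py2.x style should have been kept, because it gives more flexibility and reduces RSI by cutting down on typing. Anyway, iterating over this phone listing page should now work in PyCharm.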
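For reference, `%`-style formatting (used in the script below) is not Python 2-only: it still works in Python 3, alongside `str.format()` and passing several arguments to `print()`. A quick sketch with made-up values:

```python
# Hypothetical values for illustration
title = "iPhone 6 64GB"
price = "1 200 kr"

# printf-style formatting with %: still valid in Python 3
old_style = "title: %s, price: %s" % (title, price)

# str.format(): the method-based alternative
new_style = "title: {}, price: {}".format(title, price)

# print() accepts several arguments and separates them with a space
print("title:", title)
print(old_style)
print(new_style)
```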
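Regarding `repr`: a commenter pointed out that the script doesn't use `repr` at all, and it isn't needed. For print-syntax examples, check here and the official Python documentation here.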
I also added some formatting code for the output file, so it should now be laid out in columns and readable. Enjoy!
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.tradera.com/search?itemStatus=Ended&q=iphone+6+-6s+64gb+-plus'
#
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#Container
containers = page_soup.findAll("div",{"class":"item-card-details"})
filename = "ip6.csv"
f = open(filename, "w")
headers = "title, link, price, bids\n"
f.write(headers)
l1 = 0
l2 = 0
l3 = 0
# get longest entry per item for string/column-formatting
for container in containers:
    title = container.div.div.h3["title"]
    t = len(title)
    if t > l1:
        l1 = t
    link = container.div.div.h3.a["href"]
    price_container = container.findAll("span",{"class":"item-card-details-price-amount"})
    price = price_container[0].text
    p = len(price)
    if p > l2:
        l2 = p
    bid_container = container.findAll("span",{"class":"item-card-details-bids"})
    bids = bid_container[0].text
    b = len(bids)
    if b > l3:
        l3 = b

for container in containers:
    title = container.div.div.h3["title"]
    link = container.div.div.h3.a["href"]
    price_container = container.findAll("span",{"class":"item-card-details-price-amount"})
    price = price_container[0].text
    bid_container = container.findAll("span",{"class":"item-card-details-bids"})
    bids = bid_container[0].text
    # calculate padding between columns
    d1 = l1-len(title) + 0
    d2 = l2-len(price) + 1
    d3 = l3-len(bids) + 1
    d4 = 2
    print("title : %s-%s %s." % (l1, d1, title))
    print("price : %s-%s %s." % (l2, d2, price))
    print("bids : %s-%s %s." % (l3, d3, bids))
    print("link : %s." % link)
    f.write('%s%s, %s%s, %s%s, %s%s\n' % (title, d1* ' ', d2* ' ', price, d3 * ' ', bids, d4 * ' ', link))
f.close()
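The two-pass idea in this script (first find the longest entry per column, then pad every cell to that length) can be written more compactly with `max()` and `str.ljust()`. A sketch with made-up rows, building aligned lines for the screen only:

```python
# Hypothetical (title, price, bids) rows standing in for the scraped data
rows = [
    ("iPhone 6 64GB silver", "1 200 kr", "5 bids"),
    ("iPhone 6 64GB", "950 kr", "12 bids"),
]

# Pass 1: length of the widest entry in each column
widths = [max(len(row[i]) for row in rows) for i in range(3)]

# Pass 2: left-align every cell to its column width, two spaces between columns
lines = []
for row in rows:
    cells = [cell.ljust(width) for cell, width in zip(row, widths)]
    lines.append("  ".join(cells).rstrip())

print("\n".join(lines))
```

Padding with spaces suits a human-readable listing; for a file meant to be opened as CSV, the padding ends up inside the field values, so it is usually better to keep the two outputs separate.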
Answer 2 (score: 0)
Thanks, everyone, for helping me solve this. It was the indentation of the print lines. You're the best!