I've seen many questions like this, but the answers are all specific fixes for the asker's particular script. I'm currently trying to print a bunch of information from supremenewyork.com, from the UK site. The script prints all the information I want just fine, but when I added the proxy part I started getting a lot of errors. I know the proxy script works, because I tested it in a small script and it was able to pull supreme UK info that doesn't exist on supreme US. Here is my script:
```python
import requests
from bs4 import BeautifulSoup

UK_Proxy1 = raw_input('UK http Proxy1: ')
UK_Proxy2 = raw_input('UK http Proxy2: ')

proxies = {
    'http': 'http://' + UK_Proxy1,
    'https': 'http://' + UK_Proxy2,
}

categorys = ['jackets', 'shirts', 'tops_sweaters', 'sweatshirts', 'pants',
             'shorts', 't-shirts', 'hats', 'hats', 'bags', 'accessories',
             'shoes', 'skate']
catNumb = 0
altArray = []
nameArray = []
styleArray = []

for cat in categorys:
    catStr = str(categorys[catNumb])
    cUrl = 'http://www.supremenewyork.com/shop/all/' + catStr
    proxy_script = requests.get((cUrl.text), proxies=proxies)
    bSoup = BeautifulSoup(proxy_script, 'lxml')
    print('\n*******************"' + catStr.upper() + '"*******************\n')
    catNumb += 1
    for item in bSoup.find_all('div', class_='inner-article'):
        url = item.a['href']
        alt = item.find('img')['alt']
        req = requests.get('http://www.supremenewyork.com' + url)
        item_soup = BeautifulSoup(req.text, 'lxml')
        name = item_soup.find('h1', itemprop='name').text
        style = item_soup.find('p', itemprop='model').text
        print alt + (' --- ') + name + (' --- ') + style
        altArray.append(alt)
        nameArray.append(name)
        styleArray.append(style)

print altArray
print nameArray
print styleArray
```
When I run the script I get this error:

AttributeError: 'str' object has no attribute 'text'

and the error points to this line:

proxy_script = requests.get((cUrl.text), proxies=proxies)
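A minimal reproduction of that error, with no network access needed: `cUrl` is a plain string, and strings have no `.text` attribute; `.text` belongs to the `Response` object that `requests.get` returns, so passing `cUrl.text` as the URL fails before the request is even made.

```python
# Reproducing the AttributeError: cUrl is a plain str, and str has no
# .text attribute; .text belongs to the Response that requests returns.
cUrl = 'http://www.supremenewyork.com/shop/all/jackets'
try:
    cUrl.text
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'text'
```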
I recently made this change to the script to fix that error... it now prints the categories, but with no information between them (which I need). It just prints ****************JACKETS**************, ****SHIRTS****, and so on. Here is what I changed:
```python
import requests
from bs4 import BeautifulSoup

# make sure proxy is http and port 8080
UK_Proxy1 = raw_input('UK http Proxy1: ')
UK_Proxy2 = raw_input('UK http Proxy2: ')

proxies = {
    'http': 'http://' + UK_Proxy1,
    'https': 'http://' + UK_Proxy2,
}

categorys = ['jackets', 'shirts', 'tops_sweaters', 'sweatshirts', 'pants',
             'shorts', 't-shirts', 'hats', 'bags', 'accessories',
             'shoes', 'skate']
catNumb = 0
altArray = []
nameArray = []
styleArray = []

for cat in categorys:
    catStr = str(categorys[catNumb])
    cUrl = 'http://www.supremenewyork.com/shop/all/' + catStr
    proxy_script = requests.get(cUrl, proxies=proxies).text
    bSoup = BeautifulSoup(proxy_script, 'lxml')
    print('\n*******************"' + catStr.upper() + '"*******************\n')
    catNumb += 1
    for item in bSoup.find_all('div', class_='inner-article'):
        url = item.a['href']
        alt = item.find('img')['alt']
        req = requests.get('http://www.supremenewyork.com' + url)
        item_soup = BeautifulSoup(req.text, 'lxml')
        name = item_soup.find('h1', itemprop='name').text
        style = item_soup.find('p', itemprop='model').text
        print alt + (' --- ') + name + (' --- ') + style
        altArray.append(alt)
        nameArray.append(name)
        styleArray.append(style)

print altArray
print nameArray
print styleArray
```
I put the .text at the end and it sort of works... how do I fix it so it prints the information I want?
Answer 0 (score: 0)
I think you missed something. Your cUrl is of type string, not a Response object, so it has no .text attribute. I think you want:

proxy_script = requests.get(cUrl, proxies=proxies).text
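If the category headers print but no items appear between them, the likely reason is that `find_all('div', class_='inner-article')` returns an empty list, i.e. the HTML fetched through the proxy does not contain the expected markup (a block page, an empty body, or different class names). A minimal sketch of how to verify the selector, using a canned HTML fragment shaped like the shop page instead of a live request (with a live page you would substitute `requests.get(cUrl, proxies=proxies).text` for `html`), and `html.parser` instead of `lxml` so no extra parser is required:

```python
from bs4 import BeautifulSoup

# Canned fragment standing in for the fetched page; the href and alt
# values here are made up for illustration.
html = ('<div class="inner-article">'
        '<a href="/shop/jackets/abc123"><img alt="Logo Jacket"></a>'
        '</div>')

soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all('div', class_='inner-article')

# 0 here means the selector (or the HTML the proxy returned) is wrong.
print(len(items))
print(items[0].a['href'])
print(items[0].find('img')['alt'])
```

Printing `len(items)` (or the first few hundred characters of the fetched HTML) right after the request is usually the quickest way to see whether the proxy is returning the real page.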