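I am a beginner with Django. I am working on a data scraping project and have written the code below, but I am having trouble downloading the CSV file. I added a "download" feature to the page, but it does not give me the result I want. Instead, I get this error: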
Invalid URL '': No schema supplied. Perhaps you meant http://?
Here is my code:
views.py
import csv
import re

import requests
from bs4 import BeautifulSoup
from django.http import HttpResponse
from django.shortcuts import render


def index(request):
    if request.method == "POST":
        # Both fall back to '' when the submitted form has no such field
        url = request.POST.get('url', '')
        down = request.POST.get('download', '')
        r = requests.get(url)
        soup = BeautifulSoup(r.content, features="lxml")
        p_name = soup.find_all("h2", attrs={"class": "a-size-mini"})
        p_price = soup.find_all("span", attrs={"class": "a-price-whole"})
        p_image = soup.find_all('img', {'class': 's-image', 'src': re.compile('.jpg')})
        response = HttpResponse(content_type='text/csv')
        response['Content-Disposition'] = 'attachment; filename="product_file.csv"'
        writer = csv.writer(response)  # one writer for the whole response
        for name, price, image in zip(p_name, p_price, p_image):
            writer.writerow([name.text, price.text, image['src']])
        name_data = [data.text for data in p_name]
        price_data = [data.text for data in p_price]
        image_data = [data['src'] for data in p_image]
        dec = {'name': name_data, 'price': price_data, 'image': image_data}
        if down:
            return response
    else:
        dec = {}
    return render(request, 'index.html', dec)
When I remove the "if down:" check, the CSV file downloads correctly; when I keep the "if" condition, it raises this error:
Invalid URL '': No schema supplied. Perhaps you meant http://?
index.html
<div class="container">
  <div class="row justify-content-md-center">
    <div class="col-md-4">
      <form method="POST" action="">{% csrf_token %}
        <h1 class="mb-3 display-4">Amazone Scraper</h1>
        <input type="text" id="url" name="url" class="form-control" placeholder="URL" required autofocus>
        <button class="mt-3 btn btn-lg btn-primary btn-block" type="submit" id="submit" name='submit'>Scrap</button>
      </form>
      <p class="mt-3"><a href="upload">Upload</a> Your File For Updates Regarding</p>
      <form action="" method="post">{% csrf_token %}<!--------download---------->
        <input class="mt-3 btn btn-info" type="submit" id="download" name='download' value='Download'/>
      </form>
    </div>
  </div>
  <div class="row">
Answer
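The problem is that you have two forms. When you click the download button, the browser sends the data from the second form, which does not contain the url field, so the url value in your view is empty. You should refactor this view to use only one form.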
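To see why url ends up empty, recall that a browser submits only the fields of the form whose button was clicked. The stdlib-only sketch below models the two forms from index.html as plain dicts (a simplification, not how Django actually parses a request) and shows that the Download submission carries no url key, so the .get('url', '') fallback returns ''. The example URL and the helper names submitted_fields, url_from and has_scheme are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical model of the two forms in index.html: each dict lists only the
# fields that form contains, because a browser submits just the form whose button was clicked.
SCRAPE_FORM = {"csrfmiddlewaretoken": "token", "url": "https://www.example.com/s?k=laptop", "submit": ""}
DOWNLOAD_FORM = {"csrfmiddlewaretoken": "token", "download": "Download"}


def submitted_fields(form_fields):
    """Encode the fields as a urlencoded POST body, then parse the body back into a dict."""
    body = urlencode(form_fields)
    # keep_blank_values=True keeps fields such as submit="" instead of dropping them
    parsed = parse_qs(body, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


def url_from(form_fields):
    """Mirror views.py: request.POST.get('url', '') falls back to '' when the field is missing."""
    return submitted_fields(form_fields).get("url", "")


def has_scheme(url):
    """Rough stand-in for the check requests performs: the URL must start with a scheme."""
    return urlsplit(url).scheme in ("http", "https")


for label, form in [("Scrap", SCRAPE_FORM), ("Download", DOWNLOAD_FORM)]:
    url = url_from(form)
    print(label, repr(url), has_scheme(url))
```

The Download case is exactly what the view receives, and requests.get('') then fails with the "No schema supplied" error.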
Alternatively, you can try adding a url field to the second form and using the url from the first form as its default value:
<div class="container">
  <div class="row justify-content-md-center">
    <div class="col-md-4">
      <form method="POST" action="">{% csrf_token %}
        <h1 class="mb-3 display-4">Amazone Scraper</h1>
        <input type="text" id="url" name="url" class="form-control" placeholder="URL" required autofocus>
        <button class="mt-3 btn btn-lg btn-primary btn-block" type="submit" id="submit" name='submit'>Scrap</button>
      </form>
      <p class="mt-3"><a href="upload">Upload</a> Your File For Updates Regarding</p>
      <form action="" method="post">{% csrf_token %}<!--------download---------->
You also need to add url to the template context:
writer = csv.writer(response)
for name, price, image in zip(p_name, p_price, p_image):
    writer.writerow([name.text, price.text, image['src']])
name_data = [data.text for data in p_name]
price_data = [data.text for data in p_price]
image_data = [data['src'] for data in p_image]
# 'url' is new: the template uses it to fill the url field of the second form
dec = {'name': name_data, 'price': price_data, 'image': image_data, 'url': url}
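This alternative works as a round trip: the view puts url into the context, the template writes it into the second form, and the Download submission sends it back. The sketch below simulates that round trip with the standard library only. It is not Django code: render_download_form is a hypothetical stand-in for the template (Django auto-escapes {{ url }} in a similar way), and carrying the value in a hidden input is an assumption about how the default value is stored.

```python
from html import escape
from html.parser import HTMLParser


def render_download_form(context):
    """Hypothetical stand-in for the second <form> in index.html, with url in a hidden input."""
    url = escape(context.get("url", ""), quote=True)  # escape &, <, > and quotes like template auto-escaping
    return (
        '<form action="" method="post">'
        f'<input type="hidden" name="url" value="{url}">'
        '<input type="submit" name="download" value="Download">'
        "</form>"
    )


class FieldCollector(HTMLParser):
    """Collects name/value pairs of <input> elements, i.e. what the form sends when Download is clicked."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and "name" in attrs:
            # HTMLParser has already unescaped the attribute value (&amp; -> &)
            self.fields[attrs["name"]] = attrs.get("value") or ""


def submit(form_html):
    collector = FieldCollector()
    collector.feed(form_html)
    return collector.fields


context = {"url": "https://www.example.com/s?k=laptop&page=2"}  # hypothetical value stored in dec
post = submit(render_download_form(context))
print(post)
```

The url that comes back is identical to the one the first form submitted, so request.POST.get('url', '') is no longer empty when Download is clicked.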
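Note that with this setup the request to the third-party URL is sent twice, once for Scrap and once again for Download. So I think you should refactor the view and use a single form for both "Scrap" and "Download".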
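A minimal sketch of the single-form idea follows, under several assumptions. Both buttons are assumed to live in one form and to share a hypothetical name="action" attribute with the values scrape and download; a browser includes only the clicked button's name/value pair. The handler validates the URL, fetches the page once per submission, and then either builds the CSV or returns the template context. fetch_products is a hypothetical injected function standing in for the requests/BeautifulSoup code so the sketch runs without network access; in the real Django view the CSV text would be written into the HttpResponse instead of being returned as a string.

```python
import csv
import io


def rows_to_csv(rows):
    """Write (name, price, image) rows as CSV text, creating the writer once outside the loop."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def handle_post(post, fetch_products):
    """Hypothetical single-form handler: one fetch per submission, output chosen by the clicked button."""
    url = post.get("url", "")
    if not url.startswith(("http://", "https://")):
        # Reject the input instead of passing '' to the HTTP client
        return "error", {"error": "Please enter a full URL starting with http:// or https://"}
    rows = fetch_products(url)  # the only request to the third-party site for this submission
    if post.get("action") == "download":
        return "csv", rows_to_csv(rows)
    names, prices, images = (list(col) for col in zip(*rows)) if rows else ([], [], [])
    return "page", {"name": names, "price": prices, "image": images, "url": url}


# Usage with a fake fetcher (hypothetical data) that records each outgoing request
calls = []


def fake_fetch(url):
    calls.append(url)
    return [("Laptop", "499", "https://example.com/a.jpg")]


kind, body = handle_post({"url": "https://example.com/s", "action": "download"}, fake_fetch)
print(kind, calls)
```

A Download click now fetches the page once and returns the file directly, so the user no longer has to Scrap first and then trigger a second request.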