How do I iterate over the rows of an Excel workbook to scrape URLs?

Time: 2017-08-22 16:48:14

Tags: python excel

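I am new to Python scripting and have run into a small problem. Please help with the following: how do I iterate over the rows so that the search keyword equals the name in column 1, and how do I write the output back to the same Excel sheet?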

Thank you.

Excel sheet:

Col 1

Name1

Name2

Name3

I could not get the old code to work, so here is the new code.

New url_scraper.py script:

import requests
from bs4 import BeautifulSoup
import xlrd
import xlwt
import pandas as pd
import xlsxwriter

book = xlrd.open_workbook("test.xlsx")
sh = book.sheet_by_index(0)
aa = sh.cell_value(rowx=0, colx=0)
df5 = pd.read_excel("test.xlsx")
writer = pd.ExcelWriter('test1.xlsx', engine='xlsxwriter')
df5.to_excel(writer, sheet_name='Sheet1', index=False, startcol=0)
print (df5)
#df = pd.read_excel("test.xlsx")
df3=df5['aa'] = "http://www.example.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="+ df5.aa.astype(str)
df3.to_excel(writer, sheet_name='Sheet1', index=False, startcol=2, header=False, startrow=1)
print (df3)

book = xlrd.open_workbook("test1.xlsx")
sh = book.sheet_by_index(0)
row = 1
col = 2
aa1 = sh.cell_value(rowx=row, colx=col)
row += 1
url = aa1
response = requests.get(url)
page = str(BeautifulSoup(response.content))
start_quote = page.find("http://ecx.")
end_quote = page.find(".jpg", start_quote + 1)
url1 = page[start_quote + 0: end_quote + 4]
print (url1)
ds = pd.Series(data = url1)
df = pd.DataFrame(data = ds)
df.to_excel(writer, sheet_name='Sheet1', index=False, startcol=1, header=False, startrow=1)

The new code produces the output I want for the first row, but I cannot get it to loop over the remaining rows.

Output of the new code:

col1 col2

name1 URL for name1, starting with "http://ecx"

name2 no URL printed here

name3 no URL printed here

Please help.
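The script above reads a single cell (row 1, column 2), makes one request, and increments row once without ever repeating the request, which would explain why only name1 gets a URL. One possible way to loop is sketched below. It is a minimal sketch under assumptions, not a tested fix for the original site. The helper names (build_search_url, extract_image_url, scrape_names, main) and the output column name image_url are hypothetical. The names are assumed to be in the first column of test.xlsx. URL-encoding the keyword with quote_plus is an addition not present in the original script. Reading and writing .xlsx with pandas is assumed to have a suitable engine such as openpyxl installed.

```python
from urllib.parse import quote_plus

# Search URL taken from the question; the keyword is appended to it.
SEARCH_URL = "http://www.example.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="


def build_search_url(name):
    """Append the URL-encoded keyword to the search URL (encoding is an assumption)."""
    return SEARCH_URL + quote_plus(str(name))


def extract_image_url(page, prefix="http://ecx.", suffix=".jpg"):
    """Return the first substring running from prefix to suffix, or "" if absent."""
    start = page.find(prefix)
    if start == -1:
        return ""
    end = page.find(suffix, start)
    if end == -1:
        return ""
    return page[start:end + len(suffix)]


def scrape_names(names, fetch):
    """Call fetch(url) once per name and collect one image URL per row."""
    results = []
    for name in names:
        page = fetch(build_search_url(name))
        results.append(extract_image_url(page))
    return results


def main(in_path="test.xlsx", out_path="test1.xlsx"):
    # Third-party imports are local so the helpers above can be used without them.
    import pandas as pd
    import requests

    df = pd.read_excel(in_path)
    names = df.iloc[:, 0]  # first column, whatever its header is called
    # One request per row; the result list lines up with the rows of df.
    df["image_url"] = scrape_names(
        names, lambda url: requests.get(url, timeout=10).text
    )
    # Write names and URLs together in a single call.
    df.to_excel(out_path, index=False)
```

Calling main() would read test.xlsx and write the names plus a new URL column to test1.xlsx. Passing fetch as a parameter keeps the loop separate from the network call, so the looping and string extraction can be checked with canned HTML before hitting the real site. Collecting all results first and writing once also avoids overwriting the same cell with repeated to_excel calls at a fixed startrow.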

0 answers:

No answers yet.