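My goal is to scrape an XPath value from each of a series of URLs listed in the fourth column of a Google Sheet, and to write that value into the cell immediately to the left of each URL.
So far I have the code below, but when I run it, every URL gets the last value of the adGroupStatus list instead of its own corresponding value.
Can anyone suggest a solution?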
```python
import requests
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from lxml import html

scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)
sh = client.open('example_sheet_name')
worksheet = sh.get_worksheet(0)

# the column (4th) with our URLs
url_list = worksheet.col_values(4)
# where we want our xpath values to print to
cell_list = worksheet.range('C1:C5')

def grab_xpathtext(urls, cell_range):
    # do the below for each url in the spreadsheet column 4:
    for url in urls:
        r = requests.get(url)
        tree = html.fromstring(r.content)
        adGroupStatus = tree.xpath('//*[@id="m_count"]/text()')
        # below prints each value to the cmd line on a new line as expected
        print(adGroupStatus[0])
        for cell in cell_range:
            # below prints the last value instead of each corresponding value
            cell.value = adGroupStatus[0]
    worksheet.update_cells(cell_range)

grab_xpathtext(url_list, cell_list)
```
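I expect the output to look like this:
| location 1 | description | 1 | url 1 |
| location 2 | description | 2 | url 2 |
| location 3 | description | 3 | url 3 |
| location 4 | description | 4 | url 4 |
| location 5 | description | 5 | url 5 |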
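...but instead I get:
| location 1 | description | 5 | url 1 |
| location 2 | description | 5 | url 2 |
| location 3 | description | 5 | url 3 |
| location 4 | description | 5 | url 4 |
| location 5 | description | 5 | url 5 |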
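The cause is the inner `for cell in cell_range:` loop in the question's code: on every pass of the URL loop it writes the current value into every cell of the range, so each pass overwrites the whole column and only the last URL's value survives. The minimal, stdlib-only simulation below illustrates this; `Cell` is a hypothetical stand-in for gspread's cell objects (only `.value` is modeled), and the values are hard-coded rather than scraped.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for a gspread Cell; only .value is modeled."""
    value: str = ""


def buggy_fill(cells, values):
    # Mirrors the question's nested loop: each value is written to every cell,
    # so every outer iteration overwrites the entire range.
    for val in values:
        for cell in cells:
            cell.value = val
    return [c.value for c in cells]


def fixed_fill(cells, values):
    # Pair cells and values one-to-one: cell i receives value i.
    for cell, val in zip(cells, values):
        cell.value = val
    return [c.value for c in cells]


statuses = ["1", "2", "3", "4", "5"]
print(buggy_fill([Cell() for _ in range(5)], statuses))  # ['5', '5', '5', '5', '5']
print(fixed_fill([Cell() for _ in range(5)], statuses))  # ['1', '2', '3', '4', '5']
```

`zip` also stops at the shorter of the two sequences, so a URL list longer than the cell range does not raise an `IndexError`.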
Answer 0 (score: 0)
I found the answer in another question: Python/gspread - how can I update multiple cells with DIFFERENT VALUES at once?
Implemented here as:
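```python
# Assumes the imports and the `worksheet` setup from the question.
url_list = worksheet.col_values(4)
cell_list = worksheet.range('C1:C5')

def grab_xpathtext(urls, cell_range):
    # First collect one status per URL, in URL order
    statuses = []
    for url in urls:
        r = requests.get(url)
        tree = html.fromstring(r.content)
        adGroupStatus = tree.xpath('//*[@id="m_count"]/text()')
        statuses.append(adGroupStatus[0])
    print(statuses)
    # Then write value i into cell i (a single loop; zip stops at the
    # shorter sequence, so extra URLs do not raise IndexError)
    for cell, val in zip(cell_range, statuses):
        cell.value = val
    # Send all cells to the sheet in one batch request
    worksheet.update_cells(cell_range)

grab_xpathtext(url_list, cell_list)
```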
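One possible refinement of the answer's approach, sketched under stated assumptions: size the target range from the number of URLs instead of hard-coding `'C1:C5'`, and tolerate pages where the XPath matches nothing (where `adGroupStatus[0]` would raise `IndexError`). The helpers `target_range` and `collect_statuses` are hypothetical names, and `fetch_matches` is a hypothetical callable standing in for the `requests.get` plus `tree.xpath(...)` step, so the logic can run without network access or a spreadsheet.

```python
from typing import Callable, List, Sequence


def target_range(column: str, n_rows: int, first_row: int = 1) -> str:
    """A1-notation range of n_rows cells in `column`, e.g. ('C', 5) -> 'C1:C5'."""
    last_row = first_row + n_rows - 1
    return f"{column}{first_row}:{column}{last_row}"


def collect_statuses(
    urls: Sequence[str],
    fetch_matches: Callable[[str], List[str]],
    missing: str = "",
) -> List[str]:
    """Return exactly one status per URL, in URL order.

    fetch_matches is a hypothetical stand-in for requests.get + lxml's
    tree.xpath(...): it takes a URL and returns the list of text matches.
    """
    statuses = []
    for url in urls:
        matches = fetch_matches(url)
        # Use the first match, or a placeholder when the XPath found nothing,
        # so every row still gets a value and rows stay aligned.
        statuses.append(matches[0] if matches else missing)
    return statuses


# Example with canned results instead of live HTTP requests
fake_pages = {"url1": ["1"], "url2": [], "url3": ["3"]}
urls = list(fake_pages)
print(target_range("C", len(urls)))                     # C1:C3
print(collect_statuses(urls, fake_pages.__getitem__))  # ['1', '', '3']
```

In the real script, the range string would be passed to `worksheet.range(...)`, the cells filled with `zip` as in the answer, and the sheet updated with a single `worksheet.update_cells(...)` call. If row 1 holds a header, `first_row=2` together with dropping the header from `url_list` keeps the rows aligned.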