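Importing an Excel column into a Python list

Time: 2014-12-26 07:58:07

Tags: python excel python-requests xlrd

Hello, I have an Excel worksheet with only one column, and I want to import that column into a list in Python. The column has 5 elements, all of which contain URLs like "http://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0".

My code: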

import requests
import csv
import xlrd

ls = []
ls1 = ['01.jpg','02.jpg','03.jpg','04.jpg','05.jpg','06.jpg']
wb = xlrd.open_workbook('Book1.xls')
ws = wb.sheet_by_name('Book1')
num_rows = ws.nrows - 1
curr_row = -1
while (curr_row < num_rows):
    curr_row += 1
    row = ws.row(curr_row)
    ls.append(row)

for each in ls:
    urlFetch = requests.get(each)
    img = urlFetch.content
    for x in ls1:
        file = open(x,'wb') 
        file.write(img)
        file.close()

Now it gives me this error:

Traceback (most recent call last):
  File "C:\Users\Prime\Documents\NetBeansProjects\Python_File_Retrieve\src\python_file_retrieve.py", line 18, in <module>
    urlFetch = requests.get(each)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\api.py", line 65, in get
    return request('get', url, **kwargs)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\api.py", line 49, in request
    response = session.request(method=method, url=url, **kwargs)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 461, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 567, in send
    adapter = self.get_adapter(url=request.url)
  File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 646, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']'

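Please help.

2 Answers:

Answer 0: (score: 1)

Your problem is not with reading the Excel file but with how you handle what you read from it. Note that the error is raised by the Requests library: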

requests.exceptions.InvalidSchema: No connection adapters were found for <url>

From the error we can see that the URL you take from each cell of the Excel file also carries a [text: prefix:

'[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']'

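Requests cannot work with this, because the string does not start with a scheme it recognizes, such as http:// or https://. If you do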

requests.get('https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0')

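you get the proper result.

What you need to do is extract only the URL from the cell. If you run into trouble, please post an example of the URLs in the Excel file.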
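
One way to extract only the URL might look like the sketch below. It assumes the URLs are stored as plain text in the first column of the sheet; the helper names `column_to_urls` and `load_urls` are made up for illustration, and the file and sheet names are taken from the question. `col_values` is xlrd's method for reading the values of a whole column as plain Python objects.

```python
def column_to_urls(sheet, col=0):
    """Return the non-empty text values of one column of an xlrd-style sheet.

    Hypothetical helper: it relies only on sheet.col_values(col), which
    returns plain Python values (str for text cells) rather than cell objects.
    """
    urls = []
    for value in sheet.col_values(col):
        # Skip empty cells and non-text cells such as numbers.
        if isinstance(value, str) and value.strip():
            urls.append(value.strip())
    return urls


def load_urls(path="Book1.xls", sheet_name="Book1"):
    """Open the workbook from the question and read its first column."""
    import xlrd  # third-party; imported here so column_to_urls has no dependency

    workbook = xlrd.open_workbook(path)
    return column_to_urls(workbook.sheet_by_name(sheet_name))
```

Each item returned by `load_urls()` is then an ordinary string that can be passed to `requests.get`.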

Answer 1: (score: 0)

For the URLs in the spreadsheet, click one of them and look at what is shown in the formula bar. I would guess it looks like this:

[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']

because that is what the stack trace prints for the URL.

Can you remove the brackets, the quotes and the "text:" part? That should fix it.
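
It may also be worth checking whether the extra characters are in the spreadsheet at all. The `[text:'...']` form matches how Python prints a list that holds one xlrd text cell, and `ws.row(...)` in the question returns such a list. The sketch below uses a made-up `FakeCell` class in place of a real xlrd cell and a placeholder URL, so it runs without a workbook. It is an illustration of the likely cause, not a confirmed diagnosis.

```python
class FakeCell:
    """Hypothetical stand-in that mimics how an xlrd text cell prints."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # xlrd text cells print as text:'<value>'
        return "text:%r" % self.value


url = "https://example.com/DPS_0321.jpg?dl=0"  # placeholder URL, not from the question
row = [FakeCell(url)]      # a row is a list of cells, not a string

as_passed = str(row)       # the form that shows up in the error message
as_needed = row[0].value   # the plain string that requests.get expects

print(as_passed)
print(as_needed)
```

If that is what is happening, appending `row[0].value` to the list instead of `row` would give Requests a plain URL string.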