How do I retrieve data from the hyperlinks in an Excel file?

Time: 2019-06-17 14:05:58

Tags: python excel

I have a number of hyperlinks stored in a local Excel file, all in the same column. For example:

| A                                  |
| ---------------------------------- |
| http://vocab.getty.edu/tgn/8699749 |
| http://vocab.getty.edu/tgn/8704811 |
| http://vocab.getty.edu/tgn/8702341 |
| http://vocab.getty.edu/tgn/1063874 |
| http://vocab.getty.edu/tgn/1063880 |
| http://vocab.getty.edu/tgn/7032551 |

Each link points to a page from which I want to extract the information associated with the field xl:prefLabel and store the result in column B.

Might openpyxl be the solution?

The expected result should look something like this:

| A                                  | B                      |
| ---------------------------------- | ---------------------- |
| http://vocab.getty.edu/tgn/8699749 | tgn_term:1005671253-fr |
| http://vocab.getty.edu/tgn/8704811 | tgn_term:1005683546-de |
| http://vocab.getty.edu/tgn/8702341 | tgn_term:1005684314    |
| http://vocab.getty.edu/tgn/1063874 | tgn_term:64447         |
| http://vocab.getty.edu/tgn/1063880 | tgn_term:64453         |
| http://vocab.getty.edu/tgn/7032551 | tgn_term:1001213640    |

1 answer:

Answer 0: (score: 0)

One quick solution is to read the column with pandas and index into it:

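import pandas as pd
from urllib import request

# path_to_excel_file is a placeholder for the path to your local workbook
all_hyperlinks = pd.read_excel(path_to_excel_file, index_col=None, header=None)
first_hl = all_hyperlinks.loc[0, 0]  # Get the first hyperlink (row 0, column 0)
contents = request.urlopen(first_hl).read()  # Raw bytes of the linked page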
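
The snippet above only fetches the first link. To cover every row and fill column B, something along the following lines might work. Treat it as a rough sketch resting on several assumptions that I have not verified against the Getty service: that appending `.nt` to each URL returns the record as N-Triples, that xl:prefLabel expands to the SKOS-XL predicate `http://www.w3.org/2008/05/skos-xl#prefLabel`, and that `tgn_term:` abbreviates `http://vocab.getty.edu/tgn/term/`. The function names (`pref_labels`, `fetch_ntriples`, `fill_column_b`) are made up for illustration, and openpyxl has to be installed separately.

```python
import re
from urllib import request

# Assumption: full IRI behind the xl:prefLabel field (SKOS-XL namespace).
XL_PREF_LABEL = "http://www.w3.org/2008/05/skos-xl#prefLabel"
# Assumption: the tgn_term: prefix stands for this base IRI.
TERM_BASE = "http://vocab.getty.edu/tgn/term/"

# Matches N-Triples lines whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.")


def pref_labels(ntriples, subject):
    """Return the xl:prefLabel objects of `subject`, shortened to tgn_term:IDs."""
    labels = []
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue
        subj, pred, obj = match.groups()
        if subj == subject and pred == XL_PREF_LABEL:
            if obj.startswith(TERM_BASE):
                obj = "tgn_term:" + obj[len(TERM_BASE):]
            labels.append(obj)
    return labels


def fetch_ntriples(url):
    """Download the record as text (assumes the server answers on url + '.nt')."""
    with request.urlopen(url + ".nt") as response:
        return response.read().decode("utf-8")


def fill_column_b(path, fetch=fetch_ntriples):
    """Read the links in column A of the first sheet and write labels to column B."""
    from openpyxl import load_workbook  # third-party dependency

    workbook = load_workbook(path)
    sheet = workbook.active
    for row in range(1, sheet.max_row + 1):
        url = sheet.cell(row=row, column=1).value
        if not url:
            continue  # skip empty cells
        url = str(url).strip()
        labels = pref_labels(fetch(url), url)
        sheet.cell(row=row, column=2).value = ", ".join(labels)
    workbook.save(path)  # overwrites the original file
```

Calling `fill_column_b(path_to_excel_file)` would then process the whole sheet. The parsing is kept separate from the download so it can be tried on a saved response before touching the network, and a record with more than one preferred label ends up as a comma-separated list in column B.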