How to get the link I am redirected to

Time: 2018-05-09 05:17:00

Tags: python python-3.x url-redirection

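My program has a config file where you can add company stock symbols. It reads the symbols from that file and searches for news articles about them, all of which are pulled from an API. One of the fields I print is the URL field, and the URL I get is one that redirects you to another URL (the real one). What I want to do now is get the redirected URL and print that instead.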

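I have a global list, companyurl, to which I append every URL I pull, so all of the generic redirect URLs are in there. I am getting a redirected URL; the only problem is that I get just one redirected URL, and it is printed for every news article. This is a bit hard to explain, so please ask if you need further clarification.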

Here is my code; the commented-out parts are what I have been trying.

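For testing purposes, here are two stock symbols you can add to the config file if you want to try something: aapl and yelp. Just put them on separate lines in the config file.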

import sys
import json
import urllib.request
import time
import datetime
import requests

def main():
    openconfigfile()
    searchfornews()

def openconfigfile():
    mylist = []
    with open('config.txt') as myfile:
        for company in myfile:
            mylist.append(company.strip())
    return mylist

companyurl = []
def searchfornews():
    myurl = []
    global companyurl
    url = 'https://api.iextrading.com/1.0/stock/'
    companies = openconfigfile()
    for company in companies:
        stockinput = company + '/news/last/2'
        createdurl = url + stockinput
        myurl.append(createdurl)
    while True:
        try:
            for url in myurl:
                fob = urllib.request.urlopen(url)
                data = fob.read().decode('utf-8')
                companydata = json.loads(data)
                for company in companydata:
                    company['datetime'] = reformatdate()
                    companyurl.append(company['url'])
                    # r = getredirectedlink()
                    # company['url'] = r.url

                    print('''======== [%s] ========
%s:   "%s"
%s
tags: %s''' % (company['datetime'], company['source'], company['headline'], company['url'], company['related']))

            time.sleep(30)
        except Exception as e:
            print()
            print('''ERROR: news not found for 1 or more stock symbols
You have a stock symbol in the config file that doesnt match any known stock symbol''', e)
            time.sleep(30)

def reformatdate():
    time = datetime.datetime.today()
    newtime = time.strftime('%B %d %Y, %I:%M %p')
    return newtime

# def getredirectedlink():
#     global companyurl
#     for x in companyurl:
#         r = requests.get(x)
#         return r

if __name__ == '__main__':
    sys.exit(main())

1 answer:

Answer 0 (score: 1)

You are almost there. You only need to change two things:

  1. Inside searchfornews:

    company['datetime'] = reformatdate()
    companyurl.append(company['url'])
    # r = getredirectedlink()
    # company['url'] = r.url
    

    change it to

    company['datetime'] = reformatdate()
    company['url'] = getredirectedlink(company['url'])
    companyurl.append(company['url'])
    

  2. Change getredirectedlink to the following:

    def getredirectedlink(companyurl):
        r = requests.get(companyurl)
        return r.url
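
The reason the original attempt printed the same URL for every article is that the commented-out getredirectedlink loops over the global companyurl list but returns during the first iteration, so it always resolves companyurl[0]. Below is a minimal sketch that reproduces this without any network access. FAKE_REDIRECTS, resolve, and the two function names are hypothetical stand-ins made up for illustration; in the real program resolve(url) would be requests.get(url).url, as in the code above.

```python
# Hypothetical redirect table standing in for real HTTP redirects.
# In the real program these would be the URLs returned by the news API.
FAKE_REDIRECTS = {
    "https://api.example/redirect/1": "https://news.example/article-one",
    "https://api.example/redirect/2": "https://news.example/article-two",
}


def resolve(url):
    """Stand-in for requests.get(url).url: return the final URL."""
    return FAKE_REDIRECTS.get(url, url)


companyurl = []  # global list, as in the question


def getredirectedlink_buggy():
    # Mirrors the commented-out version: `return` exits during the first
    # iteration, so only companyurl[0] is ever resolved.
    for x in companyurl:
        return resolve(x)


def getredirectedlink_fixed(url):
    # Mirrors the fix: resolve exactly the URL that was passed in.
    return resolve(url)


buggy, fixed = [], []
for url in FAKE_REDIRECTS:
    companyurl.append(url)
    buggy.append(getredirectedlink_buggy())
    fixed.append(getredirectedlink_fixed(url))

print(buggy)  # the first article's URL, repeated
print(fixed)  # one distinct URL per article
```

With the fixed version the global list is no longer needed for resolving links, since each URL is passed in directly. When calling requests.get for real, it is probably also worth passing a timeout (for example requests.get(url, timeout=10)) so that one slow host does not stall the whole loop.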