Python: correctly replacing part of a variable's contents

Time: 2021-03-27 00:03:11

Tags: python

I am trying to scrape Yahoo search results for a keyword such as "python".

I wrote this small program:

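from urllib.parse import unquote

import requests
from bs4 import BeautifulSoup as bs

query = "python"
# Result pages to fetch; the keyword is inserted into the empty "p=" parameter.
urls = ["https://fr.search.yahoo.com/search?p=&fr=yfp-search-sb",
        "https://fr.search.yahoo.com/search?p=&fr=yfp-search-sb&b=11&pz=10&pstart=5"]

def checker():
    for yahoo in urls:
        search_url = yahoo.replace("&fr", query + "&fr")
        # Fetch inside the loop so every page is requested, not only the last one.
        r = requests.get(search_url)
        soup = bs(r.text, 'html.parser')
        with open("Yahoo.txt", mode="a", encoding="utf-8") as fullz:
            for link in soup.find_all('a'):
                a = link.get('href')
                if a is None:  # skip anchors without an href
                    continue
                # unquote() returns a new string; the result is discarded here,
                # so `a` stays percent-encoded.
                unquote(a)
                print("Urls : " + a)
                fullz.write(a + "\n")

    # Remove duplicates once, after all pages have been written.
    lines_seen = set()  # holds lines already seen
    with open("Yahoo.txt", "r", encoding="utf-8") as infile, \
            open("Yahoonodup.txt", "w", encoding="utf-8") as outfile:
        for line in infile:
            if line not in lines_seen:  # not a duplicate
                outfile.write(line)
                lines_seen.add(line)

checker()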

My output file contains URLs like this one:

https://r.search.yahoo.com/cbclk2/dWU9MURCNjczQ0UwNThBNDk4MyZ1dD0xNjE2ODAzMTA5MDE4JnVvPTg0OTM3NTA2NTgyMzY5Jmx0PTImcz0xJmVzPVdHbFZxQzRHUFNfemNveGNLaUgxVkpoX3lXV2N2WFhiQkRfZklRLS0-/RV=2/RE=1616831909/RO=10/RU=https%3a%2f%2fwww.bing.com%2faclick%3fld%3de8BWTO-5A13W9y2D2Aw39AjjVUCUyb98EJf6bSa7R7dGxGXelKfNh7KW94OonXABpN7Bo9YkZqB22Evk3cfTIpJi3aGEXXKJMtDqnaNUDUVcsehzFOYyr09GoYqUE-iUywRWeOnV4aeACKf4_YX6dE2BVZAbqkvWj4HQMqeB_Fl1KlwT1v%26u%3daHR0cHMlM2ElMmYlMmZ2ZXJnbGVpY2guZm9jdXMuZGUlMmZ3YXNjaG1hc2NoaW5lJTJmJTNmY2hhbm5lbCUzZGJpbmclMjZkZXZpY2UlM2RjJTI2bmV0d29yayUzZG8lMjZjYW1wYWlnbiUzZDQwNzE4NzU1MCUyNmFkZ3JvdXAlM2QxMzU4OTk2OTA3NDAxNDE4JTI2dGFyZ2V0JTNka3dkLTg0OTM3NjAxMjIzNjUyJTNhbG9jLTcyJTI2YWQlM2Q4NDkzNzUwNjU4MjM2OSUyNmFkLWV4dGVuc2lvbiUzZA%26rlid%3d0fc40f09a4b6109e9c726f57d193ec0e/RK=2/RS=3w4U9AT_OQyaVSF.6KLwzWuo_LU-;_ylc=cnQDMQ--?IG=0ac9439bcf3f4ec087000000005bf464

I would like to turn it into the real link:

https://vergleich.focus.de/waschmaschine/?channel=bing&device=c&network=o&campaign=407187550&adgroup=1358996907401418&target=kwd-84937601223652:loc-72&ad=84937506582369&ad-extension=

Is that possible?

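1 answer:

Answer 0 (score: 0)

As shown here, the response object carries the URL of the site that finally served the content, because requests follows redirects by default. That means that for your example you can do the following.

import requests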

url = 'https://r.search.yahoo.com/cbclk2/dWU9MURCNjczQ0UwNThBNDk4MyZ1dD0xNjE2ODAzMTA5MDE4JnVvPTg0OTM3NTA2NTgyMzY5Jmx0PTImcz0xJmVzPVdHbFZxQzRHUFNfemNveGNLaUgxVkpoX3lXV2N2WFhiQkRfZklRLS0-/RV=2/RE=1616831909/RO=10/RU=https%3a%2f%2fwww.bing.com%2faclick%3fld%3de8BWTO-5A13W9y2D2Aw39AjjVUCUyb98EJf6bSa7R7dGxGXelKfNh7KW94OonXABpN7Bo9YkZqB22Evk3cfTIpJi3aGEXXKJMtDqnaNUDUVcsehzFOYyr09GoYqUE-iUywRWeOnV4aeACKf4_YX6dE2BVZAbqkvWj4HQMqeB_Fl1KlwT1v%26u%3daHR0cHMlM2ElMmYlMmZ2ZXJnbGVpY2guZm9jdXMuZGUlMmZ3YXNjaG1hc2NoaW5lJTJmJTNmY2hhbm5lbCUzZGJpbmclMjZkZXZpY2UlM2RjJTI2bmV0d29yayUzZG8lMjZjYW1wYWlnbiUzZDQwNzE4NzU1MCUyNmFkZ3JvdXAlM2QxMzU4OTk2OTA3NDAxNDE4JTI2dGFyZ2V0JTNka3dkLTg0OTM3NjAxMjIzNjUyJTNhbG9jLTcyJTI2YWQlM2Q4NDkzNzUwNjU4MjM2OSUyNmFkLWV4dGVuc2lvbiUzZA%26rlid%3d0fc40f09a4b6109e9c726f57d193ec0e/RK=2/RS=3w4U9AT_OQyaVSF.6KLwzWuo_LU-;_ylc=cnQDMQ--?IG=0ac9439bcf3f4ec087000000005bf464'
response = requests.get(url)
print(response.url)  # final URL after redirects; this should give you 'https://vergleich.focus.de/waschmaschine/?channel=bing&device=c&network=o&campaign=407187550&adgroup=1358996907401418&target=kwd-84937601223652:loc-72&ad=84937506582369&ad-extension='
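
If one request per link is too slow, the target in this particular sample also appears to be recoverable without any network call. The sketch below is based only on the structure of the single URL shown in the question: the path contains an `RU=` segment holding a percent-encoded Bing URL, and that URL's `u` parameter looks like unpadded URL-safe base64 of a percent-encoded address. Neither Yahoo nor Bing documents this layout as far as I know, so treat it as an assumption that may break at any time. The function names `extract_ru`, `decode_bing_u` and `resolve_offline` are hypothetical helpers, not part of any library.

```python
import base64
import re
from typing import Optional
from urllib.parse import parse_qs, quote, unquote, urlsplit


def extract_ru(yahoo_url: str) -> Optional[str]:
    """Return the percent-decoded RU= path segment, or None if it is absent."""
    match = re.search(r"/RU=([^/]+)/", yahoo_url)
    return unquote(match.group(1)) if match else None


def decode_bing_u(bing_url: str) -> Optional[str]:
    """Decode the base64 'u' query parameter (assumed layout), or return None."""
    values = parse_qs(urlsplit(bing_url).query).get("u")
    if not values:
        return None
    token = values[0]
    token += "=" * (-len(token) % 4)  # restore the stripped base64 padding
    try:
        decoded = base64.urlsafe_b64decode(token).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    return unquote(decoded)


def resolve_offline(yahoo_url: str) -> Optional[str]:
    """Best-effort target: the decoded 'u' value if present, else the RU URL."""
    ru = extract_ru(yahoo_url)
    if ru is None:
        return None
    return decode_bing_u(ru) or ru


# Round-trip demo on a synthetic URL built the same way as the sample.
target = "https://example.com/wash/?channel=bing&device=c"
token = base64.urlsafe_b64encode(quote(target, safe="").encode("utf-8"))
token = token.decode("ascii").rstrip("=")
bing_url = "https://www.bing.com/aclick?ld=abc&u=" + token + "&rlid=0f"
yahoo_url = ("https://r.search.yahoo.com/cbclk2/xyz/RV=2/RU="
             + quote(bing_url, safe="") + "/RK=2/RS=abc-")
recovered = resolve_offline(yahoo_url)
print(recovered)  # prints the same string as `target`
```

For links without an `RU=` segment, `resolve_offline` returns None, so a caller can fall back to the `requests.get(url).url` approach above.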