Changing the webscrape input in a for loop (Python 3)

Time: 2015-09-18 21:38:00

Tags: python

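I am trying to run the same scraping code over three different websites. How can I change both the website input and the Excel output for each of the three sites?

That is, I would take each website in the list and export each result, in list order, to Sports.xlsx, Entertainment.xlsx, and News.xlsx.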

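import re

import pandas as pd
import requests
import xlsxwriter  # used by pandas as the Excel engine below
from bs4 import BeautifulSoup

websites = ["https://news.google.com/news/section?topic=s",
            "https://news.google.com/news/section?topic=e",
            "https://news.google.com/"]

for x in websites:
    website = requests.get(x)
    soup = BeautifulSoup(website.content, "lxml")
    # Join the text of every top-level <body> child except <script> tags
    text = ''.join(element.text for element in soup.body.find_all(lambda tag: tag.name != 'script', recursive=False))
    new = re.sub(r'[^a-zA-Z \n]', '', text)
    # A str has no to_excel(), so put the lines into a DataFrame first
    frame = pd.DataFrame({'text': new.splitlines()})

    if x == "https://news.google.com/news/section?topic=s":
        frame.to_excel('sports.xlsx', index=False, engine='xlsxwriter')
    elif x == "https://news.google.com/news/section?topic=e":
        frame.to_excel('entertainment.xlsx', index=False, engine='xlsxwriter')
    elif x == "https://news.google.com/":
        frame.to_excel('news.xlsx', index=False, engine='xlsxwriter')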

1 Answer:

Answer 0: (score: 2)

Just make your list a collection of tuples in the following format:

websites = [ (link, file_object) ]

for link, file_object in websites: # Unpacks the tuple for each element in the list
    # open the link, then write in the website
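
As a rough illustration of the tuple approach, here is a minimal sketch that pairs each URL from the question with an output file name instead of a file object, which is an assumption on my part. The helper names `clean_text` and `scrape_all`, and the `fetch_text` and `save_rows` callables, are hypothetical and do not come from the question or the answer. Fetching and saving are passed in as functions so the loop itself stays independent of any particular library.

```python
import re

# (url, output filename) pairs; the file names follow the order given in the question.
WEBSITES = [
    ("https://news.google.com/news/section?topic=s", "sports.xlsx"),
    ("https://news.google.com/news/section?topic=e", "entertainment.xlsx"),
    ("https://news.google.com/", "news.xlsx"),
]


def clean_text(text):
    """Keep only ASCII letters, spaces and newlines, like the question's regex."""
    return re.sub(r"[^a-zA-Z \n]", "", text)


def scrape_all(websites, fetch_text, save_rows):
    """Run the same scrape for every (url, filename) pair.

    fetch_text(url) -> str and save_rows(filename, rows) are hypothetical
    callables supplied by the caller, for example thin wrappers around
    requests/BeautifulSoup for fetching and DataFrame.to_excel for saving.
    """
    written = []
    for url, filename in websites:  # unpacks the tuple for each element in the list
        rows = clean_text(fetch_text(url)).splitlines()
        save_rows(filename, rows)
        written.append(filename)
    return written
```

Because every URL carries its own file name, the if/elif chain from the question is no longer needed; adding a fourth site only means adding one more tuple to the list.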