Scraping data from Tableau

Time: 2020-04-25 05:50:08

Tags: python web-scraping beautifulsoup

I need to scrape data from a Tableau workbook into a CSV file. https://public.tableau.com/views/2020_04_06_COVID19_India/Dashboard_India_Cases?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=yes&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0

I tried the following, but it produces no output.

main.py

import requests
from bs4 import BeautifulSoup

r = requests.get("https://public.tableau.com/views/2020_04_06_COVID19_India/Dashboard_India_Cases?:embed=y&:showVizHome=no&:host_url=https%3A%2F%2Fpublic.tableau.com%2F&:embed_code_version=3&:tabs=no&:toolbar=yes&:animate_transition=yes&:display_static_image=no&:display_spinner=no&:display_overlay=yes&:display_count=yes&publish=yes&:loadOrderID=0")

soup = BeautifulSoup(r.content, "html.parser")

# Print the first cell of every row in every <table> found in the HTML
for table in soup.find_all("table"):
    for row in table.find_all("tr"):
        print(row.find("td"))

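1 Answer:

Answer 0 (score: 2)

I wrote this Python Tableau scraper library, which lists a dashboard's worksheets and exports each worksheet's data to a pandas DataFrame. For example, the following retrieves the tables you are looking for: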

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/views/2020_04_06_COVID19_India/Dashboard_India_Cases"

ts = TS()
ts.loads(url)
dashboard = ts.getDashboard()

for t in dashboard.worksheets:
    # show the worksheet name
    print(f"WORKSHEET NAME : {t.name}")
    # show the DataFrame for this worksheet
    print(t.data)

run this code on repl.it
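
Since the question asks for CSV files, here is a minimal sketch of how each worksheet could be written to disk. The helper names `safe_filename` and `export_worksheets` are hypothetical and not part of the library. The sketch only assumes what the snippet above already uses: each worksheet has a `name` and a `data` attribute, and `data` is a pandas DataFrame, so the standard `DataFrame.to_csv` method is available.

```python
import re
from pathlib import Path


def safe_filename(name):
    """Turn a worksheet name into a filesystem-friendly file stem (hypothetical helper)."""
    # Replace every run of characters other than letters, digits, "_" and "-"
    cleaned = re.sub(r"[^\w\-]+", "_", name).strip("_")
    return cleaned or "worksheet"


def export_worksheets(worksheets, out_dir="."):
    """Write each worksheet's data to <out_dir>/<name>.csv and return the paths.

    Assumes every item has `.name` (str) and `.data` (a pandas DataFrame),
    as in the loop above. Worksheets whose names clean up to the same stem
    would overwrite each other; this sketch does not handle that case.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for ws in worksheets:
        path = out / f"{safe_filename(ws.name)}.csv"
        ws.data.to_csv(path, index=False)  # index=False omits the row-index column
        paths.append(path)
    return paths
```

With the `dashboard` object from the snippet above, calling `export_worksheets(dashboard.worksheets, "tableau_csv")` should leave one CSV file per worksheet in a `tableau_csv` folder (the folder name is just an example).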