How do you use pandas and Beautiful Soup to scrape a table across multiple web page addresses?

Time: 2019-03-23 15:47:53

Tags: python pandas web-scraping beautifulsoup

I want to extract data from a table on a website. The table spans 165 web pages and I want to scrape all of them, but I can only get the first page.

I have tried pandas, BeautifulSoup, and requests:

import pandas as pd
import requests
from bs4 import BeautifulSoup

offset = 0
teacher_list = []
while offset <= 4500:
    # build the paginated URL and read the table on that page
    url = ("https://projects.newsday.com/databases/long-island/"
           "teacher-administrator-salaries-2017-2018/?offset=" + str(offset))
    calls_df, = pd.read_html(url, header=0, parse_dates=["Start date"])

    teacher_list.append(calls_df)
    print(calls_df)

    # fetch the same page with requests and parse it with BeautifulSoup
    collection_page = requests.get(url)
    page_html = collection_page.text
    soup = BeautifulSoup(page_html, "html.parser")
    print(soup.prettify())

    offset = offset + 1500

print(teacher_list)
calls_df.to_csv("calls.csv", index=False)

1 Answer:

Answer 0 (score: 0):

You can use the step parameter to increment the offset in your URL.

$category = $_POST["category"];
$query = "SELECT * FROM products WHERE category = '$category' ORDER BY rand()";
$result = mysqli_query($conn, $query);
$productsArray = mysqli_fetch_all($result, MYSQLI_ASSOC);
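A minimal sketch of that idea, assuming the same URL, offset range (0 to 4500 in steps of 1500), and "Start date" column from the question; the step argument of range() advances the offset for each page:

import pandas as pd

# base URL taken from the question; the offset query parameter selects the page
base_url = ("https://projects.newsday.com/databases/long-island/"
            "teacher-administrator-salaries-2017-2018/?offset=")

frames = []
# range(start, stop, step) yields the offsets 0, 1500, 3000, 4500
for offset in range(0, 4501, 1500):
    df, = pd.read_html(base_url + str(offset), header=0,
                       parse_dates=["Start date"])
    frames.append(df)

# stitch the per-page tables into one DataFrame and save it
all_teachers = pd.concat(frames, ignore_index=True)
all_teachers.to_csv("calls.csv", index=False)

pd.concat combines the table from every page before writing the CSV, so the output is no longer limited to the first page.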