Table not scraped correctly with Python BeautifulSoup

Time: 2020-06-20 05:27:23

Tags: python beautifulsoup

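I have the following code, which tries to scrape the main table on the page https://www.n2yo.com/satellites/?c=52&srt=2&dir=1. I need the NORAD ID and the launch date from the second and fourth columns. However, I cannot get BeautifulSoup to find the table by its ID.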

import requests
from bs4 import BeautifulSoup

data = []

URL = 'https://www.n2yo.com/satellites/?c=52&srt=2&dir=1'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find("table", id="categoriestab")
rows = table.find_all('tr')

for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values

print(data)

3 Answers:

Answer 0: (score: 1)

To get the NORAD ID and Launch date, you can try:

import pandas as pd

url = "https://www.n2yo.com/satellites/?c=52&srt=2&dir=0"
df = pd.read_html(url)

data = df[2].drop(["Name", "Int'l Code", "Period[minutes]", "Action"], axis=1)
print(data)

The output will be a DataFrame containing only the NORAD ID and Launch date columns.

Answer 1: (score: 0)

Change

soup = BeautifulSoup(page.content, 'html.parser')

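Answer 2: (score: 0)

If you print the soup and search through it, you will not find the ID you want in the output. This most likely means the page is rendered with JavaScript. You can look into using PhantomJS or Selenium. I use Selenium for problems like this. You will need to download the Chrome driver: https://chromedriver.chromium.org/downloads. Here is the code I used.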

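from bs4 import BeautifulSoup
from selenium import webdriver

options = webdriver.ChromeOptions()  # default options; the original snippet did not show how options was created
driver = webdriver.Chrome(executable_path='<YOUR PATH>', options=options)  # executable_path is the Selenium 3 keyword
driver.get('YOUR URL')
driver.implicitly_wait(1)
soup_file = BeautifulSoup(driver.page_source, 'html.parser')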

This sets up the driver to connect to the URL, waits for the page to load, then takes the full page source and puts it into a BeautifulSoup object.
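The check described in this answer (is the table ID present in the HTML at all?) can be sketched as follows. This is a minimal illustration, not the code from any of the answers: the function name `extract_ids_and_dates` is hypothetical, and both HTML strings are made-up samples that only assume the column order given in the question (NORAD ID in the second column, launch date in the fourth). The real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration only; the real page's HTML may differ.
STATIC_HTML = """
<table id="categoriestab">
  <tr><th>Name</th><th>NORAD ID</th><th>Int'l Code</th><th>Launch date</th></tr>
  <tr><td>SAT-A</td><td>11111</td><td>2020-001A</td><td>2020-01-01</td></tr>
  <tr><td>SAT-B</td><td>22222</td><td>2020-002B</td><td>2020-02-02</td></tr>
</table>
"""
# Made-up sample of a page whose table would only be built in the browser.
JS_ONLY_HTML = "<div id='app'></div><script>/* table built client-side */</script>"


def extract_ids_and_dates(html, table_id="categoriestab"):
    """Return [(norad_id, launch_date), ...], or None if the table is absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # The id is not in this HTML: the table may be added by JavaScript,
        # or the markup may differ from what the code expects.
        return None
    pairs = []
    for row in table.find_all("tr"):
        cols = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cols) >= 4:  # header rows use <th>, so they yield no <td> cells
            pairs.append((cols[1], cols[3]))  # second and fourth columns
    return pairs


print(extract_ids_and_dates(STATIC_HTML))
print(extract_ids_and_dates(JS_ONLY_HTML))
```

If the function returns None for `page.content` from `requests` but returns rows for `driver.page_source` from Selenium, that would be consistent with the table being rendered by JavaScript. Returning None rather than raising also avoids an AttributeError from calling `find_all` on a missing table.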

Hope this helps!