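I have the following code, which tries to scrape the main table on the page at https://www.n2yo.com/satellites/?c=52&srt=2&dir=1. I need the NORAD ID and the launch date, which are in the second and fourth columns. However, I can't get BeautifulSoup to find the table by its ID.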
import requests
from bs4 import BeautifulSoup
data = []
URL = 'https://www.n2yo.com/satellites/?c=52&srt=2&dir=1'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
table = soup.find("table", id="categoriestab")
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # Get rid of empty values
print(data)
Answer 0 (score: 1)
To get the NORAD ID and Launch date, you can try:
import pandas as pd
url = "https://www.n2yo.com/satellites/?c=52&srt=2&dir=0"
df = pd.read_html(url)
data = df[2].drop(["Name", "Int'l Code", "Period[minutes]", "Action"], axis=1)
print(data)
The output is the table without those four columns, so the NORAD ID and Launch date columns remain.
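A variation on this answer, as a rough sketch: instead of hard-coding the table index and listing the columns to drop, you could look for the first table that contains the columns you want and keep only those. The helper name `pick_columns` is hypothetical, and the `tables` list below is made-up stand-in data for whatever `pd.read_html(url)` returns, so the example runs without a network connection.

```python
import pandas as pd


def pick_columns(tables, wanted):
    """Return the wanted columns from the first table that has all of them."""
    for table in tables:
        if set(wanted).issubset(table.columns):
            return table[list(wanted)]
    raise ValueError(f"no table contains all of {wanted!r}")


# Stand-in for the list returned by pd.read_html(url); the values are made up.
tables = [
    pd.DataFrame({"Menu": ["Home"]}),
    pd.DataFrame({
        "Name": ["SAT-A", "SAT-B"],
        "NORAD ID": [11111, 22222],
        "Launch date": ["2020-01-01", "2020-02-02"],
    }),
]

result = pick_columns(tables, ["NORAD ID", "Launch date"])
print(result)
```

Selecting by column name keeps working if the page gains or loses a table ahead of the one you need, which a fixed index such as `df[2]` would not.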
Answer 1 (score: 0)
Change the line that builds the soup to
soup = BeautifulSoup(page.content, 'html.parser')
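Answer 2 (score: 0)
If you print the soup and search through it, you won't find the ID you're looking for in the output. That most likely means the page is rendered by JavaScript. You could look into using PhantomJS or Selenium. I use Selenium for problems like this. You will need to download the Chrome driver: https://chromedriver.chromium.org/downloads. Here is the code I used.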
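from bs4 import BeautifulSoup
from selenium import webdriver
options = webdriver.ChromeOptions()
# executable_path is the Selenium 3 style keyword; Selenium 4 moved this to a Service object
driver = webdriver.Chrome(executable_path='YOUR PATH', options=options)
driver.get('YOUR URL')
driver.implicitly_wait(1)
soup_file = BeautifulSoup(driver.page_source, 'html.parser')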
This sets the driver to connect to the URL, waits for the page to load, then takes the full page source and puts it into a BeautifulSoup object.
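The check described at the start of this answer (searching the downloaded HTML for the table's ID) can also be done in code. The following is a minimal sketch using only the standard library; `IdCollector` and `has_element_id` are hypothetical names, and the two HTML strings are made-up examples of what a page might look like before and after JavaScript runs. In practice you would pass in `page.text` from requests or `driver.page_source` from Selenium.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute that appears on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_id(html, element_id):
    """Return True if any element in the HTML carries the given id."""
    collector = IdCollector()
    collector.feed(html)
    return element_id in collector.ids


# Made-up samples: the raw response vs. the DOM after scripts have run.
static_html = '<html><body><div id="content"></div><script>buildTable()</script></body></html>'
rendered_html = '<html><body><div id="content"><table id="categoriestab"></table></div></body></html>'

print(has_element_id(static_html, "categoriestab"))
print(has_element_id(rendered_html, "categoriestab"))
```

If the ID is missing from the raw response but present in the browser's rendered source, that supports the guess that the table is built by JavaScript.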
Hope this helps!