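I'm trying to retrieve the table from the following dynamic website and save it into a DataFrame: https://www.grants.gov/web/grants/search-grants.html
I have tried several approaches, including pandas, requests.post, BeautifulSoup and Selenium. None of them return any results; it is as if the table does not exist or is never detected.
Here is my code: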
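You can check whether the server-rendered HTML contains a table at all by counting `<table>` tags in what `requests.get` returns. If the count is zero, the table is built later by JavaScript, and tools that only parse the raw response (such as `pd.read_html` or BeautifulSoup) cannot find it. The following is a minimal sketch that uses only the standard-library `html.parser`. The `shell` string is a hypothetical stand-in, not the real grants.gov markup. In practice you would pass `requests.get(URL).text` instead.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> and <td> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        elif tag == "td":
            self.cells += 1


def count_tables(html):
    """Return (number of <table> tags, number of <td> tags) found in html."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables, parser.cells


# Hypothetical page shell: the results container stays empty until a script fills it.
shell = """
<html><body>
  <div id="results"></div>
  <script src="search.js"></script>
</body></html>
"""

print(count_tables(shell))  # (0, 0)
```

A result of `(0, 0)` on the real page would mean the rows arrive through a separate request that the browser makes after loading the page.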
from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup
import requests
# Using pandas
pd.read_html('https://www.grants.gov/web/grants/search-grants.html')
# Using BeautifulSoup
URL = 'https://www.grants.gov/web/grants/search-grants.html'
response = requests.get(URL, headers={})
soup = BeautifulSoup(response.text, 'lxml')
print(soup)
job_elems = soup.findAll('table')
print(job_elems)
for i in job_elems:
    txt = i.find("td").text.strip()
    print(txt)
tr = soup.findAll("tr", class_='gridevenrow')
for element in tr:
    row = element.find('td')
    print(row.text)
# Using Selenium
from selenium.webdriver.firefox.options import Options
options = Options()
options.headless = True
driver = webdriver.Firefox(executable_path='/Users/**/geckodriver', options=options)
driver.get(URL)
elems = driver.find_elements_by_xpath("//td")
for e in elems:
    print(e.text)
# Using requests.post
url = "https://www.grants.gov/grantsws/rest/opportunities/search/"
data = """{"startRecordNum":0,"sortBy":"openDate|desc","oppStatuses":"forecasted|posted"}"""
soup = BeautifulSoup(requests.post(url, data=data).content, "xml")
data = []
for sn in soup.findAll("tr"):
    text = sn.find('td').text
    print(text)
# Using Selenium + BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
from selenium.webdriver.firefox.options import Options
options = Options()
options.headless = True
driver = webdriver.Firefox(executable_path='/Users/**/geckodriver', options=options)
driver.get('https://www.grants.gov/grantsws/rest/opportunities/search/')
# Wait up to 10 seconds for a <tr> element to appear. Other wait conditions
# include visibility_of_element_located and text_to_be_present_in_element.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//tr")))
html = driver.page_source
soup = bs(html, "lxml")
dynamic_text = soup.find_all("td") #or other attributes, optional
print(dynamic_text)
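The `requests.post` attempt above mixes up two things. First, the body is sent as a plain string with `data=`, so requests attaches no `Content-Type: application/json` header, and the server may not treat it as JSON. Second, the reply is parsed as XML and searched for `<tr>` tags. When the endpoint answers with JSON, as it does in the answer below, there are no tags to find, because the rows live in a list under the `oppHits` key. The following is a minimal sketch of pulling them out with the standard `json` module. `sample` is a hypothetical, heavily trimmed response that reuses field names from the output shown further down. `hits_to_rows` is an illustrative helper, not part of any library.

```python
import json

# Hypothetical, trimmed response body: only a few fields per hit are shown.
sample = """
{
  "oppHits": [
    {"id": "333348", "number": "72062421RFA00003", "cfdaList": ["98.001"]},
    {"id": "333353", "number": "RFA-HL-23-004", "cfdaList": ["93.840", "93.233"]}
  ]
}
"""


def hits_to_rows(body):
    """Parse a JSON search response and return one flat dict per hit."""
    hits = json.loads(body).get("oppHits", [])
    rows = []
    for hit in hits:
        row = dict(hit)
        # cfdaList is a list of CFDA numbers; join it into one string
        # (an empty string if the key is missing or null).
        row["cfdaList"] = ", ".join(hit.get("cfdaList") or [])
        rows.append(row)
    return rows


for row in hits_to_rows(sample):
    print(row["number"], row["cfdaList"])
```

The resulting list of dicts can be passed directly to `pd.DataFrame`.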
Answer (score: 1)
The data you see is loaded from an external URL. You can use this example to load it into a pandas DataFrame:
import json
import requests
import pandas as pd
url = "https://www.grants.gov/grantsws/rest/opportunities/search/"
payload = {
    "oppStatuses": "forecasted|posted",
    "sortBy": "openDate|desc",
    "startRecordNum": 0,
}
data = requests.post(url, json=payload).json()
# uncomment to print all data:
# print(json.dumps(data, indent=4))
df = pd.DataFrame(data["oppHits"])
df["cfdaList"] = df["cfdaList"].apply(lambda x: ", ".join(x))
print(df)
df.to_csv("data.csv", index=False)
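The request above appears to return a single page of results (the printed output has 25 rows). To collect more, you can advance `startRecordNum` until a page comes back short or empty.

The following is a minimal sketch of that approach. It rests on assumptions that the answer does not confirm: the endpoint keeps honouring `startRecordNum`, and it returns at most 25 hits per page. `fetch_page` and `collect_all` are hypothetical helpers. `fetch_page` uses only the standard library, and sends the payload as JSON with an explicit `Content-Type` header, which is what `requests.post(url, json=payload)` does for you. `collect_all` accepts any fetch function, so you can exercise it without network access.

```python
import json
import urllib.request

URL = "https://www.grants.gov/grantsws/rest/opportunities/search/"
PAGE_SIZE = 25  # assumption: inferred from the 25-row output, not from API docs


def fetch_page(start):
    """Hypothetical helper: POST one search request and return its oppHits list."""
    payload = {
        "oppStatuses": "forecasted|posted",
        "sortBy": "openDate|desc",
        "startRecordNum": start,
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("oppHits", [])


def collect_all(fetch=fetch_page, page_size=PAGE_SIZE, max_pages=100):
    """Request pages until one comes back short or empty, or max_pages is reached."""
    hits = []
    for page in range(max_pages):
        batch = fetch(page * page_size)
        hits.extend(batch)
        if len(batch) < page_size:
            break
    return hits


# Usage (network access required):
# import pandas as pd
# df = pd.DataFrame(collect_all())
```

The `max_pages` cap is a safety valve: if the page-size assumption is wrong and every page comes back full, the loop still stops instead of hammering the server.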
Prints:
id number title agencyCode agency openDate closeDate oppStatus docType cfdaList
0 333348 72062421RFA00003 SERVIR West Africa 2 USAID-GHA Ghana USAID-Accra 05/05/2021 05/10/2021 posted synopsis 98.001
1 333281 USDA-AMS-TM-RFSP-G-21-0009 Regional Food System Partnerships USDA-AMS Agricultural Marketing Service 05/05/2021 07/06/2021 posted synopsis 10.177
2 333336 72038821RFI00002 USAID/Bangladesh Request for Information on US... USAID-BAN Bangladesh USAID-Dhaka 05/05/2021 05/27/2021 posted synopsis 98.001
3 333307 N00014-21-S-SN10 2021 Office of Naval Research (ONR) Global Res... DOD-ONR Office of Naval Research 05/05/2021 07/09/2021 posted synopsis 12.300
4 333337 SCAISB-21-AW-015-05052021 Promoting a Culture of Inclusion and Research DOS-PAK U.S. Mission to Pakistan 05/05/2021 06/04/2021 posted synopsis 19.501
5 333338 030ADV21R0179 Teaching with Primary Sources LOC Library of Congress 05/05/2021 05/28/2021 posted synopsis 42.010
6 333343 RFI-675-21-HFECA-01 Guinea Health Facility Electrification and Con... USAID-GUI Guinea USAID-Conakry 05/05/2021 06/28/2021 posted synopsis 98.001
7 333342 O-BJA-2021-93001 BJA FY 21 Second Chance Act Pay for Success In... USDOJ-OJP-BJA Bureau of Justice Assistance 05/05/2021 06/22/2021 posted synopsis 16.812
8 333308 O-BJA-2021-04001 BJA FY 21 Sexual Assault Forensic Evidence - I... USDOJ-OJP-BJA Bureau of Justice Assistance 05/05/2021 06/07/2021 posted synopsis 16.741
9 333344 O-BJA-2021-94002 BJA FY 21 Safeguarding Correctional Facilities... USDOJ-OJP-BJA Bureau of Justice Assistance 05/05/2021 06/07/2021 posted synopsis 16.844
10 333313 DE-FOA-0002527 Equitable Access to Community-based Solar DOE-GFO Golden Field Office 05/05/2021 06/01/2021 posted synopsis 81.117
11 333310 SFOP0008106 Tunisia Supporting the Inclusion of Vulnerable... DOS-NEA-AC Assistance Coordination 05/05/2021 06/01/2021 posted synopsis 19.600
12 333358 L21AS00499 Department of the Interior - Bureau of Land Ma... DOI-BLM Bureau of Land Management 05/05/2021 06/04/2021 posted synopsis 15.224
13 333352 PAR-21-224 NeuroNEXT Small Business Innovation in Clinica... HHS-NIH11 National Institutes of Health 05/05/2021 04/05/2024 posted synopsis 93.853
14 333353 RFA-HL-23-004 NHLBI Outstanding Investigator Award (OIA) (R3... HHS-NIH11 National Institutes of Health 05/05/2021 04/25/2024 posted synopsis 93.840, 93.233, 93.838, 93.839, 93.837
15 333311 RFA-FD-21-032 Integrated Pathogen Reduction Technologies for... HHS-FDA Food and Drug Administration 05/05/2021 07/06/2021 posted synopsis 93.103
16 333312 DE-FOA-0002526 Workforce Development Strategies Supporting th... DOE-GFO Golden Field Office 05/05/2021 06/01/2021 posted synopsis 81.117
17 333315 RFA-HL-23-005 NHLBI Emerging Investigator Award (EIA) (R35 C... HHS-NIH11 National Institutes of Health 05/05/2021 04/25/2024 posted synopsis 93.840, 93.233, 93.838, 93.839, 93.837
18 333351 PAR-21-223 NeuroNEXT Clinical Trials (U01 Clinical Trial ... HHS-NIH11 National Institutes of Health 05/05/2021 03/05/2024 posted synopsis 93.853
19 333350 O-OJJDP-2021-00002 OJJDP FY 2021 Strategies To Support Children E... USDOJ-OJP-OJJDP Office of Juvenile Justice Delinquency Prevent... 05/05/2021 06/22/2021 posted synopsis 16.818
20 333346 RFA-HD-22-020 Human Milk as a Biological System (R01 Clinica... HHS-NIH11 National Institutes of Health 05/05/2021 11/29/2021 posted synopsis 93.865
21 333349 O-OJJDP-2021-92009 OJJDP FY 2021 Family Drug Court Program USDOJ-OJP-OJJDP Office of Juvenile Justice Delinquency Prevent... 05/05/2021 06/22/2021 posted synopsis 16.585
22 333354 21CS16 Women’s Risk and Need Assessment (WRNA) USDOJ-BOP-NIC National Institute of Corrections 05/05/2021 07/05/2021 posted synopsis 16.601
23 329057 HHS-2021-ACF-ACYF-EV-1942 Family Violence Prevention and Services Discre... HHS-ACF-FYSB Administration for Children & Families - ACYF/... 05/05/2021 07/05/2021 posted synopsis 93.592
24 333269 HHS-2021-ACF-OPRE-YR-1967 Head Start University Partnerships: Building ... HHS-ACF-OPRE Administration for Children and Families - OPRE 05/05/2021 07/06/2021 posted synopsis 93.600
and saves data.csv.
[Screenshot: data.csv opened in LibreOffice]