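I need to automate this. I have an Excel sheet with data in column A, row 1. I want to take the text from row 1, enter it into the case number search box at "https://www.lacourt.org/casesummary/ui/index.aspx", and write the result back to the Excel sheet next to the first column.
I am trying to add the status from the search result to the Excel sheet beside column 1.
Please let me know whether this is feasible, since I have already tried every approach I could think of with cheerio and scrap.js.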
Answer 0 (score: 0)
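Hopefully this helps you move forward. It isn't perfect, as I'd like to clean up a few things and make it more efficient, but since you've been at this for a few weeks I just wanted to get you something.
You'll have to make some changes too, since I don't know what your Excel sheet contains. I also treated the input as a CSV file. If it's an Excel file, a few things need to change (small changes). Please let me know, because I didn't realize you said Excel rather than CSV until I had finished.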
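If the data really is in an Excel workbook, the small changes mentioned above might look roughly like the sketch below. It is only a sketch under assumptions: the file paths are placeholders, the sheet is assumed to have no header row, `attach_status` and `process_workbook` are hypothetical helper names, and reading or writing `.xlsx` with pandas needs the `openpyxl` package installed.

```python
import pandas as pd


def attach_status(df, lookup_status):
    """Return a copy of df with a 'status' column placed right beside the first column.

    df            -- DataFrame whose first column holds the case numbers
    lookup_status -- callable that takes a case number and returns its status
    """
    out = df.copy()
    case_numbers = out.iloc[:, 0].astype(str)
    # Position 1 is the column immediately to the right of column A.
    out.insert(1, "status", [lookup_status(n) for n in case_numbers])
    return out


def process_workbook(src_path, dst_path, lookup_status):
    """Read case numbers from an Excel file and save them with their statuses."""
    # header=None assumes the first row already contains a case number.
    sheet = pd.read_excel(src_path, header=None)
    result = attach_status(sheet, lookup_status)
    result.to_excel(dst_path, header=False, index=False)


# Example call (placeholder paths; pass in a function that performs the site search):
# process_workbook("path/file.xlsx", "path/new_file.xlsx", my_lookup_function)
```

Keeping the lookup as a plain function argument means the spreadsheet handling can be checked with a dummy lookup before wiring in the browser automation.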
from selenium import webdriver
from selenium.webdriver.common.by import By
import bs4
import pandas as pd

url = "https://www.lacourt.org/casesummary/ui/index.aspx"

# The two lines below read your CSV file and build a list from column A.
# You'll have to specify the path and filename. If the column has a different name, change it below too.
# Remove the hash symbol on the next two lines.
#df = pd.read_csv('path/file.csv')
#case_numbers = df['column A'].tolist()

# I used this one-element list to test. Delete this line once the two lines above are sorted out.
case_numbers = ['16V00010']

rows = []
for case_num in case_numbers:
    # A fresh browser is opened for every case number (simple, but slow).
    driver = webdriver.Chrome()
    driver.get(url)
    # find_element(By...) replaces the find_element_by_* helpers removed in Selenium 4.
    driver.find_element(By.NAME, "CaseNumber").send_keys(case_num)
    driver.find_element(By.CSS_SELECTOR, "input[type='submit'][value='SEARCH']").click()
    html = driver.page_source
    soup = bs4.BeautifulSoup(html, 'html.parser')
    # The status text is the node that follows the parent of the bold "Status:" label.
    status = soup.find('b', string='Status:').parent.next_sibling
    rows.append({'case_number': case_num, 'status': status})
    driver.quit()

# Build the table once at the end; DataFrame.append was removed in pandas 2.0.
results = pd.DataFrame(rows, columns=['case_number', 'status'])
print(results)

# Saves the results. I don't know what else is in your CSV file, so you may need to alter the code a bit.
# Remove the hash below.
#results.to_csv('path/new_file.csv', index=False)
Output:
print(results)
  case_number            status
0    16V00010  Legacy Dismissal
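One thing to watch for: if a case number returns no result, `soup.find('b', string='Status:')` returns `None` and the `.parent` lookup raises an `AttributeError`. A more defensive version of that step could look like the sketch below. `extract_status` is a hypothetical helper name, and the sample HTML in the usage lines is made up to mirror the structure the script relies on, not copied from the real page.

```python
import bs4


def extract_status(html):
    """Return the text following the bold 'Status:' label, or None if it is missing."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    label = soup.find("b", string="Status:")
    if label is None:
        # No "Status:" label on the page, e.g. the case number was not found.
        return None
    sibling = label.parent.next_sibling
    if sibling is None:
        return None
    # The sibling can be a plain string or a tag; handle both.
    text = sibling.get_text() if isinstance(sibling, bs4.Tag) else str(sibling)
    return text.strip()


# Usage with made-up markup:
sample = "<p><span><b>Status:</b></span> Legacy Dismissal</p>"
print(extract_status(sample))
print(extract_status("<p>No match found</p>"))
```

Returning `None` instead of raising lets the loop keep going and leaves an empty status cell for case numbers that could not be looked up.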