Automate manual work, as shown in the image, of entering data from Excel into a website and writing the results back to Excel

Date: 2018-12-18 09:01:18

Tags: javascript web-scraping

I need to automate this: my Excel sheet has data in column A, row 1. I want to enter the text from row 1 into the Case Number search box at the link "https://www.lacourt.org/casesummary/ui/index.aspx" and return the result to the Excel sheet next to the first column.

I am trying to add the status from the search results to the Excel sheet next to column 1.

Please let me know whether this is feasible, because I have already tried every way I could think of using cheerio and scrap.js.

1 Answer:

Answer 0: (score: 0)

Hope this helps you move forward. It isn't perfect, as there are things I'd like to clean up and make more efficient, but since you've been trying for weeks I just wanted to get something to you.

You'll also have to make some changes, since I don't know what your Excel sheet contains. I also treated it as a CSV file; if it's an Excel file a few things need to change (minor changes). Let me know, as I didn't realize you said Excel rather than CSV until I had finished.
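If the sheet really is an Excel file, the read side of the script below changes to `pd.read_excel` — a minimal sketch, assuming `openpyxl` is installed and that the column is literally named `column A` (the file path and column name here are placeholders):

```python
import os
import tempfile

import pandas as pd

# Build a small .xlsx file just so the example is self-contained;
# in practice you would already have your own 'path/file.xlsx'.
sheet = pd.DataFrame({'column A': ['16V00010', '18STCV00001']})
path = os.path.join(tempfile.mkdtemp(), 'file.xlsx')
sheet.to_excel(path, index=False)

# The only change from the CSV version: read_excel instead of read_csv.
df = pd.read_excel(path)
case_numbers = df['column A'].tolist()
print(case_numbers)
```

Everything after that point in the script stays the same.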

from selenium import webdriver
import bs4
import pandas as pd

url = "https://www.lacourt.org/casesummary/ui/index.aspx"

# the 2 lines below will read in your csv file and create a list from your column A.
# You'll have to specify the path and filename. If your column has a different name, change that below too
# remove hash symbol on next two lines

#df = pd.read_csv('path/file.csv')
#case_numbers = df['column A'].tolist()

# I used this 1 element list to test. Just delete this line once you have the above 2 lines sorted out
case_numbers = ['16V00010']



results = pd.DataFrame()
for case_num in case_numbers:
    driver = webdriver.Chrome()
    driver.get(url)

    driver.find_element_by_name("CaseNumber").send_keys(case_num)
    driver.find_element_by_css_selector("input[type='submit'][value='SEARCH']").click()

    html = driver.page_source

    soup = bs4.BeautifulSoup(html,'html.parser')
    status = soup.find('b', string='Status:').parent.next_sibling

    temp_df = pd.DataFrame([[case_num, status]], columns = ['case_number','status'])
    results = results.append(temp_df).reset_index(drop = True)

    driver.close()

print (results)

# saves the results. I don't know what else is in your csv file, so you may need to alter the code a bit
# remove hash below
#results.to_csv('path/new_file.csv', index=False)

Output:

print (results)
  case_number            status
0    16V00010  Legacy Dismissal
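To get the status back into the sheet next to column A (what the question asked for), one option is to left-join `results` onto the original frame before saving — a sketch, with the column names `column A`, `case_number`, and `status` assumed from above, and the status values invented for illustration:

```python
import pandas as pd

# Stand-ins for the sheet read earlier and the scraped results
# ('Pending' is a made-up example value).
df = pd.DataFrame({'column A': ['16V00010', '18STCV00002']})
results = pd.DataFrame({'case_number': ['16V00010', '18STCV00002'],
                        'status': ['Legacy Dismissal', 'Pending']})

# A left join keeps every row of the sheet and adds its status alongside.
merged = df.merge(results, how='left',
                  left_on='column A', right_on='case_number')
merged = merged.drop(columns='case_number')
print(merged)

# then save, e.g. merged.to_excel('path/file_with_status.xlsx', index=False)
```

This way the original sheet's rows and order are preserved even if some case numbers return no result (those get an empty status).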