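Passing Selenium results to Pandas

Time: 2017-08-12 17:26:40

Tags: python selenium-webdriver web-scraping

I am trying to automate a search that returns a table of information. I can print the results using .text, but my question is how to pass the results into a Pandas DataFrame. I ask for two reasons: I want to write the results to a CSV file, and I need the results in Pandas for later data analysis. I would appreciate any help. My code is below: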

import time
from selenium import webdriver
import pandas as pd


search = ['0501020210597400','0501020210597500','0501020210597600']
df = pd.DataFrame(search)


chrome_path = [Chrome Path]
driver = webdriver.Chrome(chrome_path)

driver.get('https://enquiry.mpsj.gov.my/v2/service/cuk_search/')
x = 0

while x <(len(df.index)):
    search_box = driver.find_element_by_name('sel_value')
    new_line = (df[0][x]).format(x)
    search_box.send_keys(new_line)
    search_box.submit()
    time.sleep(5)
    table = driver.find_elements_by_class_name('tr-body')
    for data in table:
        print(data.text)
        driver.find_element_by_name('sel_value').clear()
    x +=1

driver.close()

2 Answers:

Answer 0 (score: 1)

To load a CSV file into a DataFrame, you can do the following:

df = pd.read_csv('results.csv')  # 'results.csv' is a placeholder file name

See the online documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv

To write data to CSV, see this post: Pandas writing dataframe to CSV file on SO.
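One detail worth watching with this data: the account numbers in the question start with a zero, and read_csv will parse such a column as integers by default, which drops the leading zero. The sketch below round-trips a small frame through CSV with dtype=str to keep the values as text. The column names and amounts are made up for illustration, and an in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data; the account numbers are the search values from the question.
df = pd.DataFrame({
    "account": ["0501020210597400", "0501020210597500"],
    "amount": ["10.00", "25.50"],
})

buffer = io.StringIO()          # stands in for a CSV file on disk
df.to_csv(buffer, index=False)  # index=False leaves out the row index column
buffer.seek(0)

# dtype=str keeps every column as text, so "0501..." is not parsed as an integer.
restored = pd.read_csv(buffer, dtype=str)
print(restored)
```

Without dtype=str the account column would come back as integers, which is rarely what you want for identifiers.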

The solution is to collect the rows in a list inside the loop and convert that list to a DataFrame once the loop finishes.
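A minimal sketch of collecting rows and converting them, assuming each result row can be reduced to a list of cell strings. The rows_to_frame helper and the sample values are hypothetical; the column names are borrowed from the other answer and may not match what the site actually returns.

```python
import pandas as pd


def rows_to_frame(rows, columns):
    """Build a DataFrame from a list of rows, each row being a list of cell strings."""
    return pd.DataFrame(rows, columns=columns)


# In the question's loop, each row could be collected instead of printed, e.g.
#   cells = [td.text for td in data.find_elements_by_tag_name('td')]
#   results.append(cells)
# Here hypothetical values stand in for the scraped cells.
results = [
    ["1", "0501020210597400", "NAME A", "10.00"],
    ["1", "0501020210597500", "NAME B", "25.50"],
]
columns = ["No.", "No. Akaun", "Nama Di Bil", "Jumlah Perlu Dibayar"]

df = rows_to_frame(results, columns)
csv_text = df.to_csv(index=False)  # with no path given, to_csv returns the CSV as a string
print(csv_text)
```

Passing a file path to to_csv instead writes the same content to disk, and the DataFrame stays available for later analysis.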

Answer 1 (score: 1)

Instead of using Selenium, you can use requests and send a POST to fetch the information:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

search = ['0501020210597400','0501020210597500','0501020210597600']
headers = {'Referer' : 'https://enquiry.mpsj.gov.my/v2/service/cuk_search/1',
          'User-Agent' : 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
          }
output = []
dfHeaders = ['No.', 'No. Akaun', 'Nama Di Bil', 'Jumlah Perlu Dibayar', '']

with requests.Session() as s:
    for item in search:
        # Load the search page first to obtain a fresh ACCESS_KEY token
        r = s.get('https://enquiry.mpsj.gov.my/v2/service/cuk_search/1', headers = headers)
        soup = bs(r.content, 'lxml')
        key = soup.select_one('[name=ACCESS_KEY]')['value']
        # Submit the search form for this account number
        body = {'sel_input': 'no_akaun', 'sel_value': item, 'ACCESS_KEY': key}
        res = s.post('https://enquiry.mpsj.gov.my/v2/service/cuk_search_submit/', data = body)
        soup = bs(res.content, 'lxml')
        table = soup.select_one('.tbl-list')
        rows = table.select('.tr-body')

        # Collect the non-empty cell text of every result row
        for row in rows:
            cols = [td.text.strip() for td in row.find_all('td')]
            output.append([text for text in cols if text])

df = pd.DataFrame(output, columns = dfHeaders)
print(df)
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig',index = False )
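Because the code above drops empty cells, a row can end up with fewer values than there are headers, and a mismatch between row length and the number of column names can make the DataFrame constructor raise. I have not checked what the site returns, so the following is only a defensive sketch: normalize_rows is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def normalize_rows(rows, width, fill=""):
    """Pad short rows with `fill` and truncate long ones so every row has `width` cells."""
    return [(list(row) + [fill] * width)[:width] for row in rows]


headers = ['No.', 'No. Akaun', 'Nama Di Bil', 'Jumlah Perlu Dibayar', '']

# Hypothetical scraped rows of differing lengths.
rows = [
    ["1", "0501020210597400", "NAME A", "10.00"],            # 4 cells
    ["2", "0501020210597500", "NAME B", "25.50", "extra"],   # 5 cells
]

df = pd.DataFrame(normalize_rows(rows, len(headers)), columns=headers)
print(df)
```

An alternative is to keep the empty cells in the first place, so that every row naturally has one value per header.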