How to extract element text into a Pandas DataFrame using Selenium

Time: 2018-12-27 11:54:04

Tags: python pandas selenium web-scraping

I am scraping https://www.nytimes.com/section/politics. The page looks like this:

[image: screenshot of the page]

So far, my code looks like this:

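from selenium.webdriver.common.by import By

# `driver` is an already-open Selenium WebDriver on the politics page.
# Collect the text of every date element.
Dates = driver.find_elements(By.CLASS_NAME, "css-umh681")
len(Dates)
Date_M = []
for Date in Dates:
    print(Date.text)
    Date_M.append(Date.text)

Date_M

# Collect the text of every headline element.
HeadLines = driver.find_elements(By.CLASS_NAME, "css-1dq8tca")
len(HeadLines)
HeadLine_M = []
for HeadLine in HeadLines:
    print(HeadLine.text)
    HeadLine_M.append(HeadLine.text)
HeadLine_M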

How can I extract the text of the selected elements into a DataFrame, to get a result like this?

[image: desired output]

1 Answer:

Answer 0 (score: 1)

Try:

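from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

# Assumes chromedriver can be found on PATH; the original answer passed
# 'chromedriver.exe' as a positional path (Selenium 3 style).
driver = webdriver.Chrome()

driver.get('https://www.nytimes.com/section/politics')

# Container holding the list of stories. These class names are generated
# by the site and may change over time.
class_ele = driver.find_element(By.CLASS_NAME, 'css-13mho3u')

pos = 0
df = pd.DataFrame(columns=['Date', 'Headline'])

# Each 'css-ye6x8s' element is one story; read its date and headline
# together so the two values stay aligned in the same row.
for ol in class_ele.find_elements(By.CLASS_NAME, 'css-ye6x8s'):
    data = []
    h2 = ol.find_element(By.CLASS_NAME, 'css-1dq8tca').text
    div_2 = ol.find_element(By.CLASS_NAME, 'css-umh681').text
    data.append(div_2)
    data.append(h2)
    df.loc[pos] = data  # append one row at index `pos`
    pos += 1

print(df)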

           Date                                           Headline
0  Dec 27, 2018  LinkedIn Co-Founder Apologizes for Deception i...
1  Dec 27, 2018  Trump in Iraq: First Visit to U.S. Troops in C...
2  Dec 27, 2018  Federal Workers, Some in ‘Panic Mode,’ Share S...
3  Dec 26, 2018  Did a Queens Podiatrist Help Donald Trump Avoi...
4  Dec 26, 2018                   Donald Trump’s Registration Card
5  Dec 26, 2018           Donald Trump’s Selective Service Records
6  Dec 26, 2018  Arms Sales to Saudis Leave American Fingerprin...
7  Dec 26, 2018  Black Voters, a Force in Democratic Politics, ...
8  Dec 25, 2018  How Did Rifles With an American Stamp End Up i...
9  Dec 25, 2018  Kids, Please Don’t Read This Article on What T...
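
If the dates and headlines have already been collected into two separate lists, as in the question's `Date_M` and `HeadLine_M`, they can also be paired up after the fact. The following is only a minimal sketch of that idea, not part of the original answer: `rows_to_frame` is a hypothetical helper name, and the two sample lists are stand-ins copied from the output above so the snippet runs without a browser. It assumes both lists come out in the same page order, which holds only if every story has both a date and a headline, so the helper checks the lengths first.

```python
import pandas as pd


def rows_to_frame(dates, headlines):
    """Pair two equally long lists of strings into a Date/Headline DataFrame.

    Hypothetical helper for illustration; not from the original answer.
    """
    if len(dates) != len(headlines):
        # Two page-wide lookups can drift out of step if a story lacks a
        # date or a headline, so fail loudly instead of misaligning rows.
        raise ValueError(
            f"got {len(dates)} dates but {len(headlines)} headlines"
        )
    rows = [{"Date": d, "Headline": h} for d, h in zip(dates, headlines)]
    # Build the DataFrame once from all rows instead of growing it row by row.
    return pd.DataFrame(rows, columns=["Date", "Headline"])


# Stand-in values taken from the output above; in the real script these
# would be the Date_M and HeadLine_M lists filled from the page.
Date_M = ["Dec 26, 2018", "Dec 26, 2018"]
HeadLine_M = [
    "Donald Trump’s Registration Card",
    "Donald Trump’s Selective Service Records",
]

df = rows_to_frame(Date_M, HeadLine_M)
print(df)
```

Reading both values from the same story element, as the answer's loop does, is the more robust option when some stories might be missing one of the two fields; the length check here only detects a mismatch, it cannot repair it.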