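I want to view and scrape the job listings on https://www.akzonobel.com/nl/careers/vacatures/. The country must be the Netherlands and the job level must be "Entry Level".
I am using HTTParty to send a POST request, but it keeps returning the first 10 job listings. The correct result should be 3 job listings.
This is the code I am using: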
require 'httparty'
require 'nokogiri'

@base_url = 'https://www.akzonobel.com'
url = "#{@base_url}/careers/vacatures/"

data = {
  'ctl00$contentLeft$ctl01$ddlCountryExt' => 'NLD',
  'ctl00$contentLeft$ctl01$ddlJobLevelExt' => 'ENTRY_LEVEL'
}

response = HTTParty.post("#{@base_url}/nl/careers/vacatures/", :body => data)
html = Nokogiri::HTML(response)
jobs = html.xpath('//h3//a')

jobs.each do |job|
  puts job.text
end
puts jobs.size
It returns:
Regional Demand Planner Nordeuropa (m,w)
Forecast Analyst - TiO2 Spend Area
PS Regional Manager APAC
Production leader
Engineering Administrator - Temporary
Procurement Manager EMEA
Business Analyst, Americas
HR Business Partner Supply Chain and R&D
AS Regional Manager
Business Information Manager
10
How do I send the form data the website expects so that I get the correct response?
Update
I tried the following:
require 'httparty'
require 'nokogiri'

@base_url = 'https://www.akzonobel.com'
url = "#{@base_url}/careers/vacatures/"

data = {
  'ctl00$contentLeft$ctl01$ddlCountryExt' => 'NLD',
  'ctl00$contentLeft$ctl01$ddlJobLevelExt' => 'ENTRY_LEVEL',
  'ctl00$contentLeft$ctl01$ddlContinentExt' => 1,
  'ctl00$contentLeft$ctl01$ddlRegionEx' => 4,
  'ctl00$contentLeft$ctl01$ddlJobFamilyEx' => 45,
  'ctl00$contentLeft$ctl01$ddlBusinessUnitExt' => 22,
  'ctl00$contentLeft$ctl01$ddlJobLevelExt' => 1,
  'ctl00$contentLeft$ctl01$ddlCountryExt' => 1,
}

response = HTTParty.post("#{@base_url}/nl/careers/vacatures/", :body => data)
html = Nokogiri::HTML(response)
jobs = html.xpath('//h3//a')

jobs.each do |job|
  puts job.text
end
puts jobs.size
Unfortunately, the result is exactly the same.
Update 2:
Here is the updated code:
require 'httparty'
require 'nokogiri'

@base_url = 'https://www.akzonobel.com'
url = "#{@base_url}/careers/vacatures/"

data = {
  'contentLeft_ctl01_ddlContinentExt' => 'C_EUROPE',
  'contentLeft_ctl01_ddlCountryExt' => 'NLD',
  'contentLeft_ctl01_ddlRegionExt' => 'Gelderland',
  'contentLeft_ctl01_ddlRegionExt' => 'Limburg',
  'contentLeft_ctl01_ddlRegionExt' => 'North Holland',
  'contentLeft_ctl01_ddlRegionExt' => 'South Holland',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'General Management',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Integrated Supply Chain',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Sales & Marketing',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'RD&I',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Support',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Other',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Lvl2_General Management',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Manufacturing',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'HSE',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Engineering',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Procurement',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Distribution & Logistics',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Sales',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Marketing',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Lvl2_RD&I',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Finance',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'IM',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'HR',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Legal, IP & Compliance',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Facilities',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Lvl2_Other',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80200000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80300000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81900000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81100000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '82000000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81200000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80700000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80400000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80500000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80800000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80900000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '82100000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '82200000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81010000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81020000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81030000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81040000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81300000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81410000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81420000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81430000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81600000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81700000',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Lvl3_Other',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '52000100',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '52000200',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '52000300',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '52000900',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000010',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000013',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000020',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000022',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000026',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000033',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000038',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000041',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000054',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000055',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000056',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000061',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000063',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000100',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000300',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000900',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000901',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '51000000',
  'contentLeft_ctl01_ddlJobLevelExt' => 'ENTRY_LEVEL'
}

response = HTTParty.post("#{@base_url}/nl/careers/vacatures/", :body => data)
html = Nokogiri::HTML(response)
jobs = html.xpath('//h3//a')

jobs.each do |job|
  puts job.text
end
puts jobs.size
This gives me exactly the same result as before.
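One thing worth checking in the code above, independent of what the server expects: a Ruby Hash holds a single value per key, so when the same key is written many times only the last value survives. The sketch below shows this with a few of the keys from the question, and shows that an array of pairs keeps every repetition when encoded with `URI.encode_www_form`. Whether the site accepts repeated parameters at all is an assumption that is not confirmed here.

```ruby
require 'uri'

# Each entry is a [key, value] pair; the key names are taken from the question.
pairs = [
  ['contentLeft_ctl01_ddlRegionExt', 'Gelderland'],
  ['contentLeft_ctl01_ddlRegionExt', 'Limburg'],
  ['contentLeft_ctl01_ddlJobLevelExt', 'ENTRY_LEVEL']
]

# Converting to a Hash keeps only the last value per key, which is the same
# outcome as repeating the key inside one Hash literal.
collapsed = pairs.to_h
puts collapsed.size
puts collapsed['contentLeft_ctl01_ddlRegionExt']

# Encoding the pairs directly keeps every repetition in the request body.
body = URI.encode_www_form(pairs)
puts body
```

A string built this way can be passed to `HTTParty.post` as the `:body` option, typically together with a `Content-Type: application/x-www-form-urlencoded` header.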
Answer 0 (score: 0)
I think the problem can be solved by changing this code into a loop that prints job.text only 3 times.
So change this,
jobs.each do |job|
  puts job.text
end
to this,
jobs.first(3).each do |job|
  puts job.text
end
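Answer 1 (score: -1)
Setting the country / job level in the GUI triggers a JavaScript call. You have to explicitly set all the drop-down values (Continent, Region, Job Family, Business Unit) to the values they take after NLD / EntryLevel has been selected: 1, 4, 45 and 22 respectively.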
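A possible complement to the point above: control names of the form `ctl00$...` suggest an ASP.NET Web Forms page, and such pages commonly expect the hidden state fields rendered inside the form (for example `__VIEWSTATE` and `__EVENTVALIDATION`) to be posted back together with the drop-down values. Whether this particular site requires them is an assumption, not something confirmed in this thread. The sketch below uses a made-up HTML fragment and two hypothetical helpers, `hidden_fields` and `postback_body`, to show how the hidden fields from a first GET could be merged into the POST body.

```ruby
require 'uri'

# Hypothetical helper: collect name/value pairs of <input type="hidden"> tags.
# Regex-based only to keep the sketch dependency-free; a real scraper would
# more robustly use Nokogiri and would also decode HTML entities.
def hidden_fields(html)
  html.scan(/<input\b[^>]*>/i).each_with_object({}) do |tag, fields|
    next unless tag =~ /\stype=["']hidden["']/i
    name = tag[/\sname=["']([^"']*)["']/i, 1]
    fields[name] = tag[/\svalue=["']([^"']*)["']/i, 1].to_s if name
  end
end

# Hypothetical helper: merge the hidden fields with the chosen drop-down
# values and encode everything as a form body.
def postback_body(html, selections)
  URI.encode_www_form(hidden_fields(html).merge(selections))
end

# Made-up page fragment; the real page's fields and values will differ.
sample_html = <<~HTML
  <form method="post">
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
    <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
    <input type="text" name="q" value="ignored" />
  </form>
HTML

selections = {
  'ctl00$contentLeft$ctl01$ddlCountryExt' => 'NLD',
  'ctl00$contentLeft$ctl01$ddlJobLevelExt' => 'ENTRY_LEVEL'
}

body = postback_body(sample_html, selections)
puts body
```

In practice the HTML would come from a GET of the vacancies page, and the resulting body would be sent with the POST; the site may also rely on cookies from the first request.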
Another thing is that the real controls are hidden; use the Chrome Inspector to see them. The id of an actual control looks like this:
contentLeft_ctl01_ddlCountryExt
Hope it helps.
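A caveat when using that id: HTML form submissions are keyed by each control's `name` attribute, not its `id`, so the id seen in the inspector cannot simply be used as the POST key. The sketch below looks up the `name` for a given `id` in a made-up tag. The pairing of this id with the `ctl00$...` name from the question is an assumption, and `name_for_id` is a hypothetical helper.

```ruby
# Made-up markup for illustration; the real page may differ.
sample_html = '<select name="ctl00$contentLeft$ctl01$ddlCountryExt" ' \
              'id="contentLeft_ctl01_ddlCountryExt"></select>'

# Hypothetical helper: return the name attribute of the <select> or <input>
# tag that carries the given id, or nil when no such tag is found.
def name_for_id(html, id)
  tag = html.scan(/<(?:select|input)\b[^>]*>/i)
            .find { |t| t =~ /\sid=["']#{Regexp.escape(id)}["']/i }
  tag && tag[/\sname=["']([^"']*)["']/i, 1]
end

post_key = name_for_id(sample_html, 'contentLeft_ctl01_ddlCountryExt')
puts post_key
```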