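Hello, I just finished the following tutorials: https://github.com/ryandhaase/Web-Scraper/blob/master/airbnb_scraper.rb and https://medium.com/@tabor_francesca/web-scraper-airbnb-24d67939b08a#.mg7ny2tke, and now I am practicing. I am having trouble splitting a subarray. Everything else works, but I cannot split the city, state, and zip code into separate Excel columns.
The following line is incorrect. How do I fix it?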
city << [subarray[0], "this is not working", subarray[1]]
My guess is that another line needs to be fixed as well.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'
url = "https://www.tesla.com/findus/list/stores/United+States"
page = Nokogiri::HTML(open(url)) # fetch and parse the page once
puts page.class
name = []
street_address = []
extended_address = []
city = []
state = []
zip = []
page.css('a.fn.org.url').each do |line|
  name << line.text.strip
end
page.css('span.street-address').each do |line|
  street_address << line.text
end
page.css('span.extended-address').each do |line|
  extended_address << line.text
end
page.css('span.locality').each do |line|
  subarray = line.text.strip.split(/ · /)
  if subarray.length == 3
    city << subarray
  else
    city << [subarray[0], "this is not working", subarray[1]]
  end
end
CSV.open("teslaStores.csv", "w") do |file|
  file << ["Name", "Street Address", "Street Address Continued", "City", "State", "Zip"]
  name.length.times do |i|
    file << [name[i], street_address[i], extended_address[i], city[i], city[i][0], city[i][1]]
  end
end
Answer 0 (score: 0)
Just as an FYI, this is untested, but it is more idiomatic Ruby:
require 'csv'
require 'nokogiri'
require 'open-uri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
page = Nokogiri::HTML(URI.open('https://www.tesla.com/findus/list/stores/United+States'))
name = page.css('a.fn.org.url').map { |n| n.text.strip }
street_address = page.css('span.street-address').map { |n| n.text }
extended_address = page.css('span.extended-address').map { |n| n.text }
city = page.css('span.locality').map { |n|
  subarray = n.text.strip.split(/ · /)
  if subarray.length == 3
    subarray
  else
    [subarray[0], 'this is not working', subarray[1]]
  end
}
CSV.open('teslaStores.csv', 'w') do |file|
  file << ['Name', 'Street Address', 'Street Address Continued', 'City', 'State', 'Zip']
  name.length.times do |i|
    # city[i] is a three-element array, so index into it for each column.
    file << [name[i], street_address[i], extended_address[i], city[i][0], city[i][1], city[i][2]]
  end
end
This can be reduced further:
street_address, extended_address = [
  'span.street-address',
  'span.extended-address'
].map { |selector|
  page.css(selector).map { |n| n.text }
}
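In the same spirit, the index-based loop that builds each CSV row could be replaced with Array#zip. This is only a sketch: `build_rows` is a hypothetical helper name, the sample values are made up, and it assumes every column array has the same length.

```ruby
# Hypothetical helper: combine parallel column arrays into one row per store.
def build_rows(*columns)
  first, *rest = columns
  # zip pairs the i-th element of each array into a single row array.
  first.zip(*rest)
end

# Made-up sample data standing in for the scraped arrays.
names            = ['Store A', 'Store B']
street_addresses = ['1 Main St', '2 Oak Ave']
cities           = ['Scottsdale', 'Austin']

build_rows(names, street_addresses, cities).each { |row| puts row.inspect }
```

Inside the `CSV.open` block, each resulting row could then be appended with `file << row`.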
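Answer 1 (score: 0)
So, I attended a meetup.com event about Python and asked one of the instructors for help, even though the class was not on this topic :). The instructor explained that I needed to split on the comma and then on the space. Before, I was splitting on the dot (" · ").
I had to change this: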
page.css('span.locality').each do |line|
  subarray = line.text.strip.split(/ · /)
  if subarray.length == 3
    city << subarray
  else
    city << [subarray[0], "this is not working", subarray[1]]
  end
end
to this:
page.css('span.locality').each do |line|
  subarray = line.text.strip.split(',')
  subarray2 = subarray[1].split(' ')
  city << subarray[0]
  state << subarray2[0]
  zip << subarray2[1]
end
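To see why the change helps, here is a small sketch that runs without the network. It assumes the locality text looks like "City, ST 12345", which is inferred from the comma and space splits above rather than checked against the page. `parse_locality` is a hypothetical helper name and the sample string is made up.

```ruby
# Hypothetical helper: split "City, ST 12345" into [city, state, zip].
# Missing parts come back as nil instead of raising on a nil receiver.
def parse_locality(text)
  city, rest = text.strip.split(',', 2) # split on the first comma only
  state, zip = rest.to_s.split(' ')     # then split the remainder on whitespace
  [city, state, zip]
end

sample = 'Scottsdale, AZ 85251' # made-up example value

# The old pattern finds no " · " separator, so the string stays in one piece.
p sample.split(/ · /)
# Splitting on the comma and then the space yields three separate values.
p parse_locality(sample)
```

Unlike calling `subarray[1].split(' ')` directly, this version does not fail when an entry has no comma; it leaves the state and zip as nil.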
Here is the complete answer:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'
url = "https://www.tesla.com/findus/list/stores/United+States"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
page = Nokogiri::HTML(URI.open(url))
puts page.class
name = []
street_address = []
extended_address = []
city = []
state = []
zip = []
page.css('a.fn.org.url').each do |line|
  name << line.text.strip
end
page.css('span.street-address').each do |line|
  street_address << line.text
end
page.css('span.extended-address').each do |line|
  extended_address << line.text
end
page.css('span.locality').each do |line|
  subarray = line.text.strip.split(',')
  subarray2 = subarray[1].split(' ')
  city << subarray[0]
  state << subarray2[0]
  zip << subarray2[1]
end
CSV.open("teslaStores.csv", "w") do |file|
  file << ["Name", "Street Address", "Street Address Continued", "City", "State", "Zip"]
  name.length.times do |i|
    file << [name[i], street_address[i], extended_address[i], city[i], state[i], zip[i]]
  end
end
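One caveat with this approach: the columns are collected into separate arrays and joined by index, so the rows only line up if every selector matches exactly once per store. If some stores lacked, say, an extended address (an assumption, not something verified against the page), later rows would shift silently. A minimal sketch of a sanity check to run before writing the CSV follows; `check_column_lengths!` is a hypothetical helper name and the data is made up.

```ruby
# Hypothetical helper: raise if the column arrays differ in length,
# otherwise return nil.
def check_column_lengths!(columns)
  lengths = columns.transform_values(&:length)
  return if lengths.values.uniq.size <= 1

  raise ArgumentError, "column lengths differ: #{lengths}"
end

# Made-up data in which the second store has no extended address.
columns = {
  name: ['Store A', 'Store B'],
  street_address: ['1 Main St', '2 Oak Ave'],
  extended_address: ['Suite 100']
}

begin
  check_column_lengths!(columns)
rescue ArgumentError => e
  puts e.message
end
```

If the check fails, one option is to iterate over each store's enclosing element and look up the fields inside it, so that a missing field becomes nil for that store only.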