I have scraped data from a website and entered it into an array using the code below:
def process_course_details(course_details)
  details_array = []
  details_link = true
  entry_link = true
  details_info = {}
  # Sets all data in hash
  details_info[:url] = clean_link(course_details.search('div.coursedetails_programmeurl a'))
  details_array.push(details_info)
  print_details_info(details_info)
  entry_link = course_details.search('ul.details_tabs').first
end
The code above stores the element being pulled as such:
<a href="http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/">View course details on provider's website</a>
But I'd like to clean the above to the below:
http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/
or failing that remove the apostrophe and have this:
<a href="http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/">View course details on providers website</a>
Answer 0 (score: 0)
You can extract the href with Nokogiri like this:
require 'nokogiri'
html = Nokogiri::HTML('<a href="http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/">View course details on provider\'s website</a>')
html.xpath("//a/@href").to_s # => "http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/"
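Answer 1 (score: 0)
Based on your comment:
I've removed the data, as when storing the other data to the database it was providing errors and stopping it. Once I cleaned the apostrophe and it was no longer part of the array, the code worked and the table was created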
db = SQLite3::Database.open('ahhh.sqlite3')
db.execute "INSERT INTO aahah (uname, cname, duration, qualification, url, entry) VALUES ('#{@uni_name}', '#{@course_name}', '#{@course_duration}', '#{@course_qual}', '#{@details_entry}', '#{@requirements}')"
You are inserting the values via string interpolation:
db.execute("INSERT INTO table_name (foo, bar) VALUES ('#{@foo}', '#{@bar}')")
Obviously, if an interpolated string contains an apostrophe, your SQL string can become invalid. Even worse, this code is vulnerable to SQL injection.
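To see the breakage without touching a database, here is a minimal sketch in plain Ruby. The `link_text` value and the single-column statement are illustrative assumptions, not the asker's real data or schema; the script only builds the SQL string and counts its single quotes.

```ruby
# Hypothetical value containing an apostrophe, like the scraped link text.
link_text = "View course details on provider's website"

# Build the statement by string interpolation, as in the snippet above.
# table_name and foo are placeholder names, not a real schema.
sql = "INSERT INTO table_name (foo) VALUES ('#{link_text}')"

puts sql

# The apostrophe in "provider's" closes the SQL string literal early,
# so the statement ends up with an odd number of single quotes.
quote_count = sql.count("'")
puts "single quotes: #{quote_count} (#{quote_count.odd? ? 'unbalanced' : 'balanced'})"
```

Everything after `provider` is parsed as SQL rather than as data, which is both why the statement fails and why the same mechanism can be abused for injection.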
Instead, you should use parameter markers and let the SQLite gem handle the escaping:
db.execute("INSERT INTO table_name (foo, bar) VALUES (?, ?)", [@foo, @bar])
This lets you safely insert apostrophes and other special characters.