Cleaning up a scraped <a href=""> in Rails

Time: 2016-06-23 11:40:26

Tags: ruby-on-rails ruby sqlite gsub code-cleanup

I have scraped data from a website and entered it into an array using the code below:

  def process_course_details(course_details)
    details_array = []
    details_link = true
    entry_link = true

    details_info = {}
    # Sets all data in hash
    details_info[:url] = clean_link(course_details.search('div.coursedetails_programmeurl a'))
    details_array.push(details_info)
    print_details_info(details_info)

    entry_link = course_details.search('ul.details_tabs').first
  end

The code above stores the pulled element like this:

<a href="http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/">View course details on provider's website</a>

But I'd like to clean the above to the below:

http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/

or, failing that, remove the apostrophe to get this:

<a href="http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/">View course details on providers website</a>

2 Answers:

Answer 0 (score: 0)

You can extract the href with Nokogiri like this:

require 'nokogiri'

html = Nokogiri::HTML('<a href="http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/">View course details on provider\'s website</a>')
html.xpath("//a/@href").to_s # => "http://www.abdn.ac.uk/study/courses/undergraduate/C8R1/"

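Answer 1 (score: 0)

Based on your comment:

"I've scraped the data into a database; when storing the other data, it throws an error and stops. Once I cleaned up the apostrophe and it was no longer part of the array, the code worked and the table was created."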

require 'sqlite3'

db = SQLite3::Database.open('ahhh.sqlite3')
db.execute "INSERT INTO aahah (uname, cname, duration, qualification, url, entry) VALUES ('#{@uni_name}', '#{@course_name}', '#{@course_duration}', '#{@course_qual}', '#{@details_entry}', '#{@requirements}')"

You are inserting the values via string interpolation:

db.execute("INSERT INTO table_name (foo, bar) VALUES ('#{@foo}', '#{@bar}')")

Obviously, if an interpolated string contains an apostrophe, your SQL string can become invalid. Worse, this code is vulnerable to SQL injection.
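To see why the apostrophe breaks the statement, here is a small illustration that only builds the interpolated SQL string and never runs it (the table name `courses` is hypothetical):

```ruby
# Illustrative only: builds the SQL the same way as the code above,
# by interpolation, to show what an apostrophe does to it. Nothing is executed.
course_name = "View course details on provider's website"

sql = "INSERT INTO courses (cname) VALUES ('#{course_name}')"
puts sql
# INSERT INTO courses (cname) VALUES ('View course details on provider's website')

# The apostrophe in "provider's" closes the SQL string literal early,
# leaving an odd number of single quotes, so the statement is malformed.
quote_count = sql.count("'")
puts quote_count # => 3
```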

Instead, you should use parameter markers and let the SQLite gem handle the escaping:

db.execute("INSERT INTO table_name (foo, bar) VALUES (?, ?)", [@foo, @bar])

This lets you safely insert apostrophes and other special characters.