Rails: invalid byte sequence in UTF-8 when using the htmlentities gem

Date: 2018-02-18 17:47:20

Tags: ruby-on-rails ruby-on-rails-5 nokogiri html-entities

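I have a controller that scrapes the entire HTML of a page and stores it in a MySQL database. Before storing the data, I want to encode it with the htmlentities gem. My problem is that this works fine for some sites, e.g. https://www.lookagain.co.uk/, but for others, e.g. https://www.google.co.uk/, I get "invalid byte sequence in UTF-8", and I don't know why. At first I thought the database might be the problem, so I changed all the fields to LONGTEXT, but the problem persists.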
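To make the error message concrete, here is a minimal pure-Ruby sketch. It assumes (the question does not confirm this) that some pages arrive in a charset other than UTF-8, such as ISO-8859-1, while the resulting string ends up tagged as UTF-8. HTML-escaping code typically runs regular expressions over the string, and Ruby refuses to run a regex over a string whose bytes are invalid for its declared encoding. The sample text and the character class in the regex are illustrative only.

```ruby
# Bytes as an ISO-8859-1 (Latin-1) page would send them: 0xA3 is "£" in Latin-1
raw = "price: \xA3 5".b

# Mislabel the bytes as UTF-8, as can happen when the declared charset is ignored
mislabeled = raw.dup.force_encoding(Encoding::UTF_8)
puts mislabeled.valid_encoding? # prints false: a lone 0xA3 is not valid UTF-8

begin
  # Regex-based escaping (similar in spirit to what an entity encoder does) now fails
  mislabeled.gsub(/[<>&"]/) { |c| "&##{c.ord};" }
rescue ArgumentError => e
  puts e.message # prints: invalid byte sequence in UTF-8
end

# Interpreting the same bytes with the right charset, then transcoding, works
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts fixed # prints: price: £ 5
```

If this is what is happening, the fix belongs where the page is read (decoding with the right charset), not in the database column type.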

Controller:

require 'nokogiri'
require 'open-uri'
require 'diffy'
require 'htmlentities'

class PageScraperController < ApplicationController
  def scrape
    @url = watched_link_params[:url].to_s
    logger.debug "Scraping #{@url}"
    # URI.parse(...).open only fetches http/https/ftp URLs; Kernel#open would
    # run a shell command if the user-supplied string started with "|"
    @page = Nokogiri::HTML(URI.parse(@url).open)
    coder = HTMLEntities.new
    # Entity-encoding step; for some sites "invalid byte sequence in UTF-8" shows up here
    @encodedHTML = coder.encode(@page)
    create
  end

  def index
    @savedHTML = ScrapedPage.all
  end

  def show
    @savedHTML = ScrapedPage.find(params[:id])
  end

  def new
    @savedHTML = ScrapedPage.new
  end

  def create
    # ScrapedPage.create would already save; build with new and save once
    @savedHTML = ScrapedPage.new(domain: @url, html: @encodedHTML, css: '', javascript: '')

    if @savedHTML.save
      logger.info "ADDED TO THE DATABASE"
      redirect_to(root_path)
    else
      logger.error "FAILED TO ADD TO THE DATABASE"
      # Respond explicitly; otherwise Rails looks for a template that may not exist
      head :unprocessable_entity
    end
  end

  def edit
  end

  def update
  end

  def delete
    @watched_links = ScrapedPage.find(params[:id])
  end

  def destroy
    @watched_links = ScrapedPage.find(params[:id])
    @watched_links.destroy
    redirect_to(root_path)
  end

  private

  def watched_link_params
    params.require(:default).permit(:url)
  end
end
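One possible direction, not confirmed (the question has no answers): if the error does come from a charset mismatch, the fetched bytes could be decoded with the charset the server declared and transcoded to UTF-8 before they reach Nokogiri and HTMLEntities. The helper name normalize_to_utf8 is hypothetical, and the fallback to UTF-8 for a missing or unknown charset is an assumption, not something the question specifies.

```ruby
# Hypothetical helper: turn raw fetched bytes into a valid UTF-8 string.
# declared_charset is whatever the server reported (e.g. from open-uri's
# OpenURI::Meta#charset); nil or an unknown name falls back to UTF-8 (assumption).
def normalize_to_utf8(raw, declared_charset)
  encoding =
    begin
      declared_charset ? Encoding.find(declared_charset) : Encoding::UTF_8
    rescue ArgumentError # Encoding.find raises for unknown charset names
      Encoding::UTF_8
    end

  str = raw.dup.force_encoding(encoding)

  if encoding == Encoding::UTF_8
    # encode(UTF-8 -> UTF-8) is a no-op that keeps invalid bytes, so scrub instead
    str.scrub("\uFFFD")
  else
    # Transcode; bytes invalid in the source charset or unmappable become U+FFFD
    str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "\uFFFD")
  end
end
```

In the controller this might be used as response = URI.parse(@url).open, then html = normalize_to_utf8(response.read, response.charset), then Nokogiri::HTML(html). Replacing undecodable bytes with U+FFFD trades a crash for a visible placeholder character; whether that is acceptable for stored pages is a judgment call.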

0 answers