How to scrape an email address with Mechanize instead of getting "[email protected]" as the value

Asked: 2013-12-09 23:44:32

Tags: ruby nokogiri mechanize cloudflare

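When I scrape a single order page in the admin section of my OpenCart shop for the customer's email address (the full HTML is available here: http://pastebin.com/SaLc5jHu), I get the following as the email address value: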

[email protected]
/* <![CDATA[ */
(function(){try{var s,a,i,j,r,c,l,b=document.getElementsByTagName("script");l=b[b.length-1].previousSibling;a=l.getAttribute('data-cfemail');if(a){s='';r=parseInt(a.substr(0,2),16);for(j=2;a.length-j;j+=2){c=parseInt(a.substr(j,2),16)^r;s+=String.fromCharCode(c);}s=document.createTextNode(s);l.parentNode.replaceChild(s,l);}}catch(e){}})();
/* ]]> */

Here is the code:

require 'mechanize'

a = Mechanize.new

a.get('http://exampleshop.nl/admin/') do |page|

    # Select the login form
    login_form = page.forms.first

    # Insert the username and password
    login_form.username = 'username'
    login_form.password = 'password'

    # Submit the login information
    dashboard_page = a.submit(login_form, login_form.buttons.first)

    # Check if the login was successful
    puts check_1 = dashboard_page.title == 'Dashboard' ?  "CHECK 1 DASHBOARD SUCCESS" : "CHECK 1 DASHBOARD FAIL"

    # Visit the orders index page to scrape some standard information
    orders_page = a.click(dashboard_page.link_with(:text => /Bestellingen/))

    # pp orders_page # => http://pastebin.com/L3zASer6

    # Check if the visit is successful
    puts check_2 = orders_page.title == 'Bestellingen' ?  "CHECK 2 ORDERS SUCCESS" : "CHECK 2 ORDERS FAIL"

    # Search for all #singleOrder table rows and put them in the variable all_single_orders
    all_single_orders = orders_page.search("#singleOrder")

    # Scrape the needed information (the actual save to database is omitted)
    all_single_orders.each do |order|
        # Set links for each order
        order_link = order.at_css("a")['href']  #Assuming first link in row

        order_id = order.search("#orderId").text          # => 259
        order_status = order.search("#orderStatus").text  # => Bestelling ontvangen
        order_amount = order.search("#orderAmount").text  # => € 41,94

        # Visit a single order page to fetch more detailed information
        single_order_page = orders_page.link_with(:href => order_link).click

        # Fetch more information
        puts first_name = single_order_page.search(".firstName").text
        puts last_name = single_order_page.search(".lastName").text
        puts email = single_order_page.search(".email").text # => [email protected] /* <![CDATA[ */...
        puts postal_code = single_order_page.search(".postalCode").text
        puts address = single_order_page.search(".address").text
        puts product_quantity = single_order_page.search(".orderQuantity").text
    end
end

Any ideas? I'm using Ruby 2.0.0 and Mechanize 2.7.3, and the site is behind CloudFlare.

Update

It works now. To make this work, simply disable the ScrapeShield email obfuscation option in CloudFlare's Apps panel (https://www.cloudflare.com/cloudflare-apps).

1 Answer:

Answer 0: (score: 0)

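It didn't work because the CloudFlare app called ScrapeShield was enabled. Its email obfuscation replaces each address in the HTML with the "[email protected]" placeholder plus a script that restores the real address in the browser, and Mechanize does not execute JavaScript.

To make this work, simply disable the ScrapeShield email obfuscation option in the Apps panel (https://www.cloudflare.com/cloudflare-apps).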
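
If disabling the option is not possible, the inline script quoted in the question shows how the address is encoded: the `data-cfemail` attribute holds a hex string whose first byte is an XOR key and whose remaining bytes are the characters of the address XORed with that key. A minimal Ruby sketch of the same decoding follows; the helper name `decode_cfemail` and the sample hex string are made up for illustration and are not part of Mechanize or CloudFlare.

```ruby
# Decode the value of a data-cfemail attribute (hypothetical helper).
# The first hex byte is the XOR key; every following byte is one
# character of the address XORed with that key, mirroring the
# JavaScript shown in the question.
def decode_cfemail(hex)
  bytes = hex.scan(/\h\h/).map { |pair| pair.to_i(16) }
  key = bytes.shift
  bytes.map { |b| b ^ key }.pack('U*')
end

# Sample value built by hand: "a@b.c" XORed with the key 0x2a
puts decode_cfemail('2a4b6a480449') # => a@b.c
```

In the scraper above, the attribute could be read with something like `single_order_page.at('.email [data-cfemail]')['data-cfemail']` and passed to the helper. That selector is an assumption about the page markup, so check it against the actual HTML.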