Finding XML content

Time: 2014-07-17 10:32:24

Tags: ruby xml-parsing nokogiri

I am trying to use Nokogiri to get data from a file like this:

From: XXX <xxx@xxx.com> 
To: yyy@yyy.com 
Subject: Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. 

----- BEGIN PGP SIGNED MESSAGE ----- 
Hash: SHA1 

Dear yyy@yyy.com: 

Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. "Smalleye squaretail antenna codlet dartfish peacock flounder plaice, luminous hake oceanic flyingfish tiger shark, bramble shark, California halibut. Australian prowfish lake chub knifefish African lungfish; southern Dolly Varden pike conger. Gouramie glass catfish loosejaw, three-toothed puffer. Nase ridgehead featherfin knifefish Rattail gulper false brotula Atlantic eel zebra oto. Marlin mahi-mahi freshwater eel false brotula mojarra naked-back knifefish Steve fish bocaccio. Amago kanyu algae eater bullhead shark orangespine unicorn fish bangus, "Pacific cod zander banjo catfish half-gill pejerrey Indian mul." 
<? xml version = "1.0" encoding = "UTF-8"?> 
<Case> 
   <ID> 48456856568 </ ID> 
   <Status> Open </ Status> 
   <Severity> Normal </ Severity> 
</ Case> 
<Complainant> 
   <Entity> Sabertooth </ Entity> 
   <Contact> California halibut </ Contact> 
   <Address> Pacific cod zander banjo catfish half-gill pejerrey Indian mul. </ Address> 
   <phone> +1 (352) 584 8413 </ Phone> 
   <Email> Xxx@xxx.com </ Email> 
</ Complainant> 
<Service_Provider> 
   <Entity> Hammerjaw pompano </ Entity> 
   <Contact/> 
   <Address/> 
   <Phone/> 
   <Email> Yyy@yyy.com </ Email> 
</ Service_Provider> 
<Source> 
   <TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp> 
   <IP_Address> 158.01.52.23 </ IP_Address> 
   <Port> 8080 </ Port> 
   <Type> Browser </ Type> 
   <Protocol="IP"/> 
   <UserName/> 
   <Number_Files> 5 </ Number_Files> 
</ Source> 
<Content> 
   <Item> 
   <TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp> 
    <Title> Dolly Varden pike conger </ Title> 
    <FileName> Dolly Varden pike conger </ FileName> 
    <FileSize> 2143534544 </ FileSize> 
    <InfoHash> 67asdv6a6sdv7d7sfb3c32da79dcc9a6cdc70 </ InfoHash> 
   </ Item> 
</ Content> 
<History/> 
<Notes/> 
<Type Retraction="false"/> 
<Verification/> 
</ Infringement> 

----- BEGIN PGP SIGNATURE ----- 
Version: GnuPG 

0zjdfbkHGBVJKhdbvskjdvbhBHSDJvhbvEtqs/WYMcIAL1 +4 ufOjdvXiDLcN1PzM/QJ 
IIj9KCq + / PYuMU6fTd800EOcbRX43RgeX6Qrgu + MDdDbte + CwKZL2Q28IZ0Viv +8 
YItYXdgwhNnUO2QE7jn/g5KXn4v72QqpnsPJjWQVVD12 + h6DDUdaQHMsTdYyYIVD 
Jkc8dPDVTLutVnuK2HZ4wQWRoiIWIMsUzePUht0eWi7DJFOlC5NuwS + E6FuxtgFj 
IwJyCr/dLC/u6YtVCAb37UUSu7k3F5iD3hFTt1RyswK7HBDizV1CHIlc2diARfkL
CwRpYc/SlpZNgbAXaUzwHhtIQjCuRXQGsXtvDFke4CvM9nGe6Uk095yVOAKla1Y = 
= mVny 
----- END PGP SIGNATURE -----

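I need information such as the sender's IP, which is at /Source/IP_Address; the sender's e-mail address, which is in the Email element; the From field at the beginning of the letter; and the letter itself. How can I do this in Ruby with Nokogiri?

I tried to get the IP address like this: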

def ip_address
  ip = Nokogiri::XML("mail/*.txt")
  ip.each { |node|
    p node.inner_xml if node.name == "IP_Address"
  }
end

But I get no output. Does anyone know how to get data from this type of file?

2 answers:

Answer 0: (score: 0)

Since you seem to be looking only for the IP address, I would forget about Nokogiri:

puts $~[1] if s =~ /<IP_Address>\s*([\d.]+)\s*<\/\s*IP_Address>/m
will do the job, assuming the file contents have already been loaded into s:

s = File.read(...)

Hope it helps.

UPD: To extract the XML portion:

xml = $~[1] if s =~ /(<\?\s*xml.*?Infringement>)/m
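
The same regex approach can be stretched to the other fields the question asks for. What follows is only a minimal sketch, not part of the original answer: SAMPLE is a shortened stand-in for the message, the helper name extract_fields and its hash keys are made up for illustration, and it assumes that the sender's e-mail is the one under Complainant and that the tags keep the spacing shown in the question.

```ruby
# Shortened stand-in for the message in the question (hypothetical sample).
SAMPLE = <<~MSG
  From: XXX <xxx@xxx.com>
  To: yyy@yyy.com

  Dear yyy@yyy.com:
  <Complainant>
     <Email> Xxx@xxx.com </ Email>
  </ Complainant>
  <Service_Provider>
     <Email> Yyy@yyy.com </ Email>
  </ Service_Provider>
  <Source>
     <IP_Address> 158.01.52.23 </ IP_Address>
  </ Source>
MSG

# Hypothetical helper: pulls a few fields out of the raw text with regexes.
def extract_fields(s)
  # Mail headers end at the first blank line; the rest is the body.
  headers, body = s.split(/\r?\n\r?\n/, 2)
  {
    from: headers[/^From:\s*(.+?)\s*$/, 1],
    ip: s[/<IP_Address>\s*([\d.]+)\s*<\/\s*IP_Address>/, 1],
    # Start the search at <Complainant> so the Service_Provider
    # e-mail is not picked up by mistake (assumption: the sender's
    # address is the Complainant one).
    complainant_email: s[/<Complainant>.*?<Email>\s*(\S+?)\s*<\/\s*Email>/m, 1],
    body: body
  }
end

fields = extract_fields(SAMPLE)
puts fields[:from]
puts fields[:ip]
puts fields[:complainant_email]
```

Each lookup returns nil when its pattern does not match, so the caller can tell a missing field from an empty one.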

Answer 1: (score: 0)

Nokogiri won't parse a mail message, so you have to get rid of the non-XML content:

message = 'From: XXX <xxx@xxx.com> 
To: yyy@yyy.com 
Subject: Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. 

----- BEGIN PGP SIGNED MESSAGE ----- 
Hash: SHA1 

Dear yyy@yyy.com: 

Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. "Smalleye squaretail antenna codlet dartfish peacock flounder plaice, luminous hake oceanic flyingfish tiger shark, bramble shark, California halibut. Australian prowfish lake chub knifefish African lungfish; southern Dolly Varden pike conger. Gouramie glass catfish loosejaw, three-toothed puffer. Nase ridgehead featherfin knifefish Rattail gulper false brotula Atlantic eel zebra oto. Marlin mahi-mahi freshwater eel false brotula mojarra naked-back knifefish Steve fish bocaccio. Amago kanyu algae eater bullhead shark orangespine unicorn fish bangus, "Pacific cod zander banjo catfish half-gill pejerrey Indian mul." 
<? xml version = "1.0" encoding = "UTF-8"?> 
<Case> 
   <ID> 48456856568 </ ID> 
   <Status> Open </ Status> 
   <Severity> Normal </ Severity> 
</ Case> 
<Complainant> 
   <Entity> Sabertooth </ Entity> 
   <Contact> California halibut </ Contact> 
   <Address> Pacific cod zander banjo catfish half-gill pejerrey Indian mul. </ Address> 
   <phone> +1 (352) 584 8413 </ Phone> 
   <Email> Xxx@xxx.com </ Email> 
</ Complainant> 
<Service_Provider> 
   <Entity> Hammerjaw pompano </ Entity> 
   <Contact/> 
   <Address/> 
   <Phone/> 
   <Email> Yyy@yyy.com </ Email> 
</ Service_Provider> 
<Source> 
   <TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp> 
   <IP_Address> 158.01.52.23 </ IP_Address> 
   <Port> 8080 </ Port> 
   <Type> Browser </ Type> 
   <Protocol="IP"/> 
   <UserName/> 
   <Number_Files> 5 </ Number_Files> 
</ Source> 
<Content> 
   <Item> 
   <TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp> 
    <Title> Dolly Varden pike conger </ Title> 
    <FileName> Dolly Varden pike conger </ FileName> 
    <FileSize> 2143534544 </ FileSize> 
    <InfoHash> 67asdv6a6sdv7d7sfb3c32da79dcc9a6cdc70 </ InfoHash> 
   </ Item> 
</ Content> 
<History/> 
<Notes/> 
<Type Retraction="false"/> 
<Verification/> 
</ Infringement> 

----- BEGIN PGP SIGNATURE ----- 
Version: GnuPG 

0zjdfbkHGBVJKhdbvskjdvbhBHSDJvhbvEtqs/WYMcIAL1 +4 ufOjdvXiDLcN1PzM/QJ 
IIj9KCq + / PYuMU6fTd800EOcbRX43RgeX6Qrgu + MDdDbte + CwKZL2Q28IZ0Viv +8 
YItYXdgwhNnUO2QE7jn/g5KXn4v72QqpnsPJjWQVVD12 + h6DDUdaQHMsTdYyYIVD 
Jkc8dPDVTLutVnuK2HZ4wQWRoiIWIMsUzePUht0eWi7DJFOlC5NuwS + E6FuxtgFj 
IwJyCr/dLC/u6YtVCAb37UUSu7k3F5iD3hFTt1RyswK7HBDizV1CHIlc2diARfkL
CwRpYc/SlpZNgbAXaUzwHhtIQjCuRXQGsXtvDFke4CvM9nGe6Uk095yVOAKla1Y = 
= mVny 
----- END PGP SIGNATURE -----
'

Here is how to reduce the message to its XML part:

require 'nokogiri'
xml = message[/(<\? xml .+)----- BEGIN/m, 1]
doc = Nokogiri::XML::DocumentFragment.parse(xml)
doc.at('IP_Address').text # => " 158.01.52.23 "

The magic part is:

xml = message[/(<\? xml .+)----- BEGIN/m, 1]

which grabs everything from <? xml up to the line just before ----- BEGIN. Nokogiri::XML::DocumentFragment.parse can then build a searchable DOM from it.
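
To see why the slice stops at the right place, here is a minimal sketch on a shortened, hypothetical message (the mini string is a stand-in, not the real file). With the m flag, . also matches newlines, so the greedy .+ first runs to the end of the string and then backtracks to the last ----- BEGIN, which is the signature marker. The first marker, BEGIN PGP SIGNED MESSAGE, comes before <? xml and so never gets in the way.

```ruby
# Shortened, hypothetical stand-in for the signed message.
mini = <<~MSG
  ----- BEGIN PGP SIGNED MESSAGE -----
  Hash: SHA1

  Dear yyy@yyy.com:
  <? xml version = "1.0" encoding = "UTF-8"?>
  <Source>
     <IP_Address> 158.01.52.23 </ IP_Address>
  </ Source>

  ----- BEGIN PGP SIGNATURE -----
  = mVny
  ----- END PGP SIGNATURE -----
MSG

# String#[] with a regex and a capture index returns just that group.
# /m lets "." cross line breaks; the greedy ".+" backtracks to the
# LAST "----- BEGIN" in the string, i.e. the signature marker.
xml = mini[/(<\? xml .+)----- BEGIN/m, 1]

puts xml.include?("<IP_Address>") # the XML part is kept
puts xml.include?("PGP")          # the PGP markers are left out
```

The captured text keeps its surrounding whitespace, and so do the element values (note the spaces in " 158.01.52.23 " above), so calling strip on the results is usually worthwhile.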