I am trying to use Nokogiri to get data out of a file like this:
From: XXX <xxx@xxx.com>
To: yyy@yyy.com
Subject: Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish.
----- BEGIN PGP SIGNED MESSAGE -----
Hash: SHA1
Dear yyy@yyy.com:
Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. "Smalleye squaretail antenna codlet dartfish peacock flounder plaice, luminous hake oceanic flyingfish tiger shark, bramble shark, California halibut. Australian prowfish lake chub knifefish African lungfish; southern Dolly Varden pike conger. Gouramie glass catfish loosejaw, three-toothed puffer. Nase ridgehead featherfin knifefish Rattail gulper false brotula Atlantic eel zebra oto. Marlin mahi-mahi freshwater eel false brotula mojarra naked-back knifefish Steve fish bocaccio. Amago kanyu algae eater bullhead shark orangespine unicorn fish bangus, "Pacific cod zander banjo catfish half-gill pejerrey Indian mul."
<? xml version = "1.0" encoding = "UTF-8"?>
<Case>
<ID> 48456856568 </ ID>
<Status> Open </ Status>
<Severity> Normal </ Severity>
</ Case>
<Complainant>
<Entity> Sabertooth </ Entity>
<Contact> California halibut </ Contact>
<Address> Pacific cod zander banjo catfish half-gill pejerrey Indian mul. </ Address>
<phone> +1 (352) 584 8413 </ Phone>
<Email> Xxx@xxx.com </ Email>
</ Complainant>
<Service_Provider>
<Entity> Hammerjaw pompano </ Entity>
<Contact/>
<Address/>
<Phone/>
<Email> Yyy@yyy.com </ Email>
</ Service_Provider>
<Source>
<TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp>
<IP_Address> 158.01.52.23 </ IP_Address>
<Port> 8080 </ Port>
<Type> Browser </ Type>
<Protocol="IP"/>
<UserName/>
<Number_Files> 5 </ Number_Files>
</ Source>
<Content>
<Item>
<TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp>
<Title> Dolly Varden pike conger </ Title>
<FileName> Dolly Varden pike conger </ FileName>
<FileSize> 2143534544 </ FileSize>
<InfoHash> 67asdv6a6sdv7d7sfb3c32da79dcc9a6cdc70 </ InfoHash>
</ Item>
</ Content>
<History/>
<Notes/>
<Type Retraction="false"/>
<Verification/>
</ Infringement>
----- BEGIN PGP SIGNATURE -----
Version: GnuPG
0zjdfbkHGBVJKhdbvskjdvbhBHSDJvhbvEtqs/WYMcIAL1 +4 ufOjdvXiDLcN1PzM/QJ
IIj9KCq + / PYuMU6fTd800EOcbRX43RgeX6Qrgu + MDdDbte + CwKZL2Q28IZ0Viv +8
YItYXdgwhNnUO2QE7jn/g5KXn4v72QqpnsPJjWQVVD12 + h6DDUdaQHMsTdYyYIVD
Jkc8dPDVTLutVnuK2HZ4wQWRoiIWIMsUzePUht0eWi7DJFOlC5NuwS + E6FuxtgFj
IwJyCr/dLC/u6YtVCAb37UUSu7k3F5iD3hFTt1RyswK7HBDizV1CHIlc2diARfkL
CwRpYc/SlpZNgbAXaUzwHhtIQjCuRXQGsXtvDFke4CvM9nGe6Uk095yVOAKla1Y =
= mVny
----- END PGP SIGNATURE -----
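I need several pieces of information: the sender's IP, which is in /Source/IP_Address; the sender's e-mail address, which is in the Email element; the From field at the start of the message; and the message body itself. How can I do this in Ruby with Nokogiri?
I tried to get the IP address like this: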
def ip_address
  ip = Nokogiri::XML("mail/*.txt")
  ip.each { |node|
    p node.inner_xml if node.name == "IP_Address"
  }
end
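But I get no output. Does anyone know how to get data out of this kind of file?
Answer 0 (score: 0)
Since you only seem to be looking for the IP address, I would forget about Nokogiri: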
puts $~[1] if s =~ /<IP_Address>\s*([\d.]+)\s*<\/\s*IP_Address>/m
This assumes the file contents have already been loaded into s, for example with s = File.read(...).
Hope it helps.
UPD: To extract the XML itself:
xml = $~[1] if s =~ /(<\?\s*xml.*?Infringement>)/m
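The question also asks for the From header and the e-mail address, so the same regex approach can be stretched a little further. What follows is only a minimal sketch under stated assumptions: the helper names tag_text and header are hypothetical, and the sample message is a trimmed-down stand-in for the real file, not the original data.

```ruby
# Hypothetical, trimmed-down message in the same layout as the question's file.
sample = <<~MAIL
  From: XXX <xxx@xxx.com>
  To: yyy@yyy.com
  Subject: Notice

  <?xml version="1.0" encoding="UTF-8"?>
  <Infringement>
  <Complainant>
  <Email>Xxx@xxx.com</Email>
  </Complainant>
  <Source>
  <IP_Address>158.01.52.23</IP_Address>
  </Source>
  </Infringement>
MAIL

# Return the trimmed text of the first <tag>...</tag> pair, or nil if absent.
# The \s* allowances tolerate stray spaces such as "</ IP_Address>".
def tag_text(source, tag)
  name = Regexp.escape(tag)
  source[/<\s*#{name}\s*>(.*?)<\/\s*#{name}\s*>/m, 1]&.strip
end

# Return the value of a mail header line such as "From: ...", or nil if absent.
def header(source, name)
  source[/^#{Regexp.escape(name)}:[ \t]*(.*)$/, 1]&.strip
end

fields = {
  from:  header(sample, "From"),
  ip:    tag_text(sample, "IP_Address"),
  email: tag_text(sample, "Email")
}
p fields
```

Note that Nokogiri::XML("mail/*.txt") in the question parses that literal string rather than any file, which is why nothing is printed. To run the sketch over real files, the text has to be read first, for instance with Dir.glob("mail/*.txt").each { |path| s = File.read(path) }.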
Answer 1 (score: 0)
Nokogiri doesn't parse mail messages, so you have to get rid of the non-XML content first:
message = 'From: XXX <xxx@xxx.com>
To: yyy@yyy.com
Subject: Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish.
----- BEGIN PGP SIGNED MESSAGE -----
Hash: SHA1
Dear yyy@yyy.com:
Sabertooth; zebra oto Hammerjaw pompano, cusk-eel lighthousefish frogmouth catfish. "Smalleye squaretail antenna codlet dartfish peacock flounder plaice, luminous hake oceanic flyingfish tiger shark, bramble shark, California halibut. Australian prowfish lake chub knifefish African lungfish; southern Dolly Varden pike conger. Gouramie glass catfish loosejaw, three-toothed puffer. Nase ridgehead featherfin knifefish Rattail gulper false brotula Atlantic eel zebra oto. Marlin mahi-mahi freshwater eel false brotula mojarra naked-back knifefish Steve fish bocaccio. Amago kanyu algae eater bullhead shark orangespine unicorn fish bangus, "Pacific cod zander banjo catfish half-gill pejerrey Indian mul."
<? xml version = "1.0" encoding = "UTF-8"?>
<Case>
<ID> 48456856568 </ ID>
<Status> Open </ Status>
<Severity> Normal </ Severity>
</ Case>
<Complainant>
<Entity> Sabertooth </ Entity>
<Contact> California halibut </ Contact>
<Address> Pacific cod zander banjo catfish half-gill pejerrey Indian mul. </ Address>
<phone> +1 (352) 584 8413 </ Phone>
<Email> Xxx@xxx.com </ Email>
</ Complainant>
<Service_Provider>
<Entity> Hammerjaw pompano </ Entity>
<Contact/>
<Address/>
<Phone/>
<Email> Yyy@yyy.com </ Email>
</ Service_Provider>
<Source>
<TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp>
<IP_Address> 158.01.52.23 </ IP_Address>
<Port> 8080 </ Port>
<Type> Browser </ Type>
<Protocol="IP"/>
<UserName/>
<Number_Files> 5 </ Number_Files>
</ Source>
<Content>
<Item>
<TimeStamp> 2012-12-30T14: 24:05 Z </ TimeStamp>
<Title> Dolly Varden pike conger </ Title>
<FileName> Dolly Varden pike conger </ FileName>
<FileSize> 2143534544 </ FileSize>
<InfoHash> 67asdv6a6sdv7d7sfb3c32da79dcc9a6cdc70 </ InfoHash>
</ Item>
</ Content>
<History/>
<Notes/>
<Type Retraction="false"/>
<Verification/>
</ Infringement>
----- BEGIN PGP SIGNATURE -----
Version: GnuPG
0zjdfbkHGBVJKhdbvskjdvbhBHSDJvhbvEtqs/WYMcIAL1 +4 ufOjdvXiDLcN1PzM/QJ
IIj9KCq + / PYuMU6fTd800EOcbRX43RgeX6Qrgu + MDdDbte + CwKZL2Q28IZ0Viv +8
YItYXdgwhNnUO2QE7jn/g5KXn4v72QqpnsPJjWQVVD12 + h6DDUdaQHMsTdYyYIVD
Jkc8dPDVTLutVnuK2HZ4wQWRoiIWIMsUzePUht0eWi7DJFOlC5NuwS + E6FuxtgFj
IwJyCr/dLC/u6YtVCAb37UUSu7k3F5iD3hFTt1RyswK7HBDizV1CHIlc2diARfkL
CwRpYc/SlpZNgbAXaUzwHhtIQjCuRXQGsXtvDFke4CvM9nGe6Uk095yVOAKla1Y =
= mVny
----- END PGP SIGNATURE -----
'
Here is how to cut the message down to just the XML:
require 'nokogiri'
xml = message[/(<\? xml .+)----- BEGIN/m, 1]
doc = Nokogiri::XML::DocumentFragment.parse(xml)
doc.at('IP_Address').text # => " 158.01.52.23 "
The magic part is:
xml = message[/(<\? xml .+)----- BEGIN/m, 1]
That grabs everything from <? xml up to the line before ----- BEGIN. Nokogiri::XML::DocumentFragment.parse can then build a searchable DOM from it.
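One detail of that slice is worth illustrating: .+ is greedy, so the capture runs to the last ----- BEGIN in the message. That happens to be the signature marker here, but it would overshoot if a later line also contained ----- BEGIN. The sketch below compares the greedy pattern with a variant anchored on the signature line. It uses only the standard library; the method name extract_xml, the shortened message, and the extra FOOTER line are assumptions made for illustration.

```ruby
# Hypothetical, trimmed-down message in the same layout as the one above.
message = <<~MAIL
  From: XXX <xxx@xxx.com>
  ----- BEGIN PGP SIGNED MESSAGE -----
  Hash: SHA1
  Dear yyy@yyy.com:
  <?xml version="1.0" encoding="UTF-8"?>
  <Infringement>
  <Source><IP_Address>158.01.52.23</IP_Address></Source>
  </Infringement>
  ----- BEGIN PGP SIGNATURE -----
  Version: GnuPG
  ----- END PGP SIGNATURE -----
MAIL

# Slice from the XML declaration up to the line that opens the signature
# block. The non-greedy .+? stops at the first such line.
def extract_xml(text)
  text[/(<\?\s*xml.+?)^-+ ?BEGIN PGP SIGNATURE/m, 1]
end

# Same message with an extra (made-up) marker line appended after the signature.
noisy = message + "----- BEGIN FOOTER -----\n"

# The greedy pattern from the answer runs to the LAST "----- BEGIN".
greedy_noisy = noisy[/(<\?\s*xml.+)----- BEGIN/m, 1]

puts extract_xml(message)
puts greedy_noisy.include?("PGP SIGNATURE")       # greedy slice swallowed the signature
puts extract_xml(noisy) == extract_xml(message)   # anchored slice is unaffected
```

The string returned by extract_xml can be handed to Nokogiri::XML::DocumentFragment.parse exactly as in the answer.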