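I have a bunch of LTE CDRs that, once decoded, look and feel like XML but aren't (I'm not sure of the exact differences, but the format is hierarchical, similar to XML). I've copied one entry below. Each file contains 50 or 60 entries.
My goal is to search for entries that match an IP address (in hex, below) and a time range, and then associate the IMSI with that match. The relevant fields are shown below.
The fields I am searching on: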
...
<servedIMSI>13 91 03 00 00 00 10 F8</servedIMSI>
...
<servedPDPAddress>
<iPAddress>
<iPBinaryAddress>
<iPBinV4Address>0A 37 00 11</iPBinV4Address>
</iPBinaryAddress>
</iPAddress>
</servedPDPAddress>
...
<timeOfFirstUsage>14 02 04 04 09 40 2D 06 00</timeOfFirstUsage>
<timeOfLastUsage>14 02 04 04 12 44 2D 06 00</timeOfLastUsage>
...
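I tried using XML tools, but since this isn't XML, they didn't work.
I'd like to know whether there is a better way to search for and retrieve the data I want. I could use regular expressions to find the data, but an XML approach seems like the better way (even though this isn't XML). I'm open to any ideas!
A snippet of the CDR: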
<GPRSRecord>
<egsnPDPRecord>
<recordType>70</recordType>
<servedIMSI>13 91 03 00 00 00 10 F8</servedIMSI>
<ggsnAddress>
<iPBinaryAddress>
<iPBinV4Address>AB CD 72 62</iPBinV4Address>
</iPBinaryAddress>
</ggsnAddress>
<chargingID>126400647</chargingID>
<sgsnAddress>
<iPBinaryAddress>
<iPBinV4Address>AB CD 72 62</iPBinV4Address>
</iPBinaryAddress>
</sgsnAddress>
<accessPointNameNI><bs/>Internet<si/>syringawireless<etx/>com</accessPointNameNI>
<pdpType>01 21</pdpType>
<servedPDPAddress>
<iPAddress>
<iPBinaryAddress>
<iPBinV4Address>0A 37 00 11</iPBinV4Address>
</iPBinaryAddress>
</iPAddress>
</servedPDPAddress>
<dynamicAddressFlag><true/></dynamicAddressFlag>
<listOfTrafficVolumes>
<ChangeOfCharCondition>
<dataVolumeGPRSUplink>192323</dataVolumeGPRSUplink>
<dataVolumeGPRSDownlink>320043</dataVolumeGPRSDownlink>
<changeCondition><recordClosure/></changeCondition>
<changeTime>14 02 04 04 12 46 2D 06 00</changeTime>
<userLocationInformation>01 13 01 39 01 86 BD 01</userLocationInformation>
</ChangeOfCharCondition>
</listOfTrafficVolumes>
<recordOpeningTime>14 02 04 04 09 40 2D 06 00</recordOpeningTime>
<duration>186</duration>
<causeForRecClosing>16</causeForRecClosing>
<recordSequenceNumber>26784</recordSequenceNumber>
<nodeID>1</nodeID>
<localSequenceNumber>8858562</localSequenceNumber>
<apnSelectionMode><mSorNetworkProvidedSubscriptionVerified/></apnSelectionMode>
<servedMSISDN>91 02 98 99 00 81</servedMSISDN>
<chargingCharacteristics>01 00</chargingCharacteristics>
<chChSelectionMode><sGSNSupplied/></chChSelectionMode>
<sgsnPLMNIdentifier>13 01 39</sgsnPLMNIdentifier>
<servedIMEISV>53 97 04 40 81 57 80 00</servedIMEISV>
<rATType>6</rATType>
<userLocationInformation>01 13 01 39 01 86 BD 01</userLocationInformation>
<listOfServiceData>
<ChangeOfServiceCondition>
<ratingGroup>1</ratingGroup>
<localSequenceNumber>1</localSequenceNumber>
<timeOfFirstUsage>14 02 04 04 09 40 2D 06 00</timeOfFirstUsage>
<timeOfLastUsage>14 02 04 04 12 44 2D 06 00</timeOfLastUsage>
<serviceConditionChange>
00000000000000000000000010000000
</serviceConditionChange>
<sgsn-Address>
<iPBinaryAddress>
<iPBinV4Address>AB CD 72 62</iPBinV4Address>
</iPBinaryAddress>
</sgsn-Address>
<sGSNPLMNIdentifier>13 01 39</sGSNPLMNIdentifier>
<datavolumeFBCUplink>192323</datavolumeFBCUplink>
<datavolumeFBCDownlink>320043</datavolumeFBCDownlink>
<timeOfReport>14 02 04 04 12 46 2D 06 00</timeOfReport>
<rATType>6</rATType>
<userLocationInformation>01 13 01 39 01 86 BD 01</userLocationInformation>
</ChangeOfServiceCondition>
</listOfServiceData>
</egsnPDPRecord>
</GPRSRecord>
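Answer 0 (score: 4)
XML parsers exist to parse well-formed XML. If your XML is not well-formed, they will generally fail, often messily.
But your XML appears to be well-formed, so my personal favourite for this job is XML::Twig.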
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
sub extractIMSI {
my ( $twig, $servedIMSI ) = @_;
print $servedIMSI -> text(),"\n";
$twig -> purge(); #why I like XML::Twig - it lets you clear memory on the fly
}
my $parser = XML::Twig -> new ( twig_handlers => { 'servedIMSI' => \&extractIMSI } );
$parser -> parsefile ( 'test.xml' );
This works, provided 'test.xml' contains your sample data.
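The handler above prints only the IMSI. A minimal sketch of how the same XML::Twig approach might collect all four fields per record is shown below. The helper name `record_fields`, the hash keys, and the trimmed inline sample are illustrative assumptions, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: copy the fields of interest out of one egsnPDPRecord.
sub record_fields {
    my ($rec) = @_;
    # Look only under servedPDPAddress, so the GGSN/SGSN addresses are ignored.
    my $pdp = $rec->first_child('servedPDPAddress');
    my $v4  = $pdp ? $pdp->first_descendant('iPBinV4Address') : undef;
    my $svc = $rec->first_descendant('ChangeOfServiceCondition');
    return {
        imsi  => $rec->first_child_text('servedIMSI'),
        ip    => $v4  ? $v4->text : '',
        first => $svc ? $svc->first_child_text('timeOfFirstUsage') : '',
        last  => $svc ? $svc->first_child_text('timeOfLastUsage')  : '',
    };
}

my @records;
my $parser = XML::Twig->new(
    twig_handlers => {
        egsnPDPRecord => sub {
            my ($twig, $rec) = @_;
            push @records, record_fields($rec);
            $twig->purge;    # release the record once its fields are copied
        },
    },
);

# Trimmed copy of the sample record, inlined so the sketch is self-contained.
my $sample = <<'XML';
<GPRSRecord><egsnPDPRecord>
<servedIMSI>13 91 03 00 00 00 10 F8</servedIMSI>
<ggsnAddress><iPBinaryAddress><iPBinV4Address>AB CD 72 62</iPBinV4Address></iPBinaryAddress></ggsnAddress>
<servedPDPAddress><iPAddress><iPBinaryAddress>
<iPBinV4Address>0A 37 00 11</iPBinV4Address>
</iPBinaryAddress></iPAddress></servedPDPAddress>
<listOfServiceData><ChangeOfServiceCondition>
<timeOfFirstUsage>14 02 04 04 09 40 2D 06 00</timeOfFirstUsage>
<timeOfLastUsage>14 02 04 04 12 44 2D 06 00</timeOfLastUsage>
</ChangeOfServiceCondition></listOfServiceData>
</egsnPDPRecord></GPRSRecord>
XML

$parser->parse($sample);
print join(' | ', @{$_}{qw(imsi ip first last)}), "\n" for @records;
```

For a real file, `$parser->parsefile('test.xml')` would replace the `parse($sample)` call. Handling one record at a time and purging afterwards keeps memory use flat regardless of how many entries the file holds.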
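Answer 1 (score: 3)
This short Perl program processes a file called GPRSRecord.xml that contains the data you show in your question, wrapped in a <root>...</root> element. It extracts the fields you said you were interested in from every egsnPDPRecord element it finds. Obviously, in this case there is only one.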
use strict;
use warnings;
use XML::LibXML;
my $xml = XML::LibXML->load_xml(location => 'GPRSRecord.xml');
for my $pdp_rec ($xml->findnodes('/root/GPRSRecord/egsnPDPRecord')) {
my ($imsi_address) = $pdp_rec->findnodes('servedIMSI');
printf "%s: %s\n", $imsi_address->nodeName, $imsi_address->textContent;
my ($ip_v4_address) = $pdp_rec->findnodes('servedPDPAddress/iPAddress/iPBinaryAddress/iPBinV4Address');
printf "%s: %s\n", $ip_v4_address->nodeName, $ip_v4_address->textContent;
my ($service_condition) = $pdp_rec->findnodes('listOfServiceData/ChangeOfServiceCondition');
my ($first_usage) = $service_condition->findnodes('timeOfFirstUsage');
my ($last_usage) = $service_condition->findnodes('timeOfLastUsage');
printf "%s: %s\n", $first_usage->nodeName, $first_usage->textContent;
printf "%s: %s\n", $last_usage->nodeName, $last_usage->textContent;
}
Output
servedIMSI: 13 91 03 00 00 00 10 F8
iPBinV4Address: 0A 37 00 11
timeOfFirstUsage: 14 02 04 04 09 40 2D 06 00
timeOfLastUsage: 14 02 04 04 12 44 2D 06 00
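The values printed above are still raw hex strings. The sketch below shows one way they might be decoded, under assumptions inferred from the sample rather than confirmed by the source: the IP is four plain bytes, the IMSI is swapped-nibble BCD with `F` as filler, and the timestamps are BCD `YY MM DD hh mm ss` followed by an ASCII sign byte and a `hh mm` UTC offset. The timestamp reading is at least consistent with the sample, where recordOpeningTime plus the 186-second duration lands exactly on changeTime. The function names are hypothetical, and the century `20` is assumed.

```perl
use strict;
use warnings;

# "0A 37 00 11" -> "10.55.0.17"
sub hex_to_ipv4 {
    my ($hex) = @_;
    return join '.', map { hex($_) } split ' ', $hex;
}

# Swapped-nibble BCD: "13 91 03 00 00 00 10 F8" -> "311930000000018"
sub tbcd_to_digits {
    my ($hex) = @_;
    my $digits = join '', map { scalar reverse $_ } split ' ', $hex;
    $digits =~ s/F+$//i;    # drop the filler nibble(s)
    return $digits;
}

# "14 02 04 04 09 40 2D 06 00" -> "2014-02-04 04:09:40 -06:00"
sub decode_timestamp {
    my ($hex) = @_;
    my ($yy, $mo, $dd, $hh, $mi, $ss, $sign, $tzh, $tzm) = split ' ', $hex;
    # The sign byte is ASCII: 2B is '+', 2D is '-'. Century 20 is an assumption.
    return sprintf '20%s-%s-%s %s:%s:%s %s%s:%s',
        $yy, $mo, $dd, $hh, $mi, $ss, chr(hex $sign), $tzh, $tzm;
}

print hex_to_ipv4('0A 37 00 11'), "\n";
print tbcd_to_digits('13 91 03 00 00 00 10 F8'), "\n";
print decode_timestamp('14 02 04 04 09 40 2D 06 00'), "\n";
```

Decoding after extraction, rather than inside the XPath loop, keeps the parsing code unchanged and makes it easy to swap a decoder if a field turns out to be encoded differently.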
Answer 2 (score: 1)
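A stateful loop in Perl would work easily enough, with the caveat that much of the work an XML parser does for you (handling multi-line entries and so on) would have to be replicated here to cope with any file that doesn't match the sample text. Something like:
use strict;
use warnings;

open(my $infile, '<', 'MyCDRFile.nxm') or die "Cannot open MyCDRFile.nxm: $!";

# Tag names to look for. A hash is initialised with parentheses, not braces.
my %searches = (
    rec_start => 'egsnPDPRecord',
    pdp       => 'servedPDPAddress',
    imsi      => 'servedIMSI',
    ip        => 'iPBinV4Address',
    firsttime => 'timeOfFirstUsage',
    lasttime  => 'timeOfLastUsage',
);
my %finds;
my $imsi   = '';
my $in_pdp = 0;

# Return the text between the opening and closing tag on a single line.
sub tag_value {
    my ($line, $tag) = @_;
    my $value = (split /\Q$tag\E/, $line)[1] // '';
    $value =~ s![<>/]!!g;
    return $value;
}

while (my $line = <$infile>) {
    chomp $line;
    # Both <egsnPDPRecord> and </egsnPDPRecord> match, so a record is
    # printed when its closing tag is reached.
    if (index($line, $searches{rec_start}) > -1) {
        if ($imsi ne '') {
            print "[$imsi, " . join(',', map { $_ // '' } @finds{qw(ip firsttime lasttime)}) . "]\n";
        }
        $imsi   = '';
        %finds  = ();
        $in_pdp = 0;
    }
    # Track whether we are inside <servedPDPAddress>, so the GGSN and SGSN
    # addresses (same tag name) do not overwrite the served IP.
    if (index($line, $searches{pdp}) > -1) {
        $in_pdp = ($line !~ m{</}) ? 1 : 0;
    }
    if (index($line, $searches{imsi}) > -1) {
        $imsi = tag_value($line, $searches{imsi});
    }
    foreach my $search (qw(ip firsttime lasttime)) {
        next if $search eq 'ip' and not $in_pdp;
        if ($imsi ne '' and index($line, $searches{$search}) > -1) {
            $finds{$search} = tag_value($line, $searches{$search});
        }
    }
}
close($infile);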
Printing to a separate file, reading from STDIN, and so on can all be added to this fairly easily.
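None of the answers shows the final step the question asks for: matching on an IP address and a time range, then reporting the IMSI. A rough sketch of that filter follows. It assumes each record has already been reduced to a hash of raw hex strings with the hypothetical keys `imsi`, `ip`, `first` and `last`, and that the timestamps are BCD `YY MM DD hh mm ss` plus an ASCII sign byte and a `hh mm` UTC offset (an assumption inferred from the sample, with the century taken as 20). All function names are illustrative.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# "10.55.0.17" -> "0A 37 00 11", the form used in the CDR.
sub ipv4_to_hex {
    my ($ip) = @_;
    return sprintf '%02X %02X %02X %02X', split /\./, $ip;
}

# CDR timestamp -> epoch seconds (UTC), applying the embedded UTC offset.
sub cdr_epoch {
    my ($hex) = @_;
    my ($yy, $mo, $dd, $hh, $mi, $ss, $sign, $tzh, $tzm) = split ' ', $hex;
    my $offset = ($tzh * 3600 + $tzm * 60) * (uc($sign) eq '2D' ? -1 : 1);
    return timegm($ss, $mi, $hh, $dd, $mo - 1, 2000 + $yy) - $offset;
}

# IMSIs of records that used $ip_hex at some point within [$from, $to].
sub imsis_for {
    my ($records, $ip_hex, $from, $to) = @_;
    return map  { $_->{imsi} }
           grep {
               uc($_->{ip}) eq uc($ip_hex)
                   && cdr_epoch($_->{first}) <= $to
                   && cdr_epoch($_->{last})  >= $from
           } @$records;
}

# Usage with the single sample record from the question.
my @records = (
    {
        imsi  => '13 91 03 00 00 00 10 F8',
        ip    => '0A 37 00 11',
        first => '14 02 04 04 09 40 2D 06 00',
        last  => '14 02 04 04 12 44 2D 06 00',
    },
);
my $from = timegm(0, 10, 10, 4, 1, 2014);    # 2014-02-04 10:10:00 UTC
my $to   = timegm(0, 15, 10, 4, 1, 2014);    # 2014-02-04 10:15:00 UTC
print "$_\n" for imsis_for(\@records, ipv4_to_hex('10.55.0.17'), $from, $to);
```

The filter treats a record as a match when its usage interval overlaps the requested window, which seems the safer reading for dynamically assigned addresses. A stricter rule (interval fully inside the window) would only need the two comparisons swapped.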