Question

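I have a log file that I need to parse to get three values: RSSUrl, RSSCategory, and Url val. I can get each of these values individually, but I can't figure out how to group all three together so that I have the context for each value.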

Here is the basic format of the file:

    <key id="1" goodness="0" softCached="false" hits="0" creationMillis="1327941760709" creationMillisAgo="-978" lastHitMillisAgo="INF" size="0" numRows="30" cache_type="L2" limit="1" type="data">
    <filters>
        <filter attr="Community/RSSCategory" value="Jeep"/>
        <filter attr="Community/RSSUrl" value="http://blogs.int.automotive.com/getrequest.php?url=http://blogs.automotive.com/"/>
        <filter attr="Community/NamespaceLookupCommunity"/>
        <filter attr="Krang/NamespaceLookupKrang"/>
    </filters>
    <params>
        <param name="CacheLifeSeconds" value="300"/>
        <param name="LIMIT" value="1"/>
        <param name="ReturnColumns" value="Title,Url,PublishDate,Description,ImageUrl"/>
        <param name="START" value="0"/>
    </params>
    <returns>
        <return attr="Community/RSSResult"/>
    </returns>
    <orders>
        <order attr="Krang/PublishDate" type="DESC"/>
    </orders>
    <keyString>
        [[data,filters=[Community/RSSUrl,Community/NamespaceLookupCommunity,Krang/NamespaceLookupKrang],params=[LIMIT,START],return=[Community/RSSResult],order=[Krang/PublishDate-]],start=0,limit=1]
    </keyString>
    </key>
    <keyend id="1" nowMillis="1327941760713" queryTimeNanos="115132">
    <cached type="L1"/>
    <CallContext>
        <ServerName val="WEB-059" />
        <ServerId val="ȯ" />
        <PageName val="Default+%2F+Default" />
        <ClientIp val="10.1.12.111" />
        <Url val="http%3A%2F%2Fwww.automobilemag.com%2Findex.html" />
    </CallContext>
    </keyend>

I tried this:

    grep -E '<filter attr=' rssurl.txt | grep -E '<Url val' rssurl.txt

but it doesn't put everything back together. Any ideas?

Answer 1

    grep -E '<filter attr="Community/(RSSUrl|RSSCategory)"|<Url val' rssurl.txt
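grep prints matching lines in file order, so in the sample each key's RSSCategory and RSSUrl lines come out before that request's Url val line. The following is a minimal Python sketch that relies on that ordering to group the three values into one record per request. It assumes every `<key>` block is followed by its own `<keyend>` before the next `<key>` starts; the function name `group_records` is hypothetical. The Url val is percent-encoded in the log, so it is decoded with `urllib.parse.unquote`:

```python
import re
from urllib.parse import unquote

# Patterns for the three lines of interest (assumes one element per line, as in the sample).
CATEGORY_RE = re.compile(r'<filter attr="Community/RSSCategory" value="([^"]*)"')
RSSURL_RE = re.compile(r'<filter attr="Community/RSSUrl" value="([^"]*)"')
URL_RE = re.compile(r'<Url val="([^"]*)"')


def group_records(lines):
    """Group RSSCategory/RSSUrl with the next Url val line (hypothetical helper).

    Relies on file order: the filters of a <key> appear before the
    Url val of its <keyend>, and blocks do not interleave.
    """
    records = []
    current = {}
    for line in lines:
        if m := CATEGORY_RE.search(line):
            current["RSSCategory"] = m.group(1)
        elif m := RSSURL_RE.search(line):
            current["RSSUrl"] = m.group(1)
        elif m := URL_RE.search(line):
            # Url val closes the request: decode it and emit the record.
            current["Url"] = unquote(m.group(1))
            records.append(current)
            current = {}  # start collecting the next request
    return records
```

The function accepts any iterable of lines, so it can be given an open file such as `open("rssurl.txt")` or the output of the grep command above; since the regexes already select the relevant lines, the grep step is optional.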

Answer 2

Note that regular expressions are not good at parsing XML. Use an XML parser instead.
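A minimal Python sketch of that approach using the standard library's `xml.etree.ElementTree`. It assumes the log consists only of `<key>`/`<keyend>` elements like the sample in the question (no XML declaration and no non-XML lines), so it wraps the whole text in a synthetic `<log>` root to make it a single well-formed document, then pairs each `<key>` with the `<keyend>` that has the same `id`. The function name `parse_log` and the `<log>` wrapper are hypothetical:

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def parse_log(text):
    """Return one dict per request with RSSCategory, RSSUrl and Url (hypothetical helper)."""
    # The log has many top-level elements; wrap them so ElementTree sees one root.
    root = ET.fromstring("<log>" + text + "</log>")
    pending = {}  # key id -> values from <key>, waiting for the matching <keyend>
    records = []
    for elem in root:
        key_id = elem.get("id")
        if elem.tag == "key":
            rec = {"id": key_id}
            for flt in elem.iter("filter"):
                attr = flt.get("attr")
                if attr == "Community/RSSCategory":
                    rec["RSSCategory"] = flt.get("value")
                elif attr == "Community/RSSUrl":
                    rec["RSSUrl"] = flt.get("value")
            pending[key_id] = rec
        elif elem.tag == "keyend":
            # Take the values collected from the <key> with the same id.
            rec = pending.pop(key_id, {"id": key_id})
            url = elem.find("CallContext/Url")
            if url is not None:
                # The Url val is percent-encoded in the log.
                rec["Url"] = unquote(url.get("val", ""))
            records.append(rec)
    return records
```

For example, passing the text of `open("rssurl.txt", encoding="utf-8").read()` to `parse_log` returns a list of dicts with `id`, `RSSCategory`, `RSSUrl`, and `Url` keys. Pairing by `id` instead of by line position means the result does not depend on how the elements are laid out across lines, which a line-oriented grep cannot guarantee.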
