Using grep / regex

Time: 2016-01-12 19:10:11

Tags: regex grep

I'm trying to put all the data between `<item></item>` in an .xml file onto one line (i.e., replace each line break with a tab, `\t`). So instead of this:

    <item>
        <title>Image</title>
        <link>http://www.somewebsite.com/?post_type=acf-field&#038;p=23</link>
        <pubDate>Thu, 06 Aug 2015 15:19:17 +0000</pubDate>
        <dc:creator><![CDATA[joey]]></dc:creator>
        <guid isPermaLink="false">https://www.somewebsite.com/?post_type=acf-field&#038;p=23</guid>
        <description></description>
        <content:encoded><![CDATA[a:16:{s:4:"type";s:5:"image";s:12:"instructions";s:48:"Image size should be 1110px wide by 1154px tall.";s:8:"required";i:0;s:17:"conditional_logic";i:0;s:7:"wrapper";a:3:{s:5:"width";s:0:"";s:5:"class";s:0:"";s:2:"id";s:0:"";}s:13:"parent_layout";s:13:"55c37498bba7e";s:13:"return_format";s:2:"id";s:12:"preview_size";s:6:"medium";s:7:"library";s:3:"all";s:9:"min_width";s:0:"";s:10:"min_height";s:0:"";s:8:"min_size";s:0:"";s:9:"max_width";s:0:"";s:10:"max_height";s:0:"";s:8:"max_size";s:0:"";s:10:"mime_types";s:0:"";}]]></content:encoded>
        <excerpt:encoded><![CDATA[image]]></excerpt:encoded>
        <wp:post_id>23</wp:post_id>
        <wp:post_date>2015-08-06 15:19:17</wp:post_date>
        <wp:post_date_gmt>2015-08-06 15:19:17</wp:post_date_gmt>
        <wp:comment_status>open</wp:comment_status>
        <wp:ping_status>open</wp:ping_status>
        <wp:post_name>field_55c374e5aecef</wp:post_name>
        <wp:status>publish</wp:status>
        <wp:post_parent>22</wp:post_parent>
        <wp:menu_order>0</wp:menu_order>
        <wp:post_type>acf-field</wp:post_type>
        <wp:post_password></wp:post_password>
        <wp:is_sticky>0</wp:is_sticky>
    </item>

I want this:

    <item>  <title>Image</title>    <link>http://www.somewebsite.com/?post_type=acf-field&#038;p=23</link>  <pubDate>Thu, 06 Aug 2015 15:19:17 +0000</pubDate>  <dc:creator><![CDATA[joey]]></dc:creator>   <guid isPermaLink="false">https://www.somewebsite.com/?post_type=acf-field&#038;p=23</guid> <description></description> <content:encoded><![CDATA[a:16:{s:4:"type";s:5:"image";s:12:"instructions";s:48:"Image size should be 1110px wide by 1154px tall.";s:8:"required";i:0;s:17:"conditional_logic";i:0;s:7:"wrapper";a:3:{s:5:"width";s:0:"";s:5:"class";s:0:"";s:2:"id";s:0:"";}s:13:"parent_layout";s:13:"55c37498bba7e";s:13:"return_format";s:2:"id";s:12:"preview_size";s:6:"medium";s:7:"library";s:3:"all";s:9:"min_width";s:0:"";s:10:"min_height";s:0:"";s:8:"min_size";s:0:"";s:9:"max_width";s:0:"";s:10:"max_height";s:0:"";s:8:"max_size";s:0:"";s:10:"mime_types";s:0:"";}]]></content:encoded>   <excerpt:encoded><![CDATA[image]]></excerpt:encoded>    <wp:post_id>23</wp:post_id> <wp:post_date>2015-08-06 15:19:17</wp:post_date>    <wp:post_date_gmt>2015-08-06 15:19:17</wp:post_date_gmt>    <wp:comment_status>open</wp:comment_status> <wp:ping_status>open</wp:ping_status>   <wp:post_name>field_55c374e5aecef</wp:post_name>    <wp:status>publish</wp:status>  <wp:post_parent>22</wp:post_parent> <wp:menu_order>0</wp:menu_order>    <wp:post_type>acf-field</wp:post_type>  <wp:post_password></wp:post_password>   <wp:is_sticky>0</wp:is_sticky>  </item>

This post got me close, but I'm still learning regex commands and can't figure this one out.

Thanks in advance for your help!

1 answer:

Answer 0 (score: 1)

If `<item>` nodes are not nested, meaning an `<item>` node never contains another `<item>` node, you can use this `sed` command:

    sed '/<item>/{:a;N;/<\/item>/!ba;s/\n/\t/g;}' file.xml
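To see what this command produces, here is a small sketch that runs it on a made-up two-item sample; the sample contents and temporary file names are hypothetical. It assumes GNU sed, because `\t` in the replacement text is a GNU extension (BSD/macOS sed will not turn it into a tab).

```shell
#!/bin/sh
# Sketch: run the sed one-liner on a small, made-up sample file.
# Assumes GNU sed, which understands \t in the replacement text.
set -e

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/file.xml"

# Hypothetical input: two non-nested <item> nodes inside <channel>.
cat > "$sample" <<'EOF'
<channel>
    <item>
        <title>Image</title>
        <wp:post_id>23</wp:post_id>
    </item>
    <item>
        <title>Other</title>
    </item>
</channel>
EOF

# Join every <item>...</item> block onto a single tab-separated line.
sed '/<item>/{:a;N;/<\/item>/!ba;s/\n/\t/g;}' "$sample" > "$tmpdir/out.xml"
cat "$tmpdir/out.xml"
```

Lines outside any item (`<channel>` and `</channel>`) pass through unchanged, and each item comes out as one line with its original indentation kept after each tab.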

The block between `{ ... }` is executed for every line that matches the pattern `<item>`.

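`:a` defines a label, and `N` reads the next line and appends it to the pattern buffer. `/<\/item>/` checks the pattern buffer for the closing `</item>` node; if it is not found (`!`), `ba` branches back to label `a`.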

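Once the closing `</item>` tag is reached, `s/\n/\t/g` replaces all newlines in the pattern buffer with tabs, and sed then prints the buffer, so the whole item comes out as a single line.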
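The same program can also be written one command per line in a script file, which leaves room for a comment on each step of the loop described above. This is a sketch under the same assumptions (GNU sed for `\t`; the file names are hypothetical). Putting each label on its own line also avoids relying on `;` after a label, which some non-GNU sed implementations do not accept.

```shell
#!/bin/sh
# Sketch: the same sed program, one command per line with comments.
# Assumes GNU sed (for \t in the replacement); file names are made up.
set -e

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf '<rss>\n  <item>\n    <title>A</title>\n  </item>\n</rss>\n' > "$tmpdir/file.xml"

# Store the sed program in its own file so it can carry comments.
cat > "$tmpdir/join-items.sed" <<'EOF'
# Only a line containing <item> starts a block.
/<item>/ {
  # Label "a": top of the read-and-append loop.
  :a
  # Append the next input line to the pattern buffer (joined by \n).
  N
  # No </item> in the buffer yet? Branch back to label "a".
  /<\/item>/!ba
  # Closing tag reached: turn every embedded newline into a tab.
  s/\n/\t/g
}
EOF

sed -f "$tmpdir/join-items.sed" "$tmpdir/file.xml" > "$tmpdir/out.xml"
cat "$tmpdir/out.xml"
```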