How to grep and replace multiple elements in multiple files on OS X

Date: 2014-03-12 00:46:50

Tags: sed grep

I'm using this grep command line on OS X:

grep -E 'Title|Amount|AwardID|FirstName|LastName' *.xml

and this is the result:

<Title>ABC System</Title>
<Amount>50000</Amount>
<AwardID>1000</AwardID>
<FirstName>Name</FirstName>
<LastName>Thanks</LastName>

Now I'm trying to use sed to replace the strings and finish the job, but it doesn't do what I want.

Which options should I use to get it?

sed -i "" 's/Title//g'

The result I want, as a txt file:

ABC System, 50000, 1000, Name, Thanks

Update

I can do it for a single file:

$ grep -E 'AwardID|AwardAmount|FirstName|LastName' 1433501.xml > test
$ sed -E '/AwardID|AwardAmount|FirstName|LastName/s/.*>([^<]+)<.*/\1/' test

43856
1433501
费萨尔
侯塞因

$ sed -E '/AwardID|AwardAmount|FirstName|LastName/s/.*>([^<]+)<.*/\1/' test | paste -sd',' -

43856,1433501,费萨尔,侯塞因

But when I change xxx.xml to *.xml, I need a line break after each file's values. How can I do that?

Update

AwardTable

xml sel -t -v //AwardID -o , -v //AwardAmount -nl *.xml > AwardTable.csv

InvestigatorTable

xml sel -t     -v //AwardID  -m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName  -b -o [PI]    -m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName  -b  -o [CoPI]   -nl *.xml

How should I get the data for InvestigatorTable? How can I get it into the following format?

ID, Firstname, Lastname, Role
12345, FirstName, LastName, PI
12345, FirstName, LastName, Co-PI
12345, FirstName, LastName, Former-PI


xml sel -t     -v //AwardID -o , -v //AwardAmount     -m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName -o [PI] -b     -m '//Investigator[RoleCode = "Former Principal Investigator"]' -o , -v FirstName -o , -v  LastName -o [FoPI]  -b     -m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName -o [CoPI] -b     -nl *.xml

With that I get:

1417948,93147,M. Lee,Allison[PI],Jennifer,Arrigo[CoPI],Cynthia,Chandler[CoPI],Kerstin,Lehnert[CoPI]
1417966,574209,Robb,Lindgren[PI]
1418062,253000,Julia,Coonrod[PI],Gary,Harrison[FoPI]

I can finish the rest by hand for now, but please help me.

Update

Please help me get the result in this structure:

AwardID, FirstName, LastName, Role

2 answers:

Answer 0 (score: 2)

Here's another approach:

sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' *.xml | paste -sd',' -

With your sample data, it gives the following output:

$ sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' xmlfile | paste -sd',' -
Collaborative Research: Using the Rurutu hotspot to evaluate mantle motion and absolute plate motion models,137715,1433097,Jasper,Konter
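Because `paste -s` joins every line of its input, the command above puts the values from all files on a single line. One way to get the per-file line break the question asks about is to run the same pipeline once per file. A minimal sketch, assuming one element per line as in the sample; `extract_fields` is a hypothetical helper name:

```shell
# Hypothetical helper: print the selected element values of each XML file
# as one comma-separated line, i.e. one output line per input file.
# Assumes each element sits on its own line, as in the sample data.
extract_fields() {
  for f in "$@"; do
    sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' "$f" |
      paste -sd',' -
  done
}
```

Usage: `extract_fields *.xml > awards.txt`. The loop keeps the sed expression unchanged and only moves the `paste` join inside the per-file iteration.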

Answer 1 (score: 1)

awk can do this:

awk -v ORS=", " -F '[<>]' '
    /Title|Amount|AwardID|FirstName|LastName/ {print $3} 
    END {printf "\b\b \n"}
' << EOF
<Title>ABC System</Title>
<Amount>50000</Amount>
<AwardID>1000</AwardID>
<FirstName>Name</FirstName>
<LastName>Thanks</LastName>
EOF
ABC System, 50000, 1000, Name, Thanks  

For multiple files, I assume you want a newline after each file. GNU awk v4 has an extension for that, ENDFILE:

gawk -v ORS=", " -F '[<>]' '
    /Title|Amount|AwardID|FirstName|LastName/ {print $3} 
    ENDFILE {printf "\b\b \n"}
' *.xml

Otherwise it takes a bit more work:

awk -v ORS=", " -F '[<>]' '
    # start a new line before handling the first record of each later file
    FNR == 1 && FILENAME != ARGV[1] {printf "\b\b \n"}
    /Title|Amount|AwardID|FirstName|LastName/ {print $3} 
    END {printf "\b\b \n"}
' *.xml
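One caveat with the `\b\b` trick: it only looks right on a terminal. When the output is redirected to a file, the backspace characters are written as bytes and the trailing ", " stays in the file. A sketch that avoids this by building each line in a variable and printing it when the next file starts (assumes, like the sample, one element per line; `fields_per_file` is a hypothetical name):

```shell
# Hypothetical helper: one ", "-separated line per non-empty input file,
# without printing backspace characters.
# Assumes each element sits on its own line, as in the sample data.
fields_per_file() {
  awk -F '[<>]' '
    FNR == 1 && NR > 1 { print line; line = "" }   # new file: flush the previous line
    /Title|Amount|AwardID|FirstName|LastName/ {
      line = (line == "" ? $3 : line ", " $3)      # $3 is the text between the tags
    }
    END { if (NR > 0) print line }                 # flush the last file
  ' "$@"
}
```

Usage: `fields_per_file *.xml > awards.txt`. This works in the stock BSD awk on OS X as well as in gawk, since it relies only on `FNR`/`NR` rather than the gawk-only ENDFILE.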

For robustness, you should use an XML parser or an XSLT transformation.


Given your sample XML files, here is a solution using xmlstarlet, my XML processing tool of choice:

xmlstarlet sel -t -v //AwardTitle -o , -v //AwardAmount -o , -v //AwardID -m //Investigator -o , -v FirstName -o , -v LastName -b -nl 1419538.xml 1424234.xml 
IBDR: Workshop on Successful Approaches for Development and Dissemination of Instrumentation for Biological Research - May 1-2, 2014; Rosslyn, VA,49990,1419538,Sameer,Sonkusale,Valencia,Koomson,Eduardo,Rosa-Molinar
RAPID: Role of Physical, Chemical and Diffusion Properties of 4-Methyl-cyclohexane methanol in Remediating Contaminated Water and Water Pipes,49999,1424234,Daniel,Gallagher,Andrea,Dietrich,Paolo,Scardina

If you want to use another XSLT tool, here is the generated stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:call-template name="value-of-template">
      <xsl:with-param name="select" select="//AwardTitle"/>
    </xsl:call-template>
    <xsl:text>,</xsl:text>
    <xsl:call-template name="value-of-template">
      <xsl:with-param name="select" select="//AwardAmount"/>
    </xsl:call-template>
    <xsl:text>,</xsl:text>
    <xsl:call-template name="value-of-template">
      <xsl:with-param name="select" select="//AwardID"/>
    </xsl:call-template>
    <xsl:for-each select="//Investigator">
      <xsl:text>,</xsl:text>
      <xsl:call-template name="value-of-template">
        <xsl:with-param name="select" select="FirstName"/>
      </xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="value-of-template">
        <xsl:with-param name="select" select="LastName"/>
      </xsl:call-template>
    </xsl:for-each>
    <xsl:value-of select="'&#10;'"/>
  </xsl:template>
  <xsl:template name="value-of-template">
    <xsl:param name="select"/>
    <xsl:value-of select="$select"/>
    <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
      <xsl:value-of select="'&#10;'"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

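The schema isn't great. Specifically, it's inflexible: what if there are more than 5 investigators? You need something like this:

Or maybe simpler:

Awards table: id, title, amount
AwardInvestigators table: award_id, first_name, last_name, role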
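The AwardInvestigators layout (one row per investigator, matching the asker's "AwardID, FirstName, LastName, Role") can be sketched line by line with awk. This is a line-oriented sketch, not an XML parser: it assumes each element sits on its own line and that AwardID appears before the Investigator blocks in each file. The helper name `investigator_rows` and the PI/Co-PI/Former-PI labels are illustrative choices:

```shell
# Hypothetical helper: emit "AwardID, FirstName, LastName, Role" rows,
# one per <Investigator> block, from award XML files.
# Assumes one element per line and AwardID before the Investigator blocks.
investigator_rows() {
  awk -F '[<>]' '
    BEGIN                { print "AwardID, FirstName, LastName, Role" }
    FNR == 1             { award = "" }              # reset for each input file
    $2 == "AwardID"      { award = $3 }
    $2 == "Investigator" { first = last = role = "" }
    $2 == "FirstName"    { first = $3 }
    $2 == "LastName"     { last = $3 }
    $2 == "RoleCode" {
      role = $3                                      # unknown codes are kept verbatim
      if (role == "Principal Investigator")              role = "PI"
      else if (role == "Co-Principal Investigator")      role = "Co-PI"
      else if (role == "Former Principal Investigator")  role = "Former-PI"
    }
    $2 == "/Investigator" { print award ", " first ", " last ", " role }
  ' "$@"
}
```

Usage: `investigator_rows *.xml > AwardInvestigators.csv`. For real-world files, an XML-aware tool such as xmlstarlet remains the more robust choice.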


By the way, after reading the question more carefully, I've modified the xmlstarlet command so that the Principal Investigator's name comes first:

xmlstarlet sel -t \
    -v //AwardID -o , -v //AwardAmount \
    -m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName  -b \
    -m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName  -b \
    -nl \
*.xml