I am using this grep command line on OS X:
grep -E 'Title|Amount|AwardID|FirstName|LastName' *.xml
and this is the result:
<Title>ABC System</Title>
<Amount>50000</Amount>
<AwardID>1000</AwardID>
<FirstName>Name</FirstName>
<LastName>Thanks</LastName>
Now I am trying to use sed to strip the tags and finish the job, but it does not do what I need.
Which options should I use to get this?
sed -i "" 's/Title//g'
The result I want, as a txt file:
ABC System, 50000, 1000, Name, Thanks
I can do it for a single file:
$ grep -E 'AwardID|AwardAmount|FirstName|LastName' 1433501.xml > test
$ sed -E '/AwardID|AwardAmount|FirstName|LastName/s/.*>([^<]+)<.*/\1/' test
43856 1433501 费萨尔 侯塞因
$ sed -E '/AwardID|AwardAmount|FirstName|LastName/s/.*>([^<]+)<.*/\1/' test | paste -sd',' -
43856,1433501,费萨尔,侯塞因
But when I change xxx.xml to *.xml, I need a line break after each file. How can I do that?
AwardTable
xml sel -t -v //AwardID -o , -v //AwardAmount -nl *.xml > AwardTable.csv
InvestigatorTable
xml sel -t -v //AwardID -m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName -b -o [PI] -m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName -b -o [CoPI] -nl *.xml
How can I get the data for the InvestigatorTable? How can I produce the following format?
ID, Firstname, Lastname, Role
12345, FirstName, LastName, PI
12345, FirstName, LastName, Co-PI
12345, FirstName, LastName, Former-PI
xml sel -t -v //AwardID -o , -v //AwardAmount -m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName -o [PI] -b -m '//Investigator[RoleCode = "Former Principal Investigator"]' -o , -v FirstName -o , -v LastName -o [FoPI] -b -m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName -o [CoPI] -b -nl *.xml
This is as far as I can get:
1417948,93147,M. Lee,Allison[PI],Jennifer,Arrigo[CoPI],Cynthia,Chandler[CoPI],Kerstin,Lehnert[CoPI]
1417966,574209,Robb,Lindgren[PI]
1418062,253000,Julia,Coonrod[PI],Gary,Harrison[FoPI]
I can finish it by hand for now, but please help me.
Please help me get the result in this structure:
AwardID, FirstName, LastName, Role
Answer 0 (score: 2)
Here is another approach:
sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' *.xml | paste -sd',' -
With your sample data, it gives the following output:
$ sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' xmlfile | paste -sd',' -
Collaborative Research: Using the Rurutu hotspot to evaluate mantle motion and absolute plate motion models,137715,1433097,Jasper,Konter
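The pipeline above joins the fields of every file into a single line. To get the line break per file that the question asks about, one option is to run the same sed command once per file. This is a minimal sketch, assuming each element sits on its own line as in the sample data; the function name `fields_per_file` is made up for illustration.

```shell
# Print one comma-separated line per XML file.
# Assumption: every element of interest is on its own line, as in the sample.
fields_per_file() {
  for f in "$@"; do
    # Same sed expression as above, applied to a single file at a time,
    # so paste only joins the fields that belong to that file.
    sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' "$f" |
      paste -sd, -
  done
}

# Usage (result.csv is a hypothetical output name):
#   fields_per_file *.xml > result.csv
```

Values that themselves contain a comma, such as some award titles, are not quoted by this sketch.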
Answer 1 (score: 1)
awk can do this:
awk -v ORS=", " -F '[<>]' '
/Title|Amount|AwardID|FirstName|LastName/ {print $3}
END {printf "\b\b \n"}
' << EOF
<Title>ABC System</Title>
<Amount>50000</Amount>
<AwardID>1000</AwardID>
<FirstName>Name</FirstName>
<LastName>Thanks</LastName>
EOF
ABC System, 50000, 1000, Name, Thanks
For multiple files, I assume you want a newline after each file. GNU awk v4 has an extension for this: ENDFILE
gawk -v ORS=", " -F '[<>]' '
/Title|Amount|AwardID|FirstName|LastName/ {print $3}
ENDFILE {printf "\b\b \n"}
' *.xml
Otherwise it takes a little more work:
awk -v ORS=", " -F '[<>]' '
FNR == 1 && FILENAME != ARGV[1] {printf "\b\b \n"}
/Title|Amount|AwardID|FirstName|LastName/ {print $3}
END {printf "\b\b \n"}
' *.xml
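The `printf "\b\b \n"` trick looks right on a terminal, but the backspace characters are written literally when the output is redirected to a file. A possible variant collects the fields in a variable and prints the line when the next file starts, so no backspaces are needed. This is only a sketch under the same one-element-per-line assumption; `fields_per_file_awk` is a made-up name.

```shell
# Join the matching fields of each file with ", " and emit one line per file,
# without relying on backspace characters.
fields_per_file_awk() {
  awk -F '[<>]' '
    # A new file starts: flush the line collected for the previous file.
    FNR == 1 && NR > 1 { print line; line = ""; n = 0 }
    # Append the element text, adding a separator before every field but the first.
    /Title|Amount|AwardID|FirstName|LastName/ {
      line = (n++ ? line ", " : "") $3
    }
    # Flush the line of the last file.
    END { if (NR > 0) print line }
  ' "$@"
}

# Usage:
#   fields_per_file_awk *.xml
```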
For robustness, you should use an XML parser or an XSLT transformation.
Given your sample XML files, here is a solution using xmlstarlet, my preferred XML processing tool:
xmlstarlet sel -t -v //AwardTitle -o , -v //AwardAmount -o , -v //AwardID -m //Investigator -o , -v FirstName -o , -v LastName -b -nl 1419538.xml 1424234.xml
IBDR: Workshop on Successful Approaches for Development and Dissemination of Instrumentation for Biological Research - May 1-2, 2014; Rosslyn, VA,49990,1419538,Sameer,Sonkusale,Valencia,Koomson,Eduardo,Rosa-Molinar
RAPID: Role of Physical, Chemical and Diffusion Properties of 4-Methyl-cyclohexane methanol in Remediating Contaminated Water and Water Pipes,49999,1424234,Daniel,Gallagher,Andrea,Dietrich,Paolo,Scardina
If you want to use another XSLT tool, here is the generated stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:call-template name="value-of-template">
<xsl:with-param name="select" select="//AwardTitle"/>
</xsl:call-template>
<xsl:text>,</xsl:text>
<xsl:call-template name="value-of-template">
<xsl:with-param name="select" select="//AwardAmount"/>
</xsl:call-template>
<xsl:text>,</xsl:text>
<xsl:call-template name="value-of-template">
<xsl:with-param name="select" select="//AwardID"/>
</xsl:call-template>
<xsl:for-each select="//Investigator">
<xsl:text>,</xsl:text>
<xsl:call-template name="value-of-template">
<xsl:with-param name="select" select="FirstName"/>
</xsl:call-template>
<xsl:text>,</xsl:text>
<xsl:call-template name="value-of-template">
<xsl:with-param name="select" select="LastName"/>
</xsl:call-template>
</xsl:for-each>
<xsl:value-of select="'&#10;'"/>
</xsl:template>
<xsl:template name="value-of-template">
<xsl:param name="select"/>
<xsl:value-of select="$select"/>
<xsl:for-each select="exslt:node-set($select)[position()>1]">
<xsl:value-of select="'&#10;'"/>
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
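That schema is not very good. Specifically, it is not flexible: what if there are more than 5 investigators? You need something like this:
Perhaps simpler:
Award table: id, title, amount
AwardInvestigators table: award_id, first name, last name, role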
xmlstarlet sel -t \
-v //AwardID -o , -v //AwardAmount \
-m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName -b \
-m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName -b \
-nl \
*.xml
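For the AwardInvestigators layout described above (one row per investigator, as in the AwardID, FirstName, LastName, Role format the question asks for), a possible sketch is to match each Investigator element and print the award ID inside that match. It assumes the role text sits in a RoleCode child, as the XPath filters in the question suggest, and uses the documented `-n` newline option of `xmlstarlet sel`. The function names `shorten_roles` and `investigator_rows` and the short role labels are illustrative choices, not part of the original answer.

```shell
# Map the full RoleCode text to the short labels used in the question.
shorten_roles() {
  sed -e 's/,Principal Investigator$/,PI/' \
      -e 's/,Co-Principal Investigator$/,Co-PI/' \
      -e 's/,Former Principal Investigator$/,Former-PI/'
}

# Emit a header, then one "AwardID,FirstName,LastName,Role" row per investigator.
# Requires xmlstarlet (the binary is called "xml" on some systems).
investigator_rows() {
  printf 'AwardID,FirstName,LastName,Role\n'
  # -m iterates over every Investigator; //AwardID is looked up from the
  # document root, so it repeats on each row of the same file.
  xmlstarlet sel -t \
    -m '//Investigator' \
    -v '//AwardID' -o , -v FirstName -o , -v LastName -o , -v RoleCode -n \
    "$@" | shorten_roles
}

# Usage (InvestigatorTable.csv is a hypothetical output name):
#   investigator_rows *.xml > InvestigatorTable.csv
```

Because every investigator becomes its own row, the number of investigators per award no longer matters. Names containing a comma would still need CSV quoting, which this sketch does not handle.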