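I need to edit a KML file containing thousands of sections like the one shown below. I can wrap my head around the logic, but the actual implementation is beyond me.
Programmatically, I need to copy the value from each Sub_Name line into a new <name> line directly under <Placemark>, as marked in the sample below.
I feel I should be able to do this with a bash script and some moderately thorough sed and awk commands, but I get lost once I start nesting all the angle brackets.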
<Placemark>
<name>THIS LINE NEEDS TO BE ADDED FROM THE Sub_Name LINE</name>
<Style><LineStyle><color>ff0000ff</color></LineStyle><PolyStyle><fill>0</fill></PolyStyle></Style>
<ExtendedData><SchemaData schemaUrl="#gmaps">
<SimpleData name="EntID">1274433</SimpleData>
<SimpleData name="Sub_Name">HYDE PARK</SimpleData>
<SimpleData name="ORIG_FID">39</SimpleData>
<SimpleData name="Scode">S5435</SimpleData>
<SimpleData name="Shape_Leng">1653.15682579000</SimpleData>
<SimpleData name="Shape_Area">13612381.56865700000</SimpleData>
</SchemaData></ExtendedData>
<MultiGeometry><Polygon><altitudeMode>clampToGround</altitudeMode><outerBoundaryIs><LinearRing><altitudeMode>clampToGround</altitudeMode><coordinates>-97.7740412096895,30.4376501989282</coordinates></LinearRing></outerBoundaryIs></Polygon></MultiGeometry>
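This is very similar to this question, but I have been picking it apart for an hour and cannot adapt it to my case.
Thanks for any advice and guidance you can offer.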
Answer 0 (score: 3)
The simple approach is to make two passes over the file:
$ cat tst.awk
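NR==FNR {                                        # first pass: collect the names
    if ( /Sub_Name/ ) {
        gsub(/[[:space:]]*<[^<>]+>/,"")          # strip the tags, leaving only the text
        names[NR-4] = ORS "<name>" $0 "</name>"  # Sub_Name sits 4 lines below <Placemark>
    }
    next
}
{ print $0 names[FNR] }                          # second pass: print each line plus any stored <name>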
$ awk -f tst.awk file file
<Placemark>
<name>HYDE PARK</name>
<Style><LineStyle><color>ff0000ff</color></LineStyle><PolyStyle><fill>0</fill></PolyStyle></Style>
<ExtendedData><SchemaData schemaUrl="#gmaps">
<SimpleData name="EntID">1274433</SimpleData>
<SimpleData name="Sub_Name">HYDE PARK</SimpleData>
<SimpleData name="ORIG_FID">39</SimpleData>
<SimpleData name="Scode">S5435</SimpleData>
<SimpleData name="Shape_Leng">1653.15682579000</SimpleData>
<SimpleData name="Shape_Area">13612381.56865700000</SimpleData>
</SchemaData></ExtendedData>
<MultiGeometry><Polygon><altitudeMode>clampToGround</altitudeMode><outerBoundaryIs><LinearRing><altitudeMode>clampToGround</altitudeMode><coordinates>-97.7740412096895,30.4376501989282</coordinates></LinearRing></outerBoundaryIs></Polygon></MultiGeometry>
The above was run against this input file:
$ cat file
<Placemark>
<Style><LineStyle><color>ff0000ff</color></LineStyle><PolyStyle><fill>0</fill></PolyStyle></Style>
<ExtendedData><SchemaData schemaUrl="#gmaps">
<SimpleData name="EntID">1274433</SimpleData>
<SimpleData name="Sub_Name">HYDE PARK</SimpleData>
<SimpleData name="ORIG_FID">39</SimpleData>
<SimpleData name="Scode">S5435</SimpleData>
<SimpleData name="Shape_Leng">1653.15682579000</SimpleData>
<SimpleData name="Shape_Area">13612381.56865700000</SimpleData>
</SchemaData></ExtendedData>
<MultiGeometry><Polygon><altitudeMode>clampToGround</altitudeMode><outerBoundaryIs><LinearRing><altitudeMode>clampToGround</altitudeMode><coordinates>-97.7740412096895,30.4376501989282</coordinates></LinearRing></outerBoundaryIs></Polygon></MultiGeometry>
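The slightly harder approach is to keep a rolling buffer of 4 lines and always print the line that was read 4 lines earlier, but that is only necessary if your input comes from a pipe, or if your file is so large that you don't have the time to parse it twice or the memory to store all the "name" lines in an array.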
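A minimal sketch of that rolling-buffer variant, assuming the same layout as the sample (the Sub_Name line is always exactly 4 lines below the line the new <name> should follow). The function name add_names and the variable d are made up for this illustration and are not part of the script above.

```shell
# Single-pass variant of tst.awk: keep the last d lines in a rolling buffer.
# Assumption: Sub_Name is always d lines below the line that <name> follows.
add_names() {
    awk -v d=4 '
    /Sub_Name/ {
        name = $0
        gsub(/[ \t]*<[^<>]+>/, "", name)            # strip tags, keep the text
        extra[NR-d] = ORS "<name>" name "</name>"   # remember it for line NR-d
    }
    NR > d {                                        # line NR-d leaves the buffer now
        print buf[NR % d] extra[NR-d]
        delete extra[NR-d]
    }
    { buf[NR % d] = $0 }                            # store the current line
    END {                                           # flush the lines still buffered
        for (i = (NR > d ? NR - d + 1 : 1); i <= NR; i++) print buf[i % d]
    }
    ' "$@"
}
```

It can be used either as `add_names file > file.new` or at the end of a pipeline, since awk reads standard input when no file name is given.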
The usual caveats about the dangers of trying to parse XML/HTML without a proper parser apply...
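One way to soften that risk without a real parser is to stop counting lines and key on the tags instead. The sketch below is only an illustration under stated assumptions, not a parser: it assumes each <Placemark> opening tag sits on its own line and that the block ends at a line containing </Placemark> (a final block with no closing tag is still printed). The names add_names_by_block and emit are hypothetical.

```shell
# Buffer one Placemark block at a time and insert <name> after its first line.
# Assumption: the <Placemark> opening tag is on a line of its own.
add_names_by_block() {
    awk '
    /<Placemark[ >]/ { inblock = 1; n = 0; name = "" }   # a new block starts
    inblock {
        block[++n] = $0                                  # hold the block in memory
        if (/name="Sub_Name"/) {
            name = $0
            gsub(/[ \t]*<[^<>]+>/, "", name)             # strip tags, keep the text
        }
        if (/<\/Placemark>/) { emit(); inblock = 0 }     # block complete
        next
    }
    { print }                                            # lines outside any block
    END { if (inblock) emit() }                          # unterminated final block
    function emit(   i) {
        for (i = 1; i <= n; i++) {
            print block[i]
            if (i == 1 && name != "") print "<name>" name "</name>"
        }
    }
    ' "$@"
}
```

Because only one block is held at a time, memory use stays small even for thousands of sections, and the position of the Sub_Name line inside the block no longer matters.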
Answer 1 (score: 0)
Given:
$ cat xml_file
<Placemark>
<Style><LineStyle><color>ff0000ff</color></LineStyle><PolyStyle><fill>0</fill></PolyStyle></Style>
<ExtendedData><SchemaData schemaUrl="#gmaps">
<SimpleData name="EntID">1274433</SimpleData>
<SimpleData name="Sub_Name">HYDE PARK</SimpleData>
<SimpleData name="ORIG_FID">39</SimpleData>
<SimpleData name="Scode">S5435</SimpleData>
<SimpleData name="Shape_Leng">1653.15682579000</SimpleData>
<SimpleData name="Shape_Area">13612381.56865700000</SimpleData>
</SchemaData></ExtendedData>
<MultiGeometry><Polygon><altitudeMode>clampToGround</altitudeMode><outerBoundaryIs><LinearRing><altitudeMode>clampToGround</altitudeMode><coordinates>-97.7740412096895,30.4376501989282</coordinates></LinearRing></outerBoundaryIs></Polygon></MultiGeometry>
</Placemark>
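If you would rather parse the XML and use XPath to find the value of a nested child node and then add another node, you can do something along these lines (in Ruby, for example):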
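$ ruby -r nokogiri -e 'doc=Nokogiri::XML($<.read) # {|opt| opt.strict.noblanks }
t1=doc.at_css "Placemark"                 # first <Placemark> element
t2 = Nokogiri::XML::Node.new "name", doc  # new, empty <name> node
t2.parent=t1                              # attach it as the last child of <Placemark>
t2.content=doc.xpath("//SimpleData[@name=\"Sub_Name\"]").text  # copy the Sub_Name text
puts doc
' xml_file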
Prints:
<?xml version="1.0"?>
<Placemark>
<Style><LineStyle><color>ff0000ff</color></LineStyle><PolyStyle><fill>0</fill></PolyStyle></Style>
<ExtendedData><SchemaData schemaUrl="#gmaps">
<SimpleData name="EntID">1274433</SimpleData>
<SimpleData name="Sub_Name">HYDE PARK</SimpleData>
<SimpleData name="ORIG_FID">39</SimpleData>
<SimpleData name="Scode">S5435</SimpleData>
<SimpleData name="Shape_Leng">1653.15682579000</SimpleData>
<SimpleData name="Shape_Area">13612381.56865700000</SimpleData>
</SchemaData></ExtendedData>
<MultiGeometry><Polygon><altitudeMode>clampToGround</altitudeMode><outerBoundaryIs><LinearRing><altitudeMode>clampToGround</altitudeMode><coordinates>-97.7740412096895,30.4376501989282</coordinates></LinearRing></outerBoundaryIs></Polygon></MultiGeometry>
<name>HYDE PARK</name></Placemark>
(Note that the inserted node <name>HYDE PARK</name> sits at the end of the <Placemark> node, since the schema does not specify the XML order.)
The same approach works in any other scripting language that has an XML parser (Ruby, Python, Perl, etc.)