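I would like to know whether a simple script can check several conditions and make the necessary modifications to a file.
Below is an example of what I have and what I want to achieve.
I have an xml file in which four lines matter - number, year, model and man.
If <man> is Ford or Dodge, I want no modification. But if <man> is anything else, I want to check whether <year> or <model> is "NA" and remove the "NA" lines.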
<?xml version="1.0" encoding="UTF-8"?>
<CarStuff>
<fileName>CarExpor201217.xml</fileName>
<numberCars>5</numberCars>
<ref>2017XY</ref>
<carExo id="CAR0001_01">
<dealVen id="CAR0001_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0001_03">
<amount>1811.10</amount>
<lotNumber>1</lotNumber>
<year>NA</year> - Line must be removed
<model>NA</model> - Line must be removed
<man>Acura</man>
</soldCar>
</carExo>
<carExo id="CAR0002_01">
<dealVen id="CAR0002_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0002_03">
<amount>1811.10</amount>
<lotNumber>1</lotNumber>
<year>NA</year> - Line must be kept
<model>NA</model> - Line must be kept
<man>Ford</man>
</soldCar>
</carExo>
<carExo id="CAR0003_01">
<dealVen id="CAR0003_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0003_03">
<amount>1811.10</amount>
<lotNumber>1</lotNumber>
<year>1997</year> - Line must be kept
<model>NA</model> - Line must be removed
<man>Bugati</man>
</soldCar>
</carExo>
<carExo id="CAR0004_01">
<dealVen id="CAR0004_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0004_03">
<amount>1811.10</amount>
<lotNumber>1</lotNumber>
<year>1997</year> - Line must be kept
<model>NA</model> - Line must be kept
<man>Dodge</man>
</soldCar>
</carExo>
<carExo id="CAR0005_01">
<dealVen id="CAR0005_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0005_03">
<amount>1811.10</amount>
<lotNumber>2</lotNumber>
<year>NA</year> - Line must be kept
<model>Charger</model> - Line must be kept
<man>Dodge</man>
</soldCar>
</carExo>
<carExo id="CAR0005_01">
<dealVen id="CAR0005_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0005_03">
<amount>1811.10</amount>
<lotNumber>3</lotNumber>
<year>NA</year> - Line must be removed
<model>Dot</model> - Line must be kept
<man>Datsun</man>
</soldCar>
</carExo>
</CarStuff>
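Thanks for all comments and ideas.
Answer 0 (score: 1)
Simply use XSLT, the special-purpose language designed to transform XML files, which includes removing nodes according to various criteria.
Specifically, the stylesheet below runs the Identity Transform to copy the XML as is, then excludes nodes according to your model/year/man criteria.
XSLT (save as .xsl, a special .xml file)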
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="soldCar[man != 'Ford' and man != 'Dodge']">
<xsl:copy>
<xsl:copy-of select="amount|lotNumber"/>
<xsl:if test="model != 'NA'">
<xsl:copy-of select="model"/>
</xsl:if>
<xsl:if test="year != 'NA'">
<xsl:copy-of select="year"/>
</xsl:if>
<xsl:copy-of select="man"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
VBA
Public Sub RunXSLT()
Dim strFile As String, strPath As String
' REFERENCE MS XML, v6.0
Dim xmlDoc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60
Dim newDoc As New MSXML2.DOMDocument60
' LOAD XML SOURCE
xmlDoc.Load "C:\Path\To\Input.xml"
' LOAD XSL SOURCE
xslDoc.Load "C:\Path\To\XSLT\Script.xsl"
' TRANSFORM SOURCE
xmlDoc.transformNodeToObject xslDoc, newDoc
newDoc.Save "C:\Path\To\Output.xml"
' RELEASE DOM OBJECTS
Set xmlDoc = Nothing: Set xslDoc = Nothing: Set newDoc = Nothing
End Sub
Output
<?xml version="1.0" encoding="utf-8"?>
<CarStuff>
<fileName>CarExpor201217.xml</fileName>
<numberCars>5</numberCars>
<ref>2017XY</ref>
<carExo id="CAR0001_01">
<dealVen id="CAR0001_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar>
<amount>1811.10</amount>
<lotNumber>1</lotNumber>
<man>Acura</man>
</soldCar>
</carExo>
<carExo id="CAR0002_01">
<dealVen id="CAR0002_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0002_03">
<amount>1811.10</amount>
<lotNumber>1</lotNumber>
<year>NA</year>
<model>NA</model>
<man>Ford</man>
</soldCar>
</carExo>
<carExo id="CAR0003_01">
<dealVen id="CAR0003_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar>
<amount>1811.10</amount>
<lotNumber>1</lotNumber>
<year>1997</year>
<man>Bugati</man>
</soldCar>
</carExo>
<carExo id="CAR0004_01">
<dealVen id="CAR0004_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0004_03">
<amount>1811.10</amount>
<lotNumber>1</lotNumber>
<year>1997</year>
<model>NA</model>
<man>Dodge</man>
</soldCar>
</carExo>
<carExo id="CAR0005_01">
<dealVen id="CAR0005_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar id="CAR0005_03">
<amount>1811.10</amount>
<lotNumber>2</lotNumber>
<year>NA</year>
<model>Charger</model>
<man>Dodge</man>
</soldCar>
</carExo>
<carExo id="CAR0005_01">
<dealVen id="CAR0005_02">
<name>John</name>
<surname>Smith</surname>
</dealVen>
<soldCar>
<amount>1811.10</amount>
<lotNumber>3</lotNumber>
<model>Dot</model>
<man>Datsun</man>
</soldCar>
</carExo>
</CarStuff>
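For comparison, the same rule can be sketched outside XSLT. The following is a minimal Python sketch using only the standard library's xml.etree.ElementTree; the function name strip_na and the shortened SAMPLE document are assumptions made for illustration, not part of the original answer. Because it edits the tree in place instead of rebuilding each soldCar, the id attribute stays on every element, whereas the output listing above shows soldCar without its id for the entries that were rewritten.

```python
import xml.etree.ElementTree as ET

KEEP_AS_IS = {"Ford", "Dodge"}  # manufacturers whose entries are left untouched
PLACEHOLDER = "NA"


def strip_na(root):
    """Remove <year>/<model> children equal to 'NA' unless <man> is Ford or Dodge."""
    # findall returns a list, so the tree can be edited safely inside the loop
    for car in root.findall(".//soldCar"):
        man = car.findtext("man")
        # entries without <man>, or with Ford/Dodge, are not modified
        if man is None or man.strip() in KEEP_AS_IS:
            continue
        # list(car) is a snapshot of the children taken before any removal
        for child in list(car):
            is_target = child.tag in ("year", "model")
            if is_target and (child.text or "").strip() == PLACEHOLDER:
                car.remove(child)
    return root


# Shortened, hypothetical version of the question's file
SAMPLE = """<CarStuff>
  <carExo id="CAR0001_01">
    <soldCar id="CAR0001_03"><year>NA</year><model>NA</model><man>Acura</man></soldCar>
  </carExo>
  <carExo id="CAR0002_01">
    <soldCar id="CAR0002_03"><year>NA</year><model>NA</model><man>Ford</man></soldCar>
  </carExo>
  <carExo id="CAR0003_01">
    <soldCar id="CAR0003_03"><year>1997</year><model>NA</model><man>Bugati</man></soldCar>
  </carExo>
</CarStuff>"""

root = strip_na(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

In this sketch the Acura entry loses both lines, the Ford entry is unchanged, and the Bugati entry keeps its year but loses its model.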
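Answer 1 (score: 0)
It sounds like you need to delete all lines containing >NA<.
This is not a programming question (so it is off-topic), but here is a quick answer using Notepad++:
Press Ctrl + H to open the Find and Replace dialog.
In the Find what: text box, enter the regular expression: .*>NA<.*\r?\n
(the \r is optional, in case the file does not have Windows line endings)
Leave the Replace with: text box empty.
Make sure the Regular Expression radio button is selected in the Search Mode area.
Click Replace All and voilà! All lines containing >NA< are removed.
(Answer adapted from this).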
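The same replacement can be tried in a script. This is a minimal sketch with Python's standard re module; the function name drop_na_lines and the sample text are assumptions for illustration. It also makes the limitation of the approach visible: the pattern knows nothing about <man>, so "NA" lines are removed for Ford and Dodge entries as well, which the question asked to keep.

```python
import re

# Same pattern as in the Notepad++ dialog: a whole line containing ">NA<",
# together with its line ending (\r is optional for non-Windows files).
NA_LINE = re.compile(r".*>NA<.*\r?\n")


def drop_na_lines(text):
    """Delete every line that contains >NA<, regardless of the manufacturer."""
    return NA_LINE.sub("", text)


# Hypothetical fragment with Windows line endings
sample = (
    "<soldCar>\r\n"
    "<year>NA</year>\r\n"
    "<model>Charger</model>\r\n"
    "<man>Dodge</man>\r\n"
    "</soldCar>\r\n"
)

cleaned = drop_na_lines(sample)
print(cleaned)
```

Here the year line disappears even though the entry is a Dodge, so the regex alone only fits the case where every "NA" line should go.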
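Answer 2 (score: 0)
Solution via XMLDOM
You can use XMLDOM and XPath to search a so-called node list for <man> tags that contain neither the string Dodge nor Ford, and then check whether any of their sibling nodes contain "NA" in order to delete them. The code below uses late binding. By the way, the xml in your OP is not well-formed (closing tag </carStuf> instead of </carStuff>) - I added a small parse error routine on load to check for that.
Code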
Option Explicit
Sub checkNA()
Dim xDoc As Object ' xml document
Dim noli As Object, noli2 As Object ' node lists
Dim no As Object, no2 As Object ' nodes
Dim noMan As Object ' node <man> to check if no Dodge or Ford
Dim s As String
Dim sFile As String ' xml file name
sFile = ThisWorkbook.Path & "\xml\na_test.xml" ' <<< change to your xml file name
' late binding xml
Set xDoc = CreateObject("MSXML2.DOMDocument.6.0")
xDoc.async = False: xDoc.validateOnParse = False
xDoc.setProperty "SelectionLanguage", "XPath"
' load xml
If xDoc.Load(sFile) Then
Debug.Print "Loaded successfully"
Else
Dim xPE As Object ' Set xPE = CreateObject("MSXML2.IXMLDOMParseError")
Dim strErrText As String
Set xPE = xDoc.parseError
With xPE
strErrText = "Load error " & .ErrorCode & " xml file " & vbCrLf & _
Replace(.URL, "file:///", "") & vbCrLf & vbCrLf & _
xPE.reason & _
"Source Text: " & .srcText & vbCrLf & vbCrLf & _
"Line No.: " & .Line & vbCrLf & _
"Line Pos.: " & .linepos & vbCrLf & _
"File Pos.: " & .filepos & vbCrLf & vbCrLf
End With
MsgBox strErrText, vbExclamation
Set xPE = Nothing
Exit Sub
End If
' check items
s = "carExo/soldCar"
Set noli = xDoc.DocumentElement.SelectNodes(s)
For Each no In noli
Set noMan = no.SelectSingleNode("man")
If Not noMan Is Nothing Then
' exact match, so partial names such as "odge" are not treated as Ford/Dodge
If noMan.Text <> "Ford" And noMan.Text <> "Dodge" Then
Debug.Print "delete", noMan.Text
' delete all subtags containing "NA" as text
Set noli2 = no.SelectNodes("*")
For Each no2 In noli2
If no2.Text = "NA" Then
' delete item
Debug.Print , no2.nodename & "=" & no2.Text
no2.ParentNode.RemoveChild no2
End If
Next no2
Else
' Debug.Print "keep", noman.Text
End If
End If
Next no
' save
' Debug.Print xDoc.XML
xDoc.Save sFile
' close
Set xDoc = Nothing
End Sub
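The load-time check in the routine above has a close analogue in other XML libraries. As a minimal sketch, Python's standard xml.etree.ElementTree raises ParseError with an error code and a (line, column) position; the helper name check_well_formed and the two one-line documents are hypothetical and only mimic the kind of mismatched closing tag mentioned above.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, otherwise a short error description."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # position reported by the expat parser
        return f"Load error {err.code} at line {line}, column {column}: {err}"
    return None


good = "<CarStuff><numberCars>5</numberCars></CarStuff>"
bad = "<CarStuff><numberCars>5</numberCars></CarStuf>"  # mismatched closing tag

print(check_well_formed(good))
print(check_well_formed(bad))
```

Checking before any node is touched means a malformed file is reported and left alone rather than half processed.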
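Edit 12/29 - Addendum
I added a second working version of the ' check items section that uses some extra XPath. This alternative simply avoids the two If conditions of the original code, because the XPath predicates narrow down the nodes found in both node lists.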
' check items
s = "carExo/soldCar[man!='Ford'][man!='Dodge']" ' << (1) added condition to XPath
Set noli = xDoc.DocumentElement.SelectNodes(s)
For Each no In noli
Set noMan = no.SelectSingleNode("man")
If Not noMan Is Nothing Then
Debug.Print "delete", noMan.Text
' delete all subtags containing "NA" as text
Set noli2 = no.SelectNodes("*[.='NA']") ' << (2)added condition to XPath
For Each no2 In noli2
' delete item
Debug.Print , no2.nodename & "=" & no2.Text
no2.ParentNode.RemoveChild no2
Next no2
End If
Next no
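The idea of letting a predicate narrow the node list can be sketched in Python as well. This is a minimal sketch, assuming Python 3.7 or later for the [.='NA'] predicate in xml.etree.ElementTree; as far as I know the != predicate only arrived in Python 3.10, so the manufacturer test is kept in plain Python here. The two-entry document and the removed list are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry document modelled on the question's file
doc = ET.fromstring(
    "<CarStuff>"
    "<carExo><soldCar><year>NA</year><model>Dot</model><man>Datsun</man></soldCar></carExo>"
    "<carExo><soldCar><year>NA</year><model>Charger</model><man>Dodge</man></soldCar></carExo>"
    "</CarStuff>"
)

removed = []
for car in doc.findall("carExo/soldCar"):
    man = car.findtext("man")
    # (1) manufacturer condition, done in Python instead of an XPath predicate
    if man is None or man in ("Ford", "Dodge"):
        continue
    # (2) the predicate narrows the children to those whose text is exactly "NA"
    for child in car.findall("*[.='NA']"):
        removed.append(f"{child.tag}={child.text}")
        car.remove(child)

print(removed)
```

Since findall returns a list, removing children inside the loop does not disturb the iteration.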
Hint
Of course many roads lead to Rome; see the XSLT approach by @Parfait.