Parsing and editing XML files based on conditions

Asked: 2012-11-29 16:23:57

Tags: java xml parsing editing xerces

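For a small project I am currently working on, I need to parse a large number of "settings" files and change values under specific conditions. I believe these files are formatted as XML, but I am starting to doubt that, because I am parsing them in Java with Apache's Xerces XML parser and getting strange results...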

Here is a sample "settings" file that I need to parse (apologies in advance for the formatting):

<LIBRARY_ITEM><NAME> AUsten(REdsockswhiteandblack), Kevin(Greysocks),
Johnny(REdandwhitesocksTall), David(REdandwhitesocksshort)(10)-Camera 1.avi
</NAME><ID>{DA656F16-CDDE-49C5-83B6-865DFB58356A}</ID><VERSION
subversion="1">2.0</VERSION><THUMBNAIL_INDEX>0</THUMBNAIL_INDEX><CATEGORIES>
<CATEGORY name="Skaters" id="{90a42bf0-58ec-46d4-8a54-8bbf7844d63a}">Johnny
Yaremko</CATEGORY><CATEGORY name="Skills" id="{165e7d21-aa8f-4376-b38b-
6fe20680e0d4}">Drop &amp; Go</CATEGORY></CATEGORIES><CAMERA_NAME_NODE
CAMERA_NAME="Camera 1"/><TYPE>1</TYPE><LIBRARY_ITEM ItemType="Marker" IN="0"
UNIT="RefTime" OUT="0" SynchMarker="yes"><NAME></NAME><ID>{B02BA392-50D4-490C-9FDB-
0B7B350D2281}</ID><VERSION subversion="1">2.0</VERSION><FILE_NAME></FILE_NAME>
<Library.MDProperties><Property Name="Title" DefaultValue=""><![CDATA[Synch Point]]>
</Property></Library.MDProperties><Data Id="ODKeyPosition"><![CDATA[<ODKeyPosition
Version="1.0"><DrawingStream Value="1&#xA;0 13 BEGIN_SCENE_11052 3 0 1 0 65535 0 0 0 0
0 0.84375 0 1 0 1 720 9 END_SCENE"/></ODKeyPosition>]]></Data></LIBRARY_ITEM>
<CAMERA_NAME_NODE CAMERA_NAME="Camera 1"/><Library.MDLibraryItemLink><Reference
Id="TWIN"><LibraryItem Id="{74927B4E-00FF-4E12-B428-BF392E82CFA2}"
LastKnownLocation=".\ AUsten(REdsockswhiteandblack), Kevin(Greysocks),
Johnny(REdandwhitesocksTall), David(REdandwhitesocksshort)(10)-Camera 2.avi"/>
</Reference></Library.MDLibraryItemLink><OverlayDrawing><![CDATA[13 BEGIN_SCENE_11052 3
 0 1 0 65535 0 0 0 0 0 0.84375 0 1 2 13501 3 0 1 0 65520 0 0 0 0 0 1 0 1 13 1105 3 0 1
 0 65535 0 0 0 0 0 1 0 2 2 360 0 0 1 360 576 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127
 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 420 0 0 1 420 576 0 1 0 0
 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0
 0 1 0 2 2 480 0 0 1 480 576 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0
 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 540 0 0 1 540 576 0 1 0 0 0 0 1 2 1 255 
104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 600 0
 0 1 600 576 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3
 0 1 0 65535 0 0 0 0 0 1 0 2 2 660 0 0 1 660 576 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127
 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 720 0 0 1 720 576 0 1
 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0
 0 0 0 1 0 2 2 300 0 0 1 300 576 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3
 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 240 0 0 1 240 576 0 1 0 0 0 0 1 2 1
 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2
 180 0 0 1 180 576 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 
1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 120 0 0 1 120 576 0 1 0 0 0 0 1 2 1 255 104 32 255
 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 60 0 0 1 60 576
 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 
0 0 0 0 0 1 0 2 2 0 0 0 1 0 576 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3
 2 0 20 20 1 1 1 3 1 11 lineSpaceId19 Space between lines1 0 60 1 11 gridWidthId10 Grid
 Width1 0 720 1 12 gridHeightId11 Grid Height1 0 576 0 1 1 13 orientationId16 Grid
 Orientation1 0 1 0 0 0 13501 3 0 1 0 65520 0 0 0 0 0 1 0 1 9 1105 3 0 1 0 65535 0 0 0 
0 0 1 0 2 2 0 288 0 1 720 288 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2
 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 0 348 0 1 720 348 0 1 0 0 0 0 1 2 1 255
 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 0 
408 0 1 720 408 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105   
3 0 1 0 65535 0 0 0 0 0 1 0 2 2 0 468 0 1 720 468 0 1 0 0 0 0 1 2 1 255 104 32 255 1    
127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 0 528 0 1 720 528
 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535
 0 0 0 0 0 1 0 2 2 0 228 0 1 720 228 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255
 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 0 168 0 1 720 168 0 1 0 0 0 0 1 2
 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2
 2 0 108 0 1 720 108 0 1 0 0 0 0 1 2 1 255 104 32 255 1 127 127 127 255 0 3 2 0 20 20 1
 1105 3 0 1 0 65535 0 0 0 0 0 1 0 2 2 0 48 0 1 720 48 0 1 0 0 0 0 1 2 1 255 104 32 255
 1 127 127 127 255 0 3 2 0 20 20 1 1 1 3 1 11 lineSpaceId19 Space between lines1 0 60 1
 11 gridWidthId10 Grid Width1 0 720 1 12 gridHeightId11 Grid Height1 0 576 0 1 1 13
 orientationId16 Grid Orientation1 0 0 0 0 0 1 720 9 END_SCENE]]></OverlayDrawing>
<Library.MDProperties><Property Name="CameraID" DefaultValue=""><![CDATA[0]]>
</Property><Property Name="Comment" DefaultValue=""><![CDATA[This is an acceleration
 drill. The skater uses a wider stance than normal to achieve a shorter but more rapid
 stride. This is required to get up to full speed in a hurry. Once at full speed a long
 powerful stride will keep you there with the least amount of energy consumed. Body
 position is once again important. Leaning too far forward will cause the skater to 
loose traction as all their weight is not over their skates.]]></Property><Property
 Name="Title" DefaultValue=""><![CDATA[ AUsten(REdsockswhiteandblack),
 Kevin(Greysocks), Johnny(REdandwhitesocksTall), David(REdandwhitesocksshort)(10)-
Camera 1.avi]]></Property></Library.MDProperties></LIBRARY_ITEM>

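If it helps, each of these files corresponds to one AVI video clip. I am trying to use some conditional statements to edit many of these entries at once, in order to change the clip properties in the application that references them. It is a third-party application, so the task is rather difficult for me: all testing has to be done "black box" (I don't know the methods or structures the developers used to write these settings files).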

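I guess what I am asking is whether this data conforms to the XML standard, or whether it is an entirely different structure. Some of what is in these files doesn't look like proper XML...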

[Update] Here is the code I use to walk the parsed data and print it to the screen (to make sure everything is being built correctly):

public Analyzer(String source) {

    DOMParser parser = new DOMParser();

    int level= 0;
    Node curItem = null;

    try {
        parser.parse(source);
        Document doc = parser.getDocument();

        NodeList nodeList = doc.getElementsByTagName("LIBRARY_ITEM");
        for (int i = 0; i < nodeList.getLength(); i++) {
            read(nodeList.item(i), 0);
        }

    } catch (Exception ex) {
        ex.printStackTrace();
    }

}

public static void read(Node node, int level) {

    if (node == null) {
        return;
    }

    int type = node.getNodeType();
    switch (type) {
        case Node.DOCUMENT_NODE: {
            System.out.print(node.getNodeName()+": ");
            read(((Document) node).getDocumentElement(), level+1);
            break;
        }


        case Node.TEXT_NODE: {
            System.out.print(" = \""+node.getNodeValue().replaceAll("\\s", "")+"\"");
            break;
        }

        case Node.ELEMENT_NODE: {
            System.out.print("\n");
            for (int i = 0; i < level; i++) {
                System.out.print("\t");
            }
            System.out.print(node.getNodeName());
            NodeList children = node.getChildNodes();
            int length = children.getLength();
            for (int i = 0; i < length; i++) {
                read(children.item(i), level+1);
            }
            break;
        }
    }
}

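When it runs, it does some funky things for some fields... and some values don't print correctly. This could easily be my own mistake, since I have very little experience with XML.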

1 Answer:

Answer 0 (score: 0)

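You haven't fully described your problem (i.e. I don't know what "funky" means in terms of incorrect output). That said, you mention missing data, and the XML has CDATA sections, so I would guess your code is missing a case for CDATA nodes, e.g. Node.CDATA_SECTION_NODE.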
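
A minimal sketch of what that extra case could look like. It assumes the standard JAXP DocumentBuilder rather than the Xerces DOMParser from the question, and it collects output in a StringBuilder instead of printing directly; the class name CdataDump and the helper dump are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the question's code.
final class CdataDump {

    // Recursively renders a node; CDATA sections share the text-node branch.
    static void read(Node node, int level, StringBuilder out) {
        if (node == null) {
            return;
        }
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                read(((Document) node).getDocumentElement(), level, out);
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                // Trim instead of stripping all whitespace, so words stay separated.
                String value = node.getNodeValue().trim();
                if (!value.isEmpty()) {
                    out.append(" = \"").append(value).append('"');
                }
                break;
            }
            case Node.ELEMENT_NODE: {
                out.append('\n');
                for (int i = 0; i < level; i++) {
                    out.append('\t');
                }
                out.append(node.getNodeName());
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    read(children.item(i), level + 1, out);
                }
                break;
            }
            default:
                break; // comments, processing instructions, etc. are skipped
        }
    }

    // Parses an XML string and returns the indented dump.
    static String dump(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        read(doc, 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Property Name=\"Title\"><![CDATA[Synch Point]]></Property>";
        System.out.println(dump(xml));
    }
}
```

The only change that matters for the missing values is the added `case Node.CDATA_SECTION_NODE:` label; the rest mirrors the shape of the `read` method in the question.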

Also note that a given element node may have more than one child text node (an XML parser is allowed to break the text up while parsing).
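
As a rough illustration, assuming the JAXP DocumentBuilder again (not the Xerces DOMParser from the question), the sketch below parses an element whose content mixes plain text and a CDATA section. `getTextContent()` concatenates all the pieces, and `setCoalescing(true)` asks the parser to merge CDATA into the adjacent text nodes. The class name TextChunks, the helper parseRoot and the sample `<Comment>` element are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the question's code.
final class TextChunks {

    // Parses the XML string and returns its root element.
    // With coalescing on, CDATA sections are converted to text and merged
    // with neighbouring text nodes.
    static Element parseRoot(String xml, boolean coalescing) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(coalescing);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Comment>before <![CDATA[inside]]> after</Comment>";

        Element plain = parseRoot(xml, false);
        // Number of separate child nodes the parser produced.
        System.out.println(plain.getChildNodes().getLength());
        // Full text of the element, regardless of how it was split.
        System.out.println(plain.getTextContent());

        Element merged = parseRoot(xml, true);
        System.out.println(merged.getChildNodes().getLength());
    }
}
```

Reading a value through `getTextContent()` on the element, rather than through a single child's `getNodeValue()`, avoids depending on how the parser happened to split the text.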
