R xmlToDataFrame XML TEI

Date: 2012-12-19 22:14:59

Tags: xml r

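Someone gave me an XML TEI (Text Encoding Initiative) file to process in R. I am not an expert in XML, nor in TEI (I don't even know whether the file is well-formed), and none of my attempts have worked. My file: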

<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Luxury Bound</title>
      </titleStmt>
      <publicationStmt>
        <p/>
      </publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <country>unknown</country>
            <msName>unknown location (Hours by a follower of Jean Semont)</msName>
          </msIdentifier>
          <msContents>
            <msItemStruct/>
            <msItem>
              <p xml:id="content1">Hours (Tournai)</p>
            </msItem>
          </msContents>
          <physDesc>
            <decoDesc>
              <p>Information on the illustrations : </p>
              <p>Total number of illustrations : </p>
              <p>Number of miniatures : </p>
              <p>Number of historiated initials : </p>
              <p>Number of grisailles : </p>
              <p>Number of drawings : </p>
              <p>
                <listPerson type="miniaturists">
                  <person>
                    <persName>Jean Semont (follower)</persName>
                  </person>
                </listPerson>
              </p>
            </decoDesc>
....

I tried:

library('XML')
doc<-xmlParse("luxud1.xml")
summary(doc)

$nameCounts

        catDesc        category               p           title         measure             val            date 
             11              11              10               6               4               4               3 
             ab       langUsage        language        origDate        persName             TEI      additional 
              2               2               2               2               2               1               1 
      adminInfo    availability            bibl         binding     bindingDesc          catRef       classDecl 
              1               1               1               1               1               1               1 
        country        decoDesc    encodingDesc          extent        fileDesc              hi         history 
              1               1               1               1               1               1               1 
       listBibl      listPerson      measureGrp      msContents          msDesc    msIdentifier          msItem 
              1               1               1               1               1               1               1 
   msItemStruct          msName            note      objectDesc          origin          person        physDesc 
              1               1               1               1               1               1               1 
      placeName       principal     profileDesc publicationStmt             ref          region      settlement 
              1               1               1               1               1               1               1 
     sourceDesc     supportDesc        taxonomy       teiHeader       textClass       titleStmt 
              1               1               1               1               1               1 

$numNodes
[1] 102

If I try:

p<-xmlToDataFrame(doc,homogeneous=FALSE, nodes= getNodeSet(doc, "//persName") )

I get something strange: a concatenation of all the values in the file. Can you suggest a good approach? Thanks.
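One likely cause, offered as an assumption rather than a confirmed diagnosis: the root `<TEI>` element declares a default namespace (`xmlns="http://www.tei-c.org/ns/1.0"`), and in XPath 1.0 an unprefixed name such as `//persName` only matches elements that are in no namespace. The node set passed to `xmlToDataFrame` would then be empty, which may be why the output looks like the text of the whole document. The sketch below binds an arbitrary prefix `tei` (a name chosen here, not required by TEI) to the namespace URI and extracts the text with `xpathSApply` and `xmlValue` instead of `xmlToDataFrame`. It parses a trimmed copy of the file from a string so it runs without `luxud1.xml`.

```r
library(XML)

# Trimmed copy of the relevant part of the TEI file, parsed from a string
# so the example runs without luxud1.xml
tei <- '<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc>
    <physDesc><decoDesc>
      <p><listPerson type="miniaturists">
        <person><persName>Jean Semont (follower)</persName></person>
      </listPerson></p>
    </decoDesc></physDesc>
  </msDesc></sourceDesc></fileDesc></teiHeader>
</TEI>'
doc <- xmlParse(tei, asText = TRUE)

# Unprefixed XPath: finds nothing, because persName is in the TEI namespace
n_unprefixed <- length(getNodeSet(doc, "//persName"))

# Bind the (hypothetical) prefix "tei" to the TEI namespace URI
ns <- c(tei = "http://www.tei-c.org/ns/1.0")

# Prefixed XPath: returns the text content of every persName element
pers_names <- xpathSApply(doc, "//tei:persName", xmlValue, namespaces = ns)

# One row per persName
df <- data.frame(persName = pers_names, stringsAsFactors = FALSE)
print(df)
```

With the real file, replace the string parsing with `doc <- xmlParse("luxud1.xml")`; the same `ns` vector must be passed to every XPath query that names TEI elements.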

0 answers