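Someone sent me a TEI (Text Encoding Initiative) XML file to process in R ... I am not an expert in XML or in TEI (I don't even know whether the file is well formed). All my attempts have failed ... My file: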
<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Luxury Bound</title>
</titleStmt>
<publicationStmt>
<p/>
</publicationStmt>
<sourceDesc>
<msDesc>
<msIdentifier>
<country>unknown</country>
<msName>unknown location (Hours by a follower of Jean Semont)</msName>
</msIdentifier>
<msContents>
<msItemStruct/>
<msItem>
<p xml:id="content1">Hours (Tournai)</p>
</msItem>
</msContents>
<physDesc>
<decoDesc>
<p>Information on the illustrations : </p>
<p>Total number of illustrations : </p>
<p>Number of miniatures : </p>
<p>Number of historiated initials : </p>
<p>Number of grisailles : </p>
<p>Number of drawings : </p>
<p>
<listPerson type="miniaturists">
<person>
<persName>Jean Semont (follower)</persName>
</person>
</listPerson>
</p>
</decoDesc>
....
I tried:
library('XML')
doc<-xmlParse("luxud1.xml")
summary(doc)
$nameCounts
catDesc category p title measure val date
11 11 10 6 4 4 3
ab langUsage language origDate persName TEI additional
2 2 2 2 2 1 1
adminInfo availability bibl binding bindingDesc catRef classDecl
1 1 1 1 1 1 1
country decoDesc encodingDesc extent fileDesc hi history
1 1 1 1 1 1 1
listBibl listPerson measureGrp msContents msDesc msIdentifier msItem
1 1 1 1 1 1 1
msItemStruct msName note objectDesc origin person physDesc
1 1 1 1 1 1 1
placeName principal profileDesc publicationStmt ref region settlement
1 1 1 1 1 1 1
sourceDesc supportDesc taxonomy teiHeader textClass titleStmt
1 1 1 1 1 1
$numNodes
[1] 102
If I try:
p<-xmlToDataFrame(doc,homogeneous=FALSE, nodes= getNodeSet(doc, "//persName") )
I get something strange ... a concatenation of all the values in the file ... Can you suggest the right approach? Thanks
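One possible explanation, offered as a sketch rather than a confirmed diagnosis: the root element declares a default namespace (xmlns="http://www.tei-c.org/ns/1.0"), so an unprefixed XPath such as "//persName" probably matches nothing, and xmlToDataFrame may then fall back to the whole document. The example below binds a prefix to that namespace and queries with it. The prefix name tei and the variable names (tei_text, ns, pers_names, result) are arbitrary choices for illustration, and the inline XML is a cut-down stand-in for luxud1.xml.

```r
library(XML)

# Cut-down stand-in for the TEI file shown above; with the real file,
# use xmlParse("luxud1.xml") instead of parsing this string.
tei_text <- '<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <msDesc>
          <physDesc>
            <decoDesc>
              <p>
                <listPerson type="miniaturists">
                  <person>
                    <persName>Jean Semont (follower)</persName>
                  </person>
                </listPerson>
              </p>
            </decoDesc>
          </physDesc>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>'
doc <- xmlParse(tei_text, asText = TRUE)

# Bind a prefix (any name works) to the TEI namespace URI declared on <TEI>.
ns <- c(tei = "http://www.tei-c.org/ns/1.0")

# Select the persName elements using the prefixed XPath.
prefixed <- getNodeSet(doc, "//tei:persName", namespaces = ns)

# Extract the text of each matched node as a character vector.
pers_names <- xpathSApply(doc, "//tei:persName", xmlValue, namespaces = ns)

# Build the data frame by hand from the extracted values.
result <- data.frame(persName = pers_names, stringsAsFactors = FALSE)
print(result)
```

The same pattern should carry over to other elements (for example "//tei:msName" or "//tei:country"); building the data frame from xpathSApply results keeps control over which values end up in which column.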