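I am trying to count co-occurrences of specific people in the event records of an XML document. My source document consists of event elements, which contain prose in p elements and bibliographic records in bibl elements; both of these contain references to people. I would like to be able to count how often two people appear together in the same event across the whole document. I have been using XSLT 2.0, but could switch to 3.0.
(For example, how would I get the answer 3 for Nancy Drew and Dick Tracy in the events below? Or 1 for Dick Tracy and Sam Spade?)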
<listEvent>
<event xml:id="e1">
<p>pretium eget erat eu cursus. Duis pulvinar lectus sed quam vehicula tincidunt in
vel nunc. Cras convallis elementum diam. Sed nec viverra magna. Then <name
SameAs="detectives.xml#ND">Nancy Drew</name> solved the case. A consequat
tortor molestie ut. Praesent lobortis ipsum sit amet bibendum consequat. </p>
<bibl><name SameAs="detectives.xml#DT">Tracy, Dick</name>. The Mysterious Case of the
Orange Fish. Penguin Publishing. </bibl>
<bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Case of the Blue
Carbuncle Penguin Publishing. </bibl>
</event>
<event xml:id="e2">
<p> facilisis turpis eu, gravida enim. Mauris adipiscing magna consequat dolor
auctor, sit amet tincidunt felis auctor. <name SameAs="detectives.xml#ND">Nancy
Drew</name> and <name SameAs="detectives.xml#DT">Dick Tracy</name> went into
business together. Aliquam pharetra semper erat, at viverra tellus vestibulum
quis. Sed facilisis convallis justo, suscipit fermentum lorem egestas nec.
Phasellus in aliquam eros, vitae fringilla augue </p>
<bibl><name SameAs="detectives.xml#TH">Hardy, Tom</name>. Growing Up Is Hard to Do:
The Story of a Boy Detective. Knopf Press. </bibl>
<bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Case of the Blue
Carbuncle. Penguin Publishing. </bibl>
<bibl><name SameAs="detectives.xml#SH">Holmes, Sherlock</name>. The Hound of the
Baskervilles. Arsenal Press. </bibl>
</event>
<event xml:id="e3">
<p> Curabitur dapibus eu ligula sed elementum. Curabitur sit amet nisi dictum. <name
SameAs="detectives.xml#SS">Sam Spade</name> was the only detective in town.
Donec cursus diam sem, astor. </p>
<bibl><name SameAs="detectives.xml#TH">Hardy, Tom</name>. Growing Up Is Hard to Do:
The Story of a Boy Detective. Knopf Press. </bibl>
<bibl><name SameAs="detectives.xml#SS">Spade, Sam</name>. My Friends' Business
Ventures. Knopf Press. </bibl>
<bibl><name SameAs="detectives.xml#DN">Drew, Nancy</name>. Blonde and Curious.
Arsenal Press.</bibl>
</event>
<event xml:id="e4">
<p> Duis pulvinar lectus sed quam vehicula tincidunt in vel nunc. <name
SameAs="detectives.xml#ND">Nancy Drew</name> and <name
SameAs="detectives.xml#DT">Dick Tracy</name> made 110% profit that year. Cras
convallis elementum diam. Sed nec viverra magna. A consequat tortor molestie ut.
Praesent lobortis ipsum sit amet bibendum consequat. </p>
<bibl><name SameAs="detectives.xml#SS">Spade, Sam</name>. My Friends' Business
Ventures. Knopf Press. </bibl>
<bibl><name SameAs="detectives.xml#MH">Holmes, Mycroft</name>. Sons and Brothers.
Knopf Press. </bibl>
</event>
</listEvent>
@michael.hor257k I like your idea. I would like to get output like the following:
<gexf> <graph><nodes count="77">
<node id="1.0" label="Sam Spade"/>
<node id="2.0" label="Dick Tracy"/>
<node id="3.0" label="Nancy Drew"/>
…
</nodes>
<edges count="254">
<edge id="1" source="1.0" target="2.0" weight="1.0"/>
<edge id="2" source="1.0" target="3.0" weight="2.0"/>
<edge id="3" source="2.0" target="3.0" weight="3.0"/>
…
</edges>
</graph>
</gexf>
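... The @weight value is the part I am having trouble calculating.
I have managed to assign a node @id to each person. The node @id values then make up the @source and @target values (the first edge is Sam Spade and Dick Tracy, the second is Sam Spade and Nancy Drew), and @weight should be the number of times the two appear together in the document. (I have, perhaps too much, simplified my example. In my actual source document there are a bunch of other attributes and values in each element, including an @n for each person's name, so populating the @id, @source and @target values with xsl:value-of is easy.)
@tim, no worries, @SameAs points to an authority list, so that no matter how an individual's name is spelled in the text (e.g. Lucy, Miss Graham and Mrs. L. Foster could all be the same woman, as a girl and before and after her marriage), or inverted in the case of bibliographic entries, it can be resolved to one person.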
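Answer 0 (score: 0)
"no worries, @SameAs points to an authority list"
Well, XSLT can only work with what is in the XML source document, so any differing @SameAs values would have to be resolved before the count required here.
"In my actual source document there are a bunch of other attributes and values in each element, including an @n for each person's name"
OK, since we don't have that, I have used the @SameAs attribute as if it were a unique id. The following is actually an XSLT 1.0 stylesheet, boosted by the EXSLT set:distinct() function. It is just a sketch, with some scaffolding left in, so that we can see whether this is heading in the right direction.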
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:set="http://exslt.org/sets"
    extension-element-prefixes="set">
  <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

  <!-- index every event by each @SameAs value mentioned anywhere inside it -->
  <xsl:key name="eventByID" match="event" use=".//name/@SameAs" />

  <!-- one @SameAs attribute per distinct value, in document order -->
  <xsl:variable name="distinct_nodes" select="set:distinct(/listEvent/event//name/@SameAs)" />
  <xsl:variable name="root" select="/" />

  <xsl:template match="/">
    <graph>
      <nodes>
        <xsl:for-each select="$distinct_nodes">
          <node id="{.}"/>
        </xsl:for-each>
      </nodes>
      <edges>
        <!-- pair each person with every person that comes later in the list -->
        <xsl:for-each select="$distinct_nodes[not(position()=last())]">
          <xsl:variable name="source" select="." />
          <xsl:variable name="pos" select="position()" />
          <xsl:for-each select="$distinct_nodes[position()>$pos]">
            <xsl:variable name="target" select="." />
            <!-- events that mention both the source and the target -->
            <xsl:variable name="common_events" select="key('eventByID', $source)[@xml:id=key('eventByID', $target)/@xml:id]" />
            <xsl:if test="$common_events">
              <edge source="{$source}" target="{$target}" weight="{count($common_events)}">
                <!-- use this for test purposes -->
                <!--
                <xsl:for-each select="$common_events">
                  <event id="{@xml:id}"/>
                </xsl:for-each>
                -->
              </edge>
            </xsl:if>
          </xsl:for-each>
        </xsl:for-each>
      </edges>
    </graph>
  </xsl:template>
</xsl:stylesheet>
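Applied to your sample XML, the result is shown below. Note that detectives.xml#DN (used for "Drew, Nancy" in event e3) is treated as a different person from detectives.xml#ND, because the two values differ.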
<?xml version="1.0" encoding="utf-8"?>
<graph>
<nodes>
<node id="detectives.xml#ND"/>
<node id="detectives.xml#DT"/>
<node id="detectives.xml#SH"/>
<node id="detectives.xml#TH"/>
<node id="detectives.xml#SS"/>
<node id="detectives.xml#DN"/>
<node id="detectives.xml#MH"/>
</nodes>
<edges>
<edge source="detectives.xml#ND" target="detectives.xml#DT" weight="3"/>
<edge source="detectives.xml#ND" target="detectives.xml#SH" weight="2"/>
<edge source="detectives.xml#ND" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#ND" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#ND" target="detectives.xml#MH" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#SH" weight="2"/>
<edge source="detectives.xml#DT" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#DT" target="detectives.xml#MH" weight="1"/>
<edge source="detectives.xml#SH" target="detectives.xml#TH" weight="1"/>
<edge source="detectives.xml#TH" target="detectives.xml#SS" weight="1"/>
<edge source="detectives.xml#TH" target="detectives.xml#DN" weight="1"/>
<edge source="detectives.xml#SS" target="detectives.xml#DN" weight="1"/>
<edge source="detectives.xml#SS" target="detectives.xml#MH" weight="1"/>
</edges>
</graph>
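Since the question mentions XSLT 2.0, the same idea can also be sketched without the EXSLT extension. The following is only a sketch, assuming an XSLT 2.0 processor: it swaps set:distinct() for distinct-values(), loops over index pairs instead of node positions, and uses the `intersect` operator to find the events two people share. The variable names `doc`, `ids`, `i` and `j` are introduced here for illustration. Because distinct-values() returns atomic values rather than nodes, the three-argument form of key() is used to say which document to search.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <!-- same key as in the 1.0 sketch: events indexed by every @SameAs inside them -->
  <xsl:key name="eventByID" match="event" use=".//name/@SameAs"/>

  <xsl:template match="/">
    <!-- keep a handle on the document node for the 3-argument key() calls -->
    <xsl:variable name="doc" select="."/>
    <!-- distinct @SameAs values as atomic values (assumed to be usable as ids) -->
    <xsl:variable name="ids" select="distinct-values(listEvent/event//name/@SameAs)"/>
    <graph>
      <nodes count="{count($ids)}">
        <xsl:for-each select="$ids">
          <node id="{.}"/>
        </xsl:for-each>
      </nodes>
      <edges>
        <!-- visit every unordered pair (i, j) with i < j exactly once -->
        <xsl:for-each select="1 to count($ids) - 1">
          <xsl:variable name="i" select="."/>
          <xsl:for-each select="$i + 1 to count($ids)">
            <xsl:variable name="j" select="."/>
            <xsl:variable name="source" select="$ids[$i]"/>
            <xsl:variable name="target" select="$ids[$j]"/>
            <!-- events that mention both people -->
            <xsl:variable name="common_events"
                select="key('eventByID', $source, $doc) intersect key('eventByID', $target, $doc)"/>
            <xsl:if test="exists($common_events)">
              <edge source="{$source}" target="{$target}" weight="{count($common_events)}"/>
            </xsl:if>
          </xsl:for-each>
        </xsl:for-each>
      </edges>
    </graph>
  </xsl:template>
</xsl:stylesheet>
```

The counting logic is the same as in the 1.0 stylesheet, so the weights for each pair should match the result above. The order of the nodes and edges may differ, though, since the order of the values returned by distinct-values() is implementation-dependent.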