My XML file:
<bncDoc xml:id="KS0">
<teiHeader>
<fileDesc>
<titleStmt>
<title> Oxford City Council Health and Environmental Protection Committee meeting. Sample containing about 11223 words speech recorded in public context
</title>
<respStmt>
<resp> Data capture and transcription
</resp>
<name> Oxford University Press
</name>
</respStmt>
</titleStmt>
<editionStmt>
<edition>BNC XML Edition, December 2006
</edition>
</editionStmt>
<extent> 11223 tokens; 11688 w-units; 482 s-units
</extent>
<publicationStmt>
<distributor>Distributed under licence by Oxford University Computing Services on behalf of the BNC Consortium.
</distributor>
<availability> This material is protected by international copyright laws and may not be copied or redistributed in any way. Consult the BNC Web Site at http://www.natcorp.ox.ac.uk for full licencing and distribution conditions.
</availability>
<idno type="bnc">KS0
</idno>
<idno type="old"> OCCEnv
</idno>
</publicationStmt>
<sourceDesc>
<recordingStmt>
<recording n="139401" type="DAT"/>
</recordingStmt>
</sourceDesc>
</fileDesc>
<encodingDesc>
<tagsDecl>
<namespace name="">
<tagUsage gi="align" occurs="69"/>
<tagUsage gi="c" occurs="1408"/>
<tagUsage gi="div" occurs="1"/>
<tagUsage gi="event" occurs="3"/>
<tagUsage gi="mw" occurs="110"/>
<tagUsage gi="pause" occurs="2"/>
<tagUsage gi="s" occurs="482"/>
<tagUsage gi="u" occurs="192"/>
<tagUsage gi="unclear" occurs="65"/>
<tagUsage gi="vocal" occurs="7"/>
<tagUsage gi="w" occurs="11688"/>
</namespace>
</tagsDecl>
</encodingDesc>
<profileDesc>
<creation date="0000">0000-00-00 Origination/creation date not known
</creation>
<particDesc n="C872">
<person ageGroup="X" xml:id="PS6H7" role="unspecified" sex="f" soc="AB" dialect="NONE" educ="X">
<persName>Chair
</persName>
</person>
<person ageGroup="X" xml:id="PS6H8" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X">
<persName>g
</persName>
</person>
<person ageGroup="X" xml:id="PS6H9" role="unspecified" sex="f" soc="UU" dialect="NONE" educ="X">
<persName>chair2
</persName>
</person>
<person ageGroup="X" xml:id="PS6HA" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X">
<persName>i
</persName>
</person>
<person ageGroup="X" xml:id="PS6HB" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X">
<persName>h
</persName>
</person>
<person ageGroup="X" xml:id="PS6HC" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X">
<persName>foe
</persName>
</person>
<person ageGroup="X" xml:id="PS6HD" role="unspecified" sex="f" soc="UU" dialect="NONE" educ="X">
<persName>b
</persName>
</person>
<person ageGroup="X" xml:id="PS6HE" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X">
<persName>a
</persName>
</person>
<person ageGroup="X" xml:id="PS6HF" role="unspecified" sex="f" soc="UU" dialect="NONE" educ="X">
<persName>ei
</persName>
</person>
<person ageGroup="X" xml:id="PS6HG" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X">
<persName>bp
</persName>
</person>
<person ageGroup="X" xml:id="PS6HH" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X">
<persName>c
</persName>
</person>
<person ageGroup="X" xml:id="PS6HJ" role="unspecified" sex="m" soc="UU" dialect="NONE" educ="X">
<persName>d
</persName>
</person>
<person ageGroup="X" xml:id="PS6HK" role="unspecified" sex="f" soc="UU" dialect="NONE" educ="X">
<persName>e
</persName>
</person>
<person ageGroup="X" xml:id="PS6HL" role="unspecified" sex="u" soc="UU" dialect="NONE" educ="X">
<persName>d
</persName>
</person>
</particDesc>
<settingDesc>
<setting n="OCCEnv" who="PS6H7 PS6H8 PS6H9 PS6HA PS6HB PS6HC PS6HD PS6HE PS6HF PS6HG PS6HH PS6HJ PS6HK PS6HL">
<placeName>Oxfordshire: Oxford
</placeName>
<activity> Council Committee Meeting
</activity>
</setting>
</settingDesc>
<textClass>
<catRef targets="SPO ALLTIM3 ALLAVA0 ALLTYP2 SCGDOM3 SPOLOG2 SPOREG1"/>
<classCode scheme="DLEE">S meeting
</classCode>
<keywords>
<term> (none)
</term>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change date="2006-10-21" who="#OUCS">Tag usage updated for BNC-XML
</change>
<change date="2000-12-13" who="#OUCS">Last check for BNC World first release
</change>
<change date="2000-09-06" who="#OUCS">Redo tagusage tables
</change>
<change date="2000-09-01" who="#OUCS">Check all tagcounts
</change>
<change date="2000-06-23" who="#OUCS">Resequenced s-units and added headers
</change>
<change date="2000-01-29" who="#OUCS">Revised participant details
</change>
<change date="2000-01-21" who="#OUCS">Added date info
</change>
<change date="2000-01-09" who="#OUCS">Updated all catrefs
</change>
<change date="2000-01-09" who="#OUCS">Updated REC elements to include tape number
</change>
<change date="2000-01-08" who="#OUCS">Updated titles
</change>
<change date="1999-12-25" who="#OUCS">corrected tagUsage
</change>
<change date="1999-09-21" who="#UCREL">POS codes revised for BNC-2; header updated
</change>
<change date="1994-11-27" who="#dominic">Initial accession to corpus
</change>
</revisionDesc>
</teiHeader>
<stext type="OTHERSP">
<div><!--
Oxford City Council: Health and Environmental Protection Committee (Nuclear Issues and Pollution Control) Sub-Committee.
Wednesday, 18th April 1990, 2.30pm, Town Hall.--><u who="PS6H7">
<s n="3">
<w c5="AV0" hw="well" pos="ADV">Well
</w>
<c c5="PUN">,
</c>
<w c5="AJ0" hw="good" pos="ADJ">good
</w>
<w c5="NN1" hw="afternoon" pos="SUBST">afternoon
</w>
<c c5="PUN">,
</c>
<w c5="PNI" hw="everybody" pos="PRON">everybody
</w>
<c c5="PUN">,
</c>
<w c5="PNP" hw="i" pos="PRON">I
</w>
<w c5="VVB" hw="think" pos="VERB">think
</w>
<w c5="PNP" hw="we" pos="PRON">we
</w>
<w c5="VHD" hw="have" pos="VERB">'d
</w>
<w c5="AV0" hw="well" pos="ADV">better
</w>
<w c5="VVI" hw="get" pos="VERB">get
</w>
<w c5="VVN" hw="start" pos="VERB">started
</w>
<c c5="PUN">.
</c>
</s>
<s n="4">
<w c5="PNP" hw="we" pos="PRON">We
</w>
<w c5="VVD" hw="look" pos="VERB">looked
</w>
<w c5="AV0" hw="so" pos="ADV">so
</w>
<w c5="AJ0" hw="thin" pos="ADJ">thin
</w>
<w c5="PRP" hw="on" pos="PREP">on
</w>
<w c5="AT0" hw="the" pos="ART">the
</w>
<w c5="NN1" hw="ground" pos="SUBST">ground
</w>
<c c5="PUN">,
</c>
<w c5="PNP" hw="i" pos="PRON">I
</w>
<w c5="VVD" hw="think" pos="VERB">thought
</w>
<w c5="PNP" hw="we" pos="PRON">we
</w>
<w c5="VM0" hw="would" pos="VERB">'d
</w>
<w c5="VVI" hw="sit" pos="VERB">sit
</w>
<w c5="CJC" hw="and" pos="CONJ">and
</w>
<w c5="VVI" hw="wait" pos="VERB">wait
</w>
<w c5="CJC" hw="and" pos="CONJ">and
</w>
<w c5="VVI" hw="see" pos="VERB">see
</w>
<w c5="CJS" hw="if" pos="CONJ">if
</w>
<w c5="PNI" hw="everyone" pos="PRON">everyone
</w>
<w c5="VBZ" hw="be" pos="VERB">'s
</w>
<w c5="VVG-AJ0" hw="come" pos="VERB">coming
</w>
<c c5="PUN">,
</c>
<w c5="CJC" hw="but" pos="CONJ">but
</w>
<w c5="UNC" hw="erm" pos="UNC">erm
</w>
<w c5="PNP" hw="we" pos="PRON">we
</w>
<w c5="VM0" hw="will" pos="VERB">'ll
</w>
<w c5="VHI" hw="have" pos="VERB">have
</w>
<w c5="TO0" hw="to" pos="PREP">to
</w>
<w c5="VVI" hw="get" pos="VERB">get
</w>
<w c5="VVN" hw="start" pos="VERB">started
</w>
<w c5="AV0" hw="anyway" pos="ADV">anyway
</w>
<c c5="PUN">.
</c>
</s>
<s n="5">
<w c5="PNP" hw="we" pos="PRON">We
</w>
<w c5="VM0" hw="will" pos="VERB">'ll
</w>
<w c5="VVI" hw="welcome" pos="VERB">welcome
</w>
<c c5="PUN">,
</c>
<w c5="PNP" hw="we" pos="PRON">we
</w>
<w c5="VHB" hw="have" pos="VERB">have
</w>
<w c5="CRD" hw="two" pos="ADJ">two
</w>
<w c5="NN2" hw="speaker" pos="SUBST">speakers
</w>
<c c5="PUN">,
</c>
<w c5="NP0" hw="mr" pos="SUBST">Mr
</w>
<w c5="NP0" hw="bob" pos="SUBST">Bob
</w>
<w c5="NP0" hw="plumtree" pos="SUBST">Plumtree
</w>
<c c5="PUN">,
</c>
<w c5="CJC" hw="and" pos="CONJ">and
</w>
<w c5="NP0" hw="ms" pos="SUBST">Ms
</w>
<w c5="NP0" hw="erica" pos="SUBST">Erica
</w>
<w c5="NP0" hw="ison" pos="SUBST">Ison
</w>
<c c5="PUN">.
</c>
</s>
<s n="6">
<w c5="PNP" hw="we" pos="PRON">We
</w>
<w c5="VVD" hw="ask" pos="VERB">asked
</w>
<w c5="PNP" hw="they" pos="PRON">them
</w>
<w c5="PRP" hw="to" pos="PREP">to
</w>
<w c5="AT0" hw="the" pos="ART">the
</w>
<w c5="NN1" hw="meeting" pos="SUBST">meeting
</w>
<w c5="CJC" hw="and" pos="CONJ">and
</w>
<w c5="PNP" hw="we" pos="PRON">we
</w>
<w c5="VVB" hw="look" pos="VERB">look
</w>
<w c5="AV0" hw="forward" pos="ADV">forward
</w>
<w c5="PRP" hw="to" pos="PREP">to
</w>
<w c5="VVG-NN1" hw="listen" pos="VERB">listening
</w>
<w c5="PRP" hw="to" pos="PREP">to
</w>
<w c5="PNP" hw="you" pos="PRON">you
</w>
<w c5="AV0" hw="later" pos="ADV">later
</w>
<w c5="AVP" hw="on" pos="ADV">on
</w>
<w c5="PRP" hw="in" pos="PREP">in
</w>
<w c5="AT0" hw="the" pos="ART">the
</w>
<w c5="NN1" hw="agenda" pos="SUBST">agenda
</w>
<c c5="PUN">.
</c>
</s>
<s n="7">
<w c5="AT0" hw="the" pos="ART">The
</w>
<w c5="NN2" hw="minute" pos="SUBST">minutes
</w>
<w c5="PRF" hw="of" pos="PREP">of
</w>
<w c5="AT0" hw="the" pos="ART">the
</w>
<w c5="NN1" hw="meeting" pos="SUBST">meeting
</w>
<w c5="VVD-VVN" hw="hold" pos="VERB">held
</w>
<w c5="PRP" hw="in" pos="PREP">in
</w>
<w c5="NP0" hw="january" pos="SUBST">January
</w>
<c c5="PUN">.
</c>
</s>
<s n="8">
<w c5="DT0" hw="any" pos="ADJ">Any
</w>
<w c5="NN2" hw="correction" pos="SUBST">corrections
</w>
<w c5="PRP" hw="to" pos="PREP">to
</w>
<w c5="AT0" hw="the" pos="ART">the
</w>
<w c5="NN2" hw="minute" pos="SUBST">minutes
</w>
<w c5="ORD" hw="first" pos="ADJ">first
</w>
<c c5="PUN">?
</c>
</s>
<s n="9">
<w c5="NN1-VVB" hw="page" pos="SUBST">Page
</w>
<w c5="CRD" hw="1" pos="ADJ">1
</w>
<c c5="PUN">?
</c>
</s>
<s n="483">
<w c5="EX0" hw="there" pos="PRON">There
</w>
<w c5="VBZ" hw="be" pos="VERB">is
</w>
<w c5="AT0" hw="a" pos="ART">a
</w>
<w c5="NN1" hw="school" pos="SUBST">school
</w>
<w c5="PRP" hw="in" pos="PREP">in
</w>
<w c5="NP0" hw="ferry" pos="SUBST">Ferry
</w>
<w c5="NP0" hw="hinksey" pos="SUBST">Hinksey
</w>
<w c5="NP0" hw="road" pos="SUBST">Road
</w>
<w c5="VBZ" hw="be" pos="VERB">is
</w>
<w c5="XX0" hw="not" pos="ADV">n't
</w>
<w c5="EX0" hw="there" pos="PRON">there
</w>
<c c5="PUN">,
</c>
<w c5="AT0" hw="a" pos="ART">a
</w>
<w c5="AJ0" hw="middle" pos="ADJ">middle
</w>
<w c5="NN1" hw="school" pos="SUBST">school
</w>
<w c5="PNP" hw="i" pos="PRON">I
</w>
<w c5="VVB" hw="think" pos="VERB">think
</w>
<c c5="PUN">,
</c>
<w c5="AV0" hw="so" pos="ADV">so
</w>
<w c5="DT0" hw="that" pos="ADJ">that
</w>
<w c5="VBZ" hw="be" pos="VERB">'s
</w>
<w c5="AT0" hw="the" pos="ART">the
</w>
<w c5="AJ0" hw="only" pos="ADJ">only
</w>
<w c5="PNI" hw="one" pos="PRON">one
</w>
<w c5="PNP" hw="i" pos="PRON">I
</w>
<w c5="VVB" hw="know" pos="VERB">know
</w>
<c c5="PUN">.
</c>
</s>
<s n="484">
<w c5="AT0" hw="the" pos="ART">The
</w>
<w c5="NN1" hw="thing" pos="SUBST">thing
</w>
<w c5="PNP" hw="i" pos="PRON">I
</w>
<w c5="VM0" hw="would" pos="VERB">'d
</w>
<w c5="AV0" hw="really" pos="ADV">really
</w>
<w c5="VVI" hw="like" pos="VERB">like
</w>
<w c5="VBZ" hw="be" pos="VERB">is
</w>
<w c5="AT0" hw="a" pos="ART">a
</w>
<w c5="NN1" hw="glossary" pos="SUBST">glossary
</w>
<w c5="PRF" hw="of" pos="PREP">of
</w>
<w c5="NN2" hw="term" pos="SUBST">terms
</w>
<c c5="PUN">.
</c>
</s>
</u>
</div>
</stext>
</bncDoc>
How can I create a list of the words together with the number of times each one occurs, sorted in descending order of frequency?
Answer 0 (score: 0)
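You can count the words in the input document as follows:
Get all text nodes (variable txtNodes). For performance reasons, you can restrict the selection to nodes that contain something more than whitespace characters (normalize-space()).
Extract the individual words and save them in the words variable.
Group these words by their upper-cased content (for-each-group), sorting the groups by descending count and then alphabetically.
For each group, print the word (with its first letter capitalized) and the number of occurrences (count(current-group())).
Below is sample XSLT code: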
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- All text nodes that contain more than whitespace, atomized to strings. -->
    <xsl:variable name="txtNodes" select="//text()[normalize-space()]" as="xs:string*"/>
    <!-- Every run of word characters becomes one item of $words. -->
    <xsl:variable name="words" as="xs:string*">
      <xsl:for-each select="$txtNodes">
        <xsl:analyze-string select="." regex="\w+">
          <xsl:matching-substring>
            <xsl:value-of select="."/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </xsl:variable>
    <xsl:text>Words / # of occurrences:&#10;</xsl:text>
    <!-- Group case-insensitively; most frequent first, ties in alphabetical order. -->
    <xsl:for-each-group select="$words" group-by="upper-case(.)">
      <xsl:sort select="count(current-group())" data-type="number" order="descending"/>
      <xsl:sort select="upper-case(.)"/>
      <!-- Print the word with only its first letter capitalized, then its count. -->
      <xsl:value-of select="concat(upper-case(substring(., 1, 1)), lower-case(substring(., 2)))"/>
      <xsl:text> - </xsl:text>
      <xsl:value-of select="count(current-group())"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
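For readers without an XSLT 2.0 processor, the same collect, split, group and sort steps can be sketched in Python as a cross-check. This is only an illustration, not part of the stylesheet above: the function name `word_frequencies`, its `tag` parameter and the small `sample` string are assumptions made for this example. Passing `tag="w"` restricts counting to the `<w>` elements, which may be closer to what the question wants, since the stylesheet's `//text()` also picks up the words in the `teiHeader`.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def word_frequencies(xml_text, tag=None):
    """Return (word, count) pairs sorted by descending count, then by word."""
    root = ET.fromstring(xml_text)
    if tag is None:
        # Every text node in the document, comparable to //text() in the stylesheet.
        chunks = root.itertext()
    else:
        # Only the text inside elements with the given name, e.g. "w".
        chunks = ("".join(el.itertext()) for el in root.iter(tag))
    counts = Counter()
    for chunk in chunks:
        # \w+ mirrors the regex used in the stylesheet; upper-casing the key
        # makes the grouping case-insensitive.
        counts.update(m.upper() for m in re.findall(r"\w+", chunk))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical miniature input modelled on sentence n="5" of the question's file.
sample = """<u>
<s><w hw="we">We </w><w hw="will">'ll </w><w hw="welcome">welcome </w><c>, </c>
<w hw="we">we </w><w hw="have">have </w><w hw="two">two </w><w hw="speaker">speakers </w></s>
</u>"""

for word, count in word_frequencies(sample, tag="w"):
    print(f"{word.capitalize()} - {count}")
```

With this sample the first line printed is `We - 2`. Because `\w+` drops the apostrophe, a clitic such as `'ll` is counted as `Ll`; the stylesheet's regex behaves the same way on the real file.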