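I'm trying to write a script that iterates through three existing XML documents and compiles a fourth XML document containing all of the morphemes (the meaningful parts of words in a language) found in those three documents. I'm trying to make sure this new morpheme database contains no duplicates, and I'm having a hard time keeping it from adding them. I'll post the relevant snippet immediately below and the whole relevant code block at the bottom.
The duplicate check is ((string)$source == (string)$storySource), where $source and $storySource are both SimpleXMLElements that look like this: <m>text</m>. Can anyone tell me where I'm going wrong?
Best, Jimmy
Here is the entire loop that iterates through one of the XML files: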
$storycorpus = new SimpleXMLElement($file,null,true);
$storyEntries = $storycorpus->xpath("//morpheme");
foreach($storyEntries as $entry){
    // check to see if in morpheme database. we will match the Pomo and the English, hence, if either is not a match,
    // we will add a new morpheme
    $storySource = $entry->m;
    $storyGloss = $entry->g;
    // set a variable equal to false
    $foundInDB = false;
    //we will loop through the database looking for a match.
    foreach($morphemeEntries as $existingMorpheme){
        $source = $existingMorpheme->source;
        $gloss = $existingMorpheme->gloss;
        // if we find a match, we will set our variable to be true and break out of the morpheme DB loop
        if(((string)$source == (string)$storySource) && ((string)$gloss == (string)$storyGloss)){
            $foundInDB = true; // problem: this line isn't firing
            break;
        }
    }
    // after the morphemeDB loop, we will check to see if the var is true.
    if($foundInDB == true){
        // if it is true, we don't need to enter anything and can
        // go to the next entry
        continue;
    } else{
        // if we didn't find a match, create a new morpheme
        $newMorphemeEntry = $morphemeDB->addChild("morpheme");
        $newMorphemeEntry->addChild("source", $storySource);
        $newMorphemeEntry->addChild("gloss", $storyGloss);
        $newMorphemeEntry->addChild("root", $storySource);
        $newMorphemeEntry->addChild("hypernym", $storySource);
        $newMorphemeEntry->addChild("link", "S");
        if(substr($storySource, 0, 1) == "-"){
            $newMorphemeEntry->addChild("affix", "suffix");
        } elseif(substr($storySource, -1, 1) == "-"){
            $newMorphemeEntry->addChild("affix", "prefix");
        } else{
            $newMorphemeEntry->addChild("affix", "root");
        }
    }
}
OK, so I rewrote the block using DOMDocument instead of SimpleXML, and I'm still having no luck preventing duplicates. Here is the new code:
    // check to see if in morpheme database. we will match the Pomo and the English, hence, if either is not a match,
    // we will add a new morpheme
    $phraseSource = $entry->nodeValue;
    $phraseGlossId = $entry->getAttribute("id");
    $phraseGloss = $xpath2->query("//g[@id =\"$phraseGlossId\"]")->item(0)->nodeValue;
    // set a variable equal to false
    $foundInDB = false;
    //we will loop through the database looking for a match.
    foreach($morphemeEntries as $existingMorpheme){
        $source = $existingMorpheme->getElementsByTagName("source")->item(0)->nodeValue;
        $gloss = $existingMorpheme->getElementsByTagName("gloss")->item(0)->nodeValue;
        // if we find a match, we will set our variable to be true and break out of the morpheme DB loop
        if(($source == $phraseSource) && ($gloss == $phraseGloss)){
            $foundInDB = true; // problem: this line isn't firing
            break;
        }
    }
    // after the morphemeDB loop, we will check to see if the var is true.
    if($foundInDB == true){
        // if it is true, we don't need to enter anything and can
        // go to the next entry
        continue;
    } else{
        // if we didn't find a match, create a new morpheme
        $newMorphemeEntry = $morphemeXmlDoc->createElement("morpheme");
        $newMorphemeSource = $morphemeXmlDoc->createElement("source");
        $newMorphemeSource->nodeValue = $phraseSource;
        $newMorphemeEntry->appendChild($newMorphemeSource);
        $newMorphemeGloss = $morphemeXmlDoc->createElement("gloss");
        $newMorphemeGloss->nodeValue = $phraseGloss;
        $newMorphemeEntry->appendChild($newMorphemeGloss);
        $newMorphemeRoot = $morphemeXmlDoc->createElement("root");
        $newMorphemeRoot->nodeValue = $phraseSource;
        $newMorphemeEntry->appendChild($newMorphemeRoot);
        $newMorphemeHypernym = $morphemeXmlDoc->createElement("hypernym");
        $newMorphemeHypernym->nodeValue = $phraseSource;
        $newMorphemeEntry->appendChild($newMorphemeHypernym);
        $newMorphemeLink = $morphemeXmlDoc->createElement("link");
        $newMorphemeLink->nodeValue = "P";
        $newMorphemeEntry->appendChild($newMorphemeLink);
        $newMorphemeAffix = $morphemeXmlDoc->createElement("affix");
        $newMorphemeAffix->nodeValue = $phraseGloss;
        if(substr($phraseSource, 0, 1) == "-"){
            $newMorphemeAffix->nodeValue = "suffix";
        } elseif(substr($phraseSource, -1, 1) == "-"){
            $newMorphemeAffix->nodeValue = "prefix";
        } else{
            $newMorphemeAffix->nodeValue = "root";
        }
        $newMorphemeEntry->appendChild($newMorphemeAffix);
        $morphemeRootNode->appendChild($newMorphemeEntry);
    }
}
Here is the input the script searches to build the new XML sheet:
<phrasicon>
  <phrase id="4">
    <ref1>ES</ref1>
    <source>t̪o: xa jo: k'ala:</source>
    <morpheme>
      <m id="4.1">t̪o:</m>
      <m id="4.2">xa</m>
      <m id="4.3">jo:</m>
      <m id="4.4">k'ala:</m>
    </morpheme>
    <gloss lang="en">
      <g id="4.1">me</g>
      <g id="4.2">water</g>
      <g id="4.3">for</g>
      <g id="4.4">die</g>
    </gloss>
    <translation lang="en">I'm dying for water.</translation>
    <media1 mimeType="audio/wav" url="im_dying_for_water.wav"/>
    <ref2/>
    <media2 mimeType="" url=""/>
    <ref3/>
    <media3 mimeType="" url=""/>
  </phrase>
</phrasicon>
And here is what the new morpheme XML sheet should look like:
<?xml version="1.0" encoding="UTF-8"?>
<morphemedatabase>
  <morpheme>
    <source>t̪o:</source>
    <gloss>me</gloss>
    <root>t̪o:</root>
    <hypernym>t̪o:</hypernym>
    <link>P</link>
    <affix>root</affix>
  </morpheme>
</morphemedatabase>
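A minimal sketch of how the input above pairs up, assuming every story file follows the phrasicon structure shown (the function name pairMorphemes is hypothetical): each <m> inside <morpheme> corresponds to the <g> with the same id inside the sibling <gloss> element. In the SimpleXML loop, $entry is a whole <morpheme> element, so $entry->m reads only its first <m> child, and $entry->g looks for a <g> child of <morpheme>, which this structure does not have.

```php
<?php
// Sketch: pair each <m> morpheme with the <g> gloss sharing its id,
// using a trimmed copy of the sample phrasicon (assumed structure).
$xml = <<<'XML'
<phrasicon>
  <phrase id="4">
    <morpheme>
      <m id="4.1">t̪o:</m>
      <m id="4.2">xa</m>
      <m id="4.3">jo:</m>
      <m id="4.4">k'ala:</m>
    </morpheme>
    <gloss lang="en">
      <g id="4.1">me</g>
      <g id="4.2">water</g>
      <g id="4.3">for</g>
      <g id="4.4">die</g>
    </gloss>
  </phrase>
</phrasicon>
XML;

/**
 * Hypothetical helper: returns a list of [source, gloss] pairs,
 * one per <m> element, matched to <g> by the id attribute.
 */
function pairMorphemes(SimpleXMLElement $corpus): array
{
    $pairs = [];
    foreach ($corpus->xpath('//phrase') as $phrase) {
        // Iterate over every <m>, not just the first one.
        foreach ($phrase->morpheme->m as $m) {
            $id = (string) $m['id'];
            // Relative query: look for the gloss inside the same <phrase>.
            $g = $phrase->xpath('gloss/g[@id="' . $id . '"]');
            $pairs[] = [(string) $m, $g ? (string) $g[0] : ''];
        }
    }
    return $pairs;
}

$pairs = pairMorphemes(new SimpleXMLElement($xml));
print_r($pairs);
```

Each pair can then be fed to the duplicate check as a plain (source, gloss) string pair.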
Answer 0 (score: 1)
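I imagine $morphemeEntries is a fixed list of SimpleXMLElement objects that is not updated with the nodes you add. I suggest checking against the $morphemeDB object instead. You can also replace the loop with an XPath expression: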
$storySource = $entry->m;
$storyGloss = $entry->g;
$foundInDB = count(
    $morphemeDB->xpath(
        sprintf('.//morpheme[source="%s" and gloss="%s"]', $storySource, $storyGloss)
    )
) > 0;
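A self-contained sketch of that suggestion, assuming the database root is <morphemedatabase> as in the question (the helpers xpathLiteral and addMorphemeOnce are hypothetical). It queries the live document on every call, so entries appended earlier in the run are found. Because values such as k'ala: contain an apostrophe, the sketch also builds the XPath string literal defensively instead of splicing raw text into a quoted string.

```php
<?php
// Sketch: check the live SimpleXML document on every iteration, so
// morphemes appended earlier in the same run are found by the query.

/**
 * Wraps a PHP string as an XPath 1.0 string literal. Double quotes are
 * used by default; concat() handles values containing both quote types.
 */
function xpathLiteral(string $value): string
{
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    return 'concat("' . implode('", \'"\', "', explode('"', $value)) . '")';
}

/**
 * Hypothetical helper: appends <morpheme> to $db unless an entry with the
 * same source and gloss already exists. Returns true if it was added.
 */
function addMorphemeOnce(SimpleXMLElement $db, string $source, string $gloss): bool
{
    $query = sprintf(
        './/morpheme[source=%s and gloss=%s]',
        xpathLiteral($source),
        xpathLiteral($gloss)
    );
    $matches = $db->xpath($query);
    if (is_array($matches) && count($matches) > 0) {
        return false; // duplicate: already in the database
    }
    $entry = $db->addChild('morpheme');
    $entry->addChild('source', $source);
    $entry->addChild('gloss', $gloss);
    return true;
}

$morphemeDB = new SimpleXMLElement('<morphemedatabase/>');
$added = [];
foreach ([['xa', 'water'], ["k'ala:", 'die'], ['xa', 'water']] as [$source, $gloss]) {
    $added[] = addMorphemeOnce($morphemeDB, $source, $gloss);
}
echo $morphemeDB->asXML();
```

The third call returns false because the first <morpheme> is already part of the document that the query runs against.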
With DOM, the same is possible using DOMXPath::evaluate():
$phraseSource = $xpathSource->evaluate('string(m)', $entry);
$phraseGloss = $xpathSource->evaluate('string(g)', $entry);
$foundInDB = $xpathTarget->evaluate(
    sprintf(
        'count(//morpheme[source="%s" and gloss="%s"]) > 0',
        $phraseSource,
        $phraseGloss
    )
);
In the DOM implementation you can nest createElement() inside appendChild(), but you should add the content as text nodes (so that it is escaped properly):
$newMorphemeEntry = $morphemeRootNode->appendChild(
    $morphemeXmlDoc->createElement("morpheme")
);
$newMorphemeEntry
    ->appendChild($morphemeXmlDoc->createElement("source"))
    ->appendChild($morphemeXmlDoc->createTextNode($phraseSource));
$newMorphemeEntry
    ->appendChild($morphemeXmlDoc->createElement("gloss"))
    ->appendChild($morphemeXmlDoc->createTextNode($phraseGloss));
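A self-contained sketch of the DOM route, assuming source and gloss values never contain a double quote (the helper names morphemeExists and appendMorpheme are hypothetical). The DOMXPath object is created once, yet it queries the live document, so entries appended inside the loop are counted on later iterations. The last entry is an invented value with markup characters, included only to show that createTextNode() content is escaped on output.

```php
<?php
// Sketch: DOM variant of the duplicate check. Assumes no source/gloss
// value contains a double quote, since it is spliced into an XPath literal.

function morphemeExists(DOMXPath $xpathTarget, string $source, string $gloss): bool
{
    // "count(...) > 0" is a boolean XPath expression, so evaluate()
    // returns a PHP bool rather than a node list.
    return $xpathTarget->evaluate(sprintf(
        'count(//morpheme[source="%s" and gloss="%s"]) > 0',
        $source,
        $gloss
    ));
}

function appendMorpheme(DOMDocument $doc, string $source, string $gloss): void
{
    $entry = $doc->documentElement->appendChild($doc->createElement('morpheme'));
    // Text nodes are escaped when serialized (& and < become &amp; and &lt;).
    $entry->appendChild($doc->createElement('source'))
        ->appendChild($doc->createTextNode($source));
    $entry->appendChild($doc->createElement('gloss'))
        ->appendChild($doc->createTextNode($gloss));
}

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->appendChild($doc->createElement('morphemedatabase'));
$xpathTarget = new DOMXPath($doc); // created once, queries the live tree

$inserted = 0;
$entries = [
    ['xa', 'water'],
    ['jo:', 'for'],
    ['xa', 'water'],  // duplicate, should be skipped
    ['x<y', 'A & B'], // hypothetical value with markup characters
];
foreach ($entries as [$source, $gloss]) {
    if (!morphemeExists($xpathTarget, $source, $gloss)) {
        appendMorpheme($doc, $source, $gloss);
        $inserted++;
    }
}
$xml = $doc->saveXML();
echo $xml;
```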
Answer 1 (score: 0)
Don't try casting to (string) before comparing. Instead, call the ->asXML() method on each element.
Replace this:
if(((string)$source == (string)$storySource) && ((string)$gloss == (string)$storyGloss))
with this:
if(($source->asXML() == $storySource->asXML()) && ($gloss->asXML() == $storyGloss->asXML()))
Or compare the contained strings (without the tags):
if(($source->__toString() == $storySource->__toString()) && ($gloss->__toString() == $storyGloss->__toString()))
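The problem is that SimpleXMLElement is not a "classic" PHP object. SimpleXML is built as a "live" API that is linked to the internal representation of the XML document.
The manual page on Comparing Objects states: "Two object instances are equal if they have the same attributes and values, and are instances of the same class."
In print_r() or var_dump() output, a SimpleXMLElement appears to have properties representing its child nodes and attributes. The actual implementation, however, holds only a pointer to the in-memory structure created when the XML was parsed, and that structure differs even if you parse the same string twice. So comparing two SimpleXMLElement objects directly with == will not return true merely because their contents match.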
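A minimal check of what the comparisons above actually compare, assuming fragments shaped like the question's data (an <m> element from a story file and a <source> element from the database). Casting with (string) and calling __toString() both yield only the text content, while asXML() serializes the element itself, tag name and attributes included, so an asXML() comparison between a <source> and an <m> element differs even when the text is identical.

```php
<?php
// Sketch: compare text content vs serialized markup of two elements
// that hold the same morpheme text under different tag names.
$story = new SimpleXMLElement('<morpheme><m id="4.2">xa</m></morpheme>');
$db = new SimpleXMLElement('<morpheme><source>xa</source></morpheme>');

$storySource = $story->m;
$source = $db->source;

// Casting (or __toString()) yields only the text content: "xa" and "xa".
$sameText = (string) $source === (string) $storySource;
$sameToString = $source->__toString() === $storySource->__toString();

// asXML() serializes the element itself, tag name and attributes included:
// <source>xa</source> versus <m id="4.2">xa</m>.
$sameMarkup = $source->asXML() === $storySource->asXML();

var_dump($sameText, $sameToString, $sameMarkup);
```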