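I have a SimpleXML object built by merging several XML responses from PubMed (code snippet below), but the merged result contains duplicates. How can I compare all the first-level sub-arrays (array[][0], array[][1], etc.) and discard any duplicates? I thought serialization might be the answer, but you can't serialize a SimpleXML object.
I don't know where to start.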
Array
(
[0] => Array
(
[title] => SimpleXMLElement Object
(
[0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems.
)
[link] => SimpleXMLElement Object
(
[@attributes] => Array
(
[Version] => 1
)
[0] => 23010931
)
[author] => Aylett, CH., Löwe, J.
[journal] => SimpleXMLElement Object
(
[0] => Proc. Natl. Acad. Sci. U.S.A.
)
[pubdate] => 2012-9-27
[day] => SimpleXMLElement Object
(
[0] => 25
)
[month] => SimpleXMLElement Object
(
[0] => Sep
)
[year] => SimpleXMLElement Object
(
[0] => 2012
)
)
[1] => Array
(
[title] => SimpleXMLElement Object
(
[0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems.
)
[link] => SimpleXMLElement Object
(
[@attributes] => Array
(
[Version] => 1
)
[0] => 23010931
)
[author] => Aylett, CH., Löwe, J.
[journal] => SimpleXMLElement Object
(
[0] => Proc. Natl. Acad. Sci. U.S.A.
)
[pubdate] => 2012-9-27
[day] => SimpleXMLElement Object
(
[0] => 25
)
[month] => SimpleXMLElement Object
(
[0] => Sep
)
[year] => SimpleXMLElement Object
(
[0] => 2012
)
)
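Alternatively, this could be done at the initial XML merge stage, if someone can suggest how to modify the code I'm currently using (below) so that it removes duplicates: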
function simplexml_merge (SimpleXMLElement &$xml1, SimpleXMLElement $xml2) {
    $dom1 = new DomDocument();
    $dom2 = new DomDocument();
    $dom1->loadXML($xml1->asXML());
    $dom2->loadXML($xml2->asXML());
    $xpath = new domXPath($dom2);
    $xpathQuery = $xpath->query('/*/*');
    for ($i = 0; $i < $xpathQuery->length; $i++) {
        $dom1->documentElement->appendChild(
            $dom1->importNode($xpathQuery->item($i), true));
    }
    $xml1 = simplexml_import_dom($dom1);
}
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
simplexml_merge($xml1, $xml2);
Thanks.
... ...
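For clarity, this is the layout of the XML source that I import into SimpleXML. Each PubmedArticle is an "element" that I want to compare to make sure there are no duplicates: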
<xml...>
<Document>
<PubmedArticle>
<MedlineCitation>
<PMID version="1">xxx</PMID>
...
</MedlineCitation>
...
</PubmedArticle>
<PubmedArticle>
<MedlineCitation>
<PMID version="1">xxx</PMID>
...
</MedlineCitation>
...
</PubmedArticle>
etc
</Document>
</xml>
The PMID node is unique, so it can be used to check for duplicates.
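Since the PMID identifies each article, a quick check for duplicates only needs the PMID values, not the full nodes. The following is a minimal sketch under that assumption; the sample XML and the helper name `duplicatedPmids` are made up for illustration and are not part of the original code.

```php
<?php
// Hypothetical sample data standing in for an already-merged PubMed document.
$merged = new SimpleXMLElement(
    '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID Version="1">23010931</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID Version="1">22000001</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID Version="1">23010931</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>'
);

// Hypothetical helper: return the PMIDs that occur more than once.
function duplicatedPmids(SimpleXMLElement $doc): array
{
    $pmids = [];
    foreach ($doc->PubmedArticle as $paper) {
        // Cast to string so we compare plain values, not SimpleXMLElement objects.
        $pmids[] = (string) $paper->MedlineCitation->PMID;
    }
    // array_count_values() maps each PMID to its number of occurrences.
    $counts = array_count_values($pmids);
    $repeated = array_filter($counts, function ($n) { return $n > 1; });
    // Numeric string keys become integers in PHP arrays; convert them back.
    return array_map('strval', array_keys($repeated));
}

print_r(duplicatedPmids($merged));
```

Casting each PMID to a string sidesteps the serialization problem mentioned above, because only scalar values are compared.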
... ...
Using @Gordon's link, I have now got as far as:
//Get my source XML
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
//Run through $xml1 and build a query based on its PMIDs
$query = array();
foreach ($xml1->PubmedArticle as $paper) {
    $query[] = sprintf('(PMID != %s)',$paper->MedlineCitation->PMID);
}
$query = implode('and', $query);
//Run through $xml2 and get the nodes that don't have a PMID matching $xml1
foreach ($xml2->xpath(sprintf('PubmedArticle/MedlineCitation[%s]', $query)) as $paper) {
    echo $paper->asXml();
}
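But I still have one problem: merging the output. The output from $xml2 is missing the <PubmedArticle> node around each 'match'. Once that is fixed, I assume I can use the same merge code (above) to do the merge.
Can you point me in the right direction?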
Answer 0 (score: 1)
Convert it to arrays (I won't write that for you; just iterate and add), then use array_diff().
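One way to read this suggestion is to reduce each document to a flat list of PMID strings and let `array_diff()` report which ones are new. The sketch below assumes the Document/PubmedArticle/MedlineCitation/PMID layout from the question; the sample data and the helper name `pmidList` are hypothetical.

```php
<?php
// Hypothetical helper: collect every PMID in a document as a plain string.
function pmidList(SimpleXMLElement $doc): array
{
    $pmids = [];
    foreach ($doc->PubmedArticle as $paper) {
        $pmids[] = (string) $paper->MedlineCitation->PMID;
    }
    return $pmids;
}

// Hypothetical sample inputs; the two searches share PMID 23010931.
$first = new SimpleXMLElement(
    '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID>23010931</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID>22000001</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>'
);
$second = new SimpleXMLElement(
    '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID>23010931</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID>22000002</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>'
);

// PMIDs that appear in $second but not in $first.
// array_diff() preserves keys, so array_values() reindexes the result from 0.
$onlyInSecond = array_values(array_diff(pmidList($second), pmidList($first)));
print_r($onlyInSecond);
```

Note that `array_diff()` compares values as strings, so it works on a flat list of PMIDs but not directly on the nested arrays of SimpleXMLElement objects shown in the question.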
Answer 1 (score: 0)
I decided to follow @Gordon's approach because it keeps the XML intact. In the end I got everything working:
//function to check 2 xml inputs for duplicate nodes
function dedupeXML($xml1, $xml2) {
    // build one "PMID != x" test per article already present in $xml1
    $query = array();
    foreach ($xml1->PubmedArticle as $paper) {
        $query[] = sprintf('(MedlineCitation/PMID != %s)',$paper->MedlineCitation->PMID);
    }
    // join with spaces around "and" so the XPath expression is unambiguous
    $query = implode(' and ', $query);
    // keep only the articles in $xml2 whose PMID passes every test
    $xmlClean = '<Document>';
    foreach ($xml2->xpath(sprintf('PubmedArticle[%s]', $query)) as $paper) {
        $xmlClean .= $paper->asXML();
    }
    $xmlClean .= '</Document>';
    $xmlClean = new SimpleXMLElement($xmlClean);
    return $xmlClean;
}
//function to merge 2 xml inputs
function mergeXML (SimpleXMLElement &$xml1, SimpleXMLElement $xml2) {
    // convert SimpleXML objects into DOM ones
    $dom1 = new DomDocument();
    $dom2 = new DomDocument();
    $dom1->loadXML($xml1->asXML());
    $dom2->loadXML($xml2->asXML());
    // pull all child elements of second XML
    $xpath = new domXPath($dom2);
    $xpathQuery = $xpath->query('/*/*');
    for ($i = 0; $i < $xpathQuery->length; $i++) {
        // and pump them into first one
        $dom1->documentElement->appendChild(
            $dom1->importNode($xpathQuery->item($i), true));
    }
    $xml = simplexml_import_dom($dom1);
    return $xml;
}
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
$xml3 = new SimpleXMLElement($search3);
//dedupe and merge inputs
//input 1 & 2
$xml2Clean = dedupeXML($xml1, $xml2);
$xml12 = mergeXML($xml1, $xml2Clean);
//input 1+2 & 3
$xml3Clean = dedupeXML($xml12, $xml3);
$xml123 = mergeXML($xml12, $xml3Clean);
This is easy to adapt to other data sources: just modify the dedupeXML function to match the structure of your XML.
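As a possible variation, not the approach used above, the dedupe and merge steps could be folded into a single pass that remembers which PMIDs have already been copied. This is only a sketch: the function name `mergeUnique` and the sample inputs are hypothetical, and it assumes every PubmedArticle carries a non-empty MedlineCitation/PMID.

```php
<?php
// Hypothetical helper: merge any number of documents into one <Document>,
// keeping only the first PubmedArticle seen for each PMID.
function mergeUnique(SimpleXMLElement ...$docs): SimpleXMLElement
{
    $dom = new DOMDocument();
    $root = $dom->appendChild($dom->createElement('Document'));
    $seen = []; // PMIDs already copied into the output

    foreach ($docs as $doc) {
        foreach ($doc->PubmedArticle as $paper) {
            $pmid = (string) $paper->MedlineCitation->PMID;
            if (isset($seen[$pmid])) {
                continue; // duplicate PMID: skip this article
            }
            $seen[$pmid] = true;
            // Copy the whole <PubmedArticle> subtree into the output document.
            $root->appendChild($dom->importNode(dom_import_simplexml($paper), true));
        }
    }
    return simplexml_import_dom($dom);
}

// Hypothetical sample inputs; both contain PMID 1.
$a = new SimpleXMLElement(
    '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID>1</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID>2</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>'
);
$b = new SimpleXMLElement(
    '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID>1</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID>3</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>'
);

$merged = mergeUnique($a, $b);
echo count($merged->PubmedArticle), "\n";
```

Tracking seen PMIDs in an array avoids building a long XPath predicate for each additional input, and it also drops duplicates that occur within a single input.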