I'm trying to build an array of IDs from the XML output generated by PubMed's Eutils.
Here is the code on GitHub. The relevant subroutine is below.
What is the best way to do this?
getUID($query);
sub getUID {
# First, build the Eutils query
my $utils = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils'; # Base URL for searches
my $db = 'pubmed'; # Default to PubMed database; this may be changed.
my $retmax = 10; # Get 10 results from Eutils
my $esearch = $utils . '/esearch.fcgi?db=' . $db . '&retmax=' . $retmax . '&term=';
my $esearch_result = get( $esearch . $query ); # Downloads the XML
# Second, extract the UIDs
$esearch_result =~ m(<Id>*</Id>);
print $esearch_result; # This should print a list of IDs (numbers), but it prints the whole XML instead.
}
This is what the PubMed XML result looks like:
<?xml version="1.0" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">
<eSearchResult><Count>2768671</Count><RetMax>10</RetMax><RetStart>0</RetStart><IdList>
<Id>23682407</Id>
<Id>23682406</Id>
<Id>23682388</Id>
<Id>23682359</Id>
<Id>23682336</Id>
<Id>23682331</Id>
<Id>23682325</Id>
<Id>23682320</Id>
<Id>23682315</Id>
<Id>23682311</Id>
</IdList><TranslationSet><Translation> <From>cancer</From> <To>"neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]</To> </Translation></TranslationSet><TranslationStack> <TermSet> <Term>"neoplasms"[MeSH Terms]</Term> <Field>MeSH Terms</Field> <Count>2430901</Count> <Explode>Y</Explode> </TermSet> <TermSet> <Term>"neoplasms"[All Fields]</Term> <Field>All Fields</Field> <Count>1920766</Count> <Explode>Y</Explode> </TermSet> <OP>OR</OP> <TermSet> <Term>"cancer"[All Fields]</Term> <Field>All Fields</Field> <Count>1192293</Count> <Explode>Y</Explode> </TermSet> <OP>OR</OP> <OP>GROUP</OP> </TranslationStack><QueryTranslation>"neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]</QueryTranslation></eSearchResult>
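Answer 0 (score: 2)
If you want the match to return strings, you have to add capturing parentheses. If there are multiple matches, use the g
option. Note also that in your pattern the * quantifies the preceding > character; it is not a wildcard, and the match operator never modifies $esearch_result. Store the results in an array: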
my @matches = $esearch_result =~ m{<Id>(\d+)</Id>}g; # \d+ instead of a greedy .* so two IDs on one line are not merged
print "$_\n" for @matches;
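To try this without hitting the network, here is a minimal sketch that runs the same list-context match against a small hand-written sample. The sample XML and the helper name extract_uids are assumptions made for illustration; they are not part of the original code or of Eutils.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the esearch response (no network call).
# The last two IDs share a line on purpose, to show that \d+ keeps them apart.
my $esearch_result = <<'XML';
<eSearchResult><Count>3</Count><IdList>
<Id>23682407</Id>
<Id>23682406</Id><Id>23682388</Id>
</IdList></eSearchResult>
XML

# Hypothetical helper: returns every UID found in an esearch XML string.
sub extract_uids {
    my ($xml) = @_;
    # In list context, a match with /g returns all captured groups.
    my @ids = $xml =~ m{<Id>\s*(\d+)\s*</Id>}g;
    return @ids;
}

my @uids = extract_uids($esearch_result);
print "$_\n" for @uids;
```

Returning the list from the subroutine, rather than printing inside it, lets the caller keep the IDs in an array for later requests. A regex is adequate for this flat IdList; for anything more nested, a real XML parser would be the safer choice.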
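Answer 1 (score: 0)
You may have reasons for wanting to use eutils by hand like this, but I'd like to at least let you know that there is an easier way. For these tasks I use the Bio::DB::EUtilities module from BioPerl, because it makes this sort of thing much easier and saves time (the EUtilities Cookbook has a section showing what information is available from PubMed). There is also the recently updated Bio::Biblio module, which includes many methods for accessing PubMed records.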