Error in Simple.pm when I try to parse data for a MySQL database

Date: 2018-07-17 21:40:55

Tags: xml perl

I am using the following script to parse the data and load it into a database.

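A few people asked about the input. It is a large file and I cannot paste all of it here; you can look at it at http://www.unimod.org/xml/unimod.xml. If that does not work, is there somewhere I can paste it so that I can share it with you? I have tried pasting some of the input here: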

GIST acetyl light PT and GIST acetyl light O-acetyl glyoxal-derived hydroimidazolone AA0048 RESID AA0049 RESID AA0041 RESID AA0052 RESID AA0364 RESID AA0056 RESID AA0046 RESID AA0051 RESID AA0045 RESID AA0354 RESID AA0044 RESID AA0043 RESID 11999733 PubMed PMID Chemical Reagents for Protein Modification 3rd edition, pp 215-221, Roger L. Lundblad, CRC Press, New York, N.Y., 2005 Book IonSource acetylation tutorial Misc. URL http://www.ionsource.com/Card/acetylation/acetylation.htm AA0055 RESID 14730666 PubMed PMID 15350136 PubMed PMID AA0047 RESID 12175151 PubMed PMID 11857757 PubMed PMID AA0042 RESID AA0050 RESID AA0053 RESID AA0054 RESID ACET FindMod PNAS 2006 103: 18574-18579 Journal http://dx.doi.org/10.1073/pnas.0608995103 MS/MS experiments of mass spectrometric c-ions (MS^3) can be used for protein identification by library searching. T3-sequencing is such a technique (see reference). Search engines must recognize this as a virtual modification. Top-Down sequencing c-type fragment ion AA0088 RESID AA0087 RESID AA0086 RESID AA0085 RESID AA0084 RESID AA0083 RESID AA0082 RESID AA0081 RESID AA0089 RESID AA0090 RESID AA0091 RESID AA0092 RESID AA0093 RESID AA0094 RESID AA0095 RESID AA0096 RESID AA0097 RESID AA0098 RESID AA0099 RESID AA0100 RESID AMID FindMod 14588022 PubMed PMID AA0117 RESID BIOT FindMod Carboxyamidomethylation 11510821 PubMed PMID 12422359 PubMed PMID Boja, E. S., Fales, H. M., Anal. Chem. 73 3576-82 (2001) Journal Creasy, D. M., Cottrell, J. S., Proteomics 2 1426-34 (2002) Journal 12203680 PubMed PMID Stark; Modification of proteins with cyanate. Meth Enz 25B, 579-584 (1972) Journal AA0343 RESID 10978403 PubMed PMID AA0332 RESID Smyth; Carbamylation of amino and tyrosine hydroxyl groups. J Biol Chem 242, 1579-1591 (1967) Journal IonSource carbamylation tutorial Misc. 
URL http://www.ionsource.com/Card/carbam/carbam.htm Carbamylation is an irreversible process of non-enzymatic modification of proteins by the breakdown products of urea isocyanic acid reacts with the N-term of a proteine or side chains of lysine and arginine residues Hydroxylethanone Carboxymethylation Protein which is post-translationally modified by the de-imination of one or more arginine residues; Peptidylarginine deiminase (PAD) converts protein bound to citrulline Convertion of glycosylated asparagine residues upon deglycosylation with PNGase F in H2O phenyllactyl from N-term Phe Citrullination FLAC FindMod AA0128 RESID CITR FindMod IonSource

I get this error:

  

mismatched tag at line 13, column 3, byte 569 at /srv/myscr/script/../extern/cpan/lib/perl5/XML/Simple.pm line 391

The code I use to parse the data is below. I would be grateful if you could tell me why I get this error and how to fix it.

After adding the code, I get the following output, ending in the error:

Fetching unimod.xml from unimod web site
Connecting to pipeline database
Emptying modifications table
Parsing XML
mismatched tag at line 13, column 3, byte 569 at /srv/myscr/script/../extern/cpan/lib/perl5/XML/Simple.pm line 39

1 answer:

Answer 0 (score: 2)

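For future reference, here is a cut-down version of your code that is just enough to demonstrate the problem. This is what you should have shown us as part of the original question.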

use strict;
use warnings;

use XML::Simple;
use LWP::UserAgent;

print "Fetching unimod.xml from unimod web site\n";

# Retrieve latest xml version of Unimod from the website
my $ua = LWP::UserAgent->new();
$ua->env_proxy();
my $response = $ua->get( "http://www.unimod.org/xml/unimod.xml" );

my $xml = $response->content;

print "Parsing XML\n";

# Use XML::Simple DOM parser - Okay as unimod.xml is small
# Force specificity and neutral losses into an array to simplify code
my $xs = XML::Simple->new(
    KeyAttr    => { "umod:mod" => "+title" },
    ForceArray => [ "umod:specificity", "umod:NeutralLoss" ]
);
my $ref = $xs->XMLin( $xml );

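Notice how I have removed all the distractions about configuration files and updating the database. It just fetches the XML from the website and parses it.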

The bad news is that, for me, this works fine. It parses the XML without raising any errors. For reference, I am using XML::Simple version 2.25 and Perl 5.26.2.

It would be useful to know whether this program, when you run it, produces the same error as your original code.

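As mentioned in the comments, it would also be interesting to see the XML you are actually getting from the website. You can do that by taking the $xml variable and writing its contents to a file: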

open my $xml_fh, '>', 'test.xml' or die $!;
print $xml_fh $xml;

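Then, after running the code, you will have a file called test.xml containing whatever the website served. You can look at line 13 of that file to work out what the parser is complaining about.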
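Opening a large file just to find line 13 is tedious, so here is a minimal sketch of one way to print the lines around the position the parser reports. The helper name lines_around and the made-up sample text are assumptions for illustration only; nothing here comes from XML::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines around a given 1-based line number,
# each prefixed with its line number, so the parser's "line 13" can be
# read in context.
sub lines_around {
    my ( $text, $line_no, $radius ) = @_;
    my @lines = split /\n/, $text;

    # Clamp the window to the lines that actually exist
    my $first = $line_no - $radius;
    $first = 1 if $first < 1;
    my $last = $line_no + $radius;
    $last = @lines if $last > @lines;

    return map { sprintf "%4d: %s", $_, $lines[ $_ - 1 ] } $first .. $last;
}

# Demonstration on a made-up 20-line document (not real Unimod data)
my $sample = join "\n", map { "line $_" } 1 .. 20;
print "$_\n" for lines_around( $sample, 13, 2 );
```

To use this on the real download, you would read test.xml into a string and pass that string in place of $sample, with 13 as the line number.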

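For what it's worth, I suspect that, for some reason, you are not getting XML back at all. My guess is that a proxy server on your network, or the website itself, is blocking your attempt to fetch the data automatically and is returning a 404 or 503 HTML page instead. That is only a guess, though, and we cannot be sure unless you run the test I suggested above.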
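If that guess is right, a quick check of the downloaded body before it reaches the parser would show it. The following is a rough sketch only: classify_body is a hypothetical helper, and it assumes the real file begins with an XML declaration, which is a heuristic and not a validity check.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a downloaded body by its first non-blank
# characters. This is a heuristic for spotting an HTML error page; it does
# not validate the XML.
sub classify_body {
    my ($body) = @_;
    return 'empty' unless defined $body && $body =~ /\S/;
    return 'html'  if $body =~ /\A\s*(?:<!DOCTYPE\s+html|<html\b)/i;
    return 'xml'   if $body =~ /\A\s*<\?xml\b/;
    return 'unknown';
}

# Made-up bodies standing in for what a server or proxy might return
my @samples = (
    '<?xml version="1.0"?><root/>',
    "<!DOCTYPE html>\n<html><head><title>503</title></head></html>",
    '',
);
print classify_body($_), "\n" for @samples;
```

In the cut-down script above, the same idea would be applied to $xml before calling XMLin. LWP's response object also offers is_success and status_line, which give a more direct signal when the server returns an honest error status; the body check is mainly useful when a proxy serves an error page with a 200 status.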