Question

I want to extract data from an XML file and import it into a MariaDB/MySQL database. The XML file is:

<?xml version="1.0" encoding="UTF-8"?>
<database>
  <row1s>
    <row1 name="fox" category="mammal">
      <row2s>
        <row2 type="1" size="10"/>
        <row2 type="2" size="8"/>
      </row2s>
    </row1>
    <row1 name="horse" category="mammal">
      <row2s>
        <row2 type="3" size="100"/>
      </row2s>
    </row1>
    <row1 name="bee" category="insect">
      <row2s/>
    </row1>
    <row1 name="wasp" category="insect">
      <row2s/>
    </row1>
  </row1s>
</database>

and the Perl code is:

use strict;
use warnings;
use DBI;

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new( filename => "animals4.xml" );
# my $xp = XML::XPath->new( ioref => \*DATA );

my $dbh = DBI->connect( "DBI:mysql:test", "user", "pw", { RaiseError => 1, PrintError => 0 } )
    or die "Error connecting to the MariaDB server:" . " $DBI::err -> $DBI::errstr \n";

for my $row1 ( $xp->findnodes('//row1s/row1') ) {
    printf "Level --- row1 \"name\" gives: %s\n", $row1->getAttribute("name");

    for my $row2 ( $row1->findnodes('.//row2s/row2') ) {
        printf "Level row2 \"type\" gives: %s\n", $row2->getAttribute("type");
        printf "Level row2 \"size\" gives: %s\n", $row2->getAttribute("size");

        $dbh->do(
            "INSERT INTO animal4 (name, category,type,size) VALUES(?,?,?,?)",
            undef,
            $row1->getAttribute("name"),
            $row1->getAttribute("category"),
            $row2->getAttribute("type"),
            $row2->getAttribute("size")
        ) or die "Error during execution: " . "$DBI::err -> $DBI::errstr (animal $DBI::state)\n";
    }
}

The terminal output is:

Level --- row1 "name" gives: fox
Level row2 "type" gives: 1
Level row2 "size" gives: 10
Level row2 "type" gives: 2
Level row2 "size" gives: 8
Level --- row1 "name" gives: horse
Level row2 "type" gives: 3
Level row2 "size" gives: 100
Level --- row1 "name" gives: bee
Level --- row1 "name" gives: wasp

This is what I expect. But the table contains the following entries:

name  category  type    size
fox   mammal    1         10
fox   mammal    2          8
horse mammal    3        100

The bee and the wasp are missing. Can anyone help me fix this? I'd like to understand why this happens, since the terminal output looks fine.

Thanks for your help.

Here is the code for the table:

CREATE TABLE test01.animal4 (
name VARCHAR(50) DEFAULT NULL
, category VARCHAR(50) DEFAULT NULL
, type     INTEGER DEFAULT NULL
, size     INTEGER DEFAULT NULL
);

This is a follow-up question to hierarchy problem.

Answer 1

You already have an explanation and a fix, but I would suggest the following changes:

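- You should prepare the INSERT INTO SQL statement once and then execute it inside the loop. do has more overhead, because it prepares the statement again on every call.
- The // (descendant-or-self::node()) XPath construct is expensive, and you should reserve it for cases where you have no idea where in the document an element is, which is very rare. In this case the row1 elements are at /database/row1s/row1, and the row2 elements are at row2s/row2 relative to them.
- If you want quote characters inside a quoted string, it is clearer to use a different delimiter. For example, "My name is \"$name\"" is much less clear than qq{My name is "$name"}.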
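To illustrate the last point, a minimal sketch: the two literals below build exactly the same string, but the qq{} version needs no backslashes. The value of $name is just sample data taken from the XML.

```perl
use strict;
use warnings;

my $name = 'fox';

# Escaped double quotes inside a double-quoted string
my $escaped = "Level --- row1 \"name\" gives: $name\n";

# Same string with qq{} as the delimiter: the inner quotes need no escaping
my $with_qq = qq{Level --- row1 "name" gives: $name\n};

print $with_qq;    # prints: Level --- row1 "name" gives: fox
```

Note that such a string is passed to print rather than printf, so that a % in the data cannot be misread as a format specifier.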

Here is a version of your program that may help.

use strict;
use warnings;

use XML::XPath;
use DBI;

my $xp = XML::XPath->new( filename => 'animals4.xml' );

my $dbh = DBI->connect(
   'DBI:mysql:test', 'user', 'pw',
   { RaiseError => 1, PrintError => 0}
) or die "Error connecting to the MariaDB server: $DBI::err -> $DBI::errstr\n";

my $insert_animal = $dbh->prepare('INSERT INTO animal4 (name, category, type, size) VALUES (?, ?, ?, ?)');

for my $row1 ( $xp->findnodes('/database/row1s/row1') ) {

   my $name     = $row1->getAttribute('name');
   my $category = $row1->getAttribute('category');

   print qq{Level --- row1 "name" gives: $name\n};

   my @row2 = $xp->findnodes('row2s/row2', $row1);

   if ( @row2 ) {
      for my $row2 ( @row2 ) {

         my $type = $row2->getAttribute('type');
         my $size = $row2->getAttribute('size');

         print qq{Level row2 "type" gives: $type\n};
         print qq{Level row2 "size" gives: $size\n};

         $insert_animal->execute($name, $category, $type, $size);
      }
   }
   else {
      $insert_animal->execute($name, $category, undef, undef);
   }
}

Output

Level --- row1 "name" gives: fox
Level row2 "type" gives: 1
Level row2 "size" gives: 10
Level row2 "type" gives: 2
Level row2 "size" gives: 8
Level --- row1 "name" gives: horse
Level row2 "type" gives: 3
Level row2 "size" gives: 100
Level --- row1 "name" gives: bee
Level --- row1 "name" gives: wasp

Answer 2

In your code, the database write only happens when your second query (for the nodes under $row1) returns results:

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    for my $row2 ( $row1->findnodes('.//row2s/row2') ) {
        $dbh->do("INSERT INTO animal4 (name, category,type,size) VALUES(?,?,?,?)"
        [...]  
        ) or die        ;   
    }
}

If there are no $row2 nodes, there is no database write.
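A minimal core-Perl sketch of that control flow, with a hash standing in for the parsed XML (an assumption made so it runs without XML::XPath or a database): the body of the inner loop, which is where the INSERT lives, simply never runs for an animal whose list of children is empty.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the XML: animal name => list of row2 types
# (empty for bee and wasp, just like their empty <row2s/> elements).
my %row2s_of = (
    fox   => [ 1, 2 ],
    horse => [3],
    bee   => [],
    wasp  => [],
);

my @inserted;    # names for which an INSERT would have been executed
for my $name ( sort keys %row2s_of ) {
    for my $type ( @{ $row2s_of{$name} } ) {
        push @inserted, $name;    # only reached when a row2 child exists
    }
}

print "@inserted\n";    # prints: fox fox horse
```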

If you want the database write to happen whether or not $row2 nodes exist, you need to move the db write out of the inner for loop, i.e.:

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # get name and category here
    my $name = $row1->getAttribute('name');
    my $cat = $row1->getAttribute('category');
    my $row2set = $row1->find('row2s/row2'); ## creates a Nodeset object
    if ($row2set->size > 0) {
        ## we found nodes!!
        foreach my $row2 ($row2set->get_nodelist) {
           # get size and type here
           my $type = $row2->getAttribute('type');
           my $size = $row2->getAttribute('size');
           # write to db

        }
    } else {
        ## no row2 nodes found.
        ## write to db - just write the row1 values; type and size will be undefined.

    }
}

NodeSet documentation: http://search.cpan.org/~msergeant/XML-XPath-1.13/XPath/NodeSet.pm
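The rule in that skeleton (one row per row2, or a single row with NULLs when there are none) can be sketched without XML::XPath or DBI. The array-of-hashes input and the helper name rows_for are hypothetical; each array reference it returns is what would be passed to a prepared statement's execute, where undef is stored as NULL.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parsed XML: one hash per row1 element,
# with its row2 children as a (possibly empty) list of attribute hashes.
my @animals = (
    { name => 'fox', category => 'mammal',
      row2 => [ { type => 1, size => 10 }, { type => 2, size => 8 } ] },
    { name => 'horse', category => 'mammal', row2 => [ { type => 3, size => 100 } ] },
    { name => 'bee',   category => 'insect', row2 => [] },
    { name => 'wasp',  category => 'insect', row2 => [] },
);

# Hypothetical helper: the parameter lists for one row1 element.
sub rows_for {
    my ($animal) = @_;
    my @children = @{ $animal->{row2} };

    # No row2 children: a single row with undef (NULL) type and size
    return [ $animal->{name}, $animal->{category}, undef, undef ]
        unless @children;

    # Otherwise one row per row2 child
    return map { [ $animal->{name}, $animal->{category}, $_->{type}, $_->{size} ] }
        @children;
}

my @rows = map { rows_for($_) } @animals;

for my $row (@rows) {
    # In the real program this would be: $db_insert->execute(@$row);
    print join( ', ', map { defined $_ ? $_ : 'NULL' } @$row ), "\n";
}
```

This prints five rows, the last two being "bee, insect, NULL, NULL" and "wasp, insect, NULL, NULL".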

A quick note on declaring variables and scope

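Scope refers to where an entity (variable, subroutine, object, etc.) is visible and accessible in Perl code. Scoping entities helps encapsulate them and prevents data or functionality from being available to every part of the program.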

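Scope is set by code structures such as subroutines, loops, packages and objects: any block of code delimited by curly braces ({ and }). Standard practice in Perl (and many other languages) is to increase the indentation when entering a block and decrease it when leaving it; that way you can see the scope very easily when reading the code.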

Declaring a variable with my restricts its scope to the block of code in which it is declared, e.g.

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # $row1 is available inside this code block

    my $row2set = $row1->find('row2s/row2');
    # $row2set is now available inside this code block

    if ($row2set->size > 0) {
        my $size = $row2set->size;
        # $size is now available inside this code block

        foreach my $row2 ($row2set->get_nodelist) {
            # $row2 is available inside this code block
            # we can also access $row1, $row2set, $size
        }

        # we can access $row1, $row2set, $size
        # $row2 is out of scope, i.e. we cannot access it

        say "The value of row2 is $row2";
        # Perl will complain 'Global symbol "$row2" requires explicit package name'
    }
    # we can access $row1 and $row2set
    # $size and $row2 are out of scope
}
# $row1, $row2set, $size, and $row2 are out of scope

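Back to your code: suppose you decide to use the variables $name, $category, $type and $size to capture your data and write it to the database. You must make sure the variables are scoped correctly, otherwise they will hold the wrong data. For example: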

use feature 'say';    # say() needs this (or "use v5.10;")

# declare all our variables
my ($name, $cat, $type, $size);
for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # we can set $name and $cat from the data in row1:
    $name = $row1->getAttribute('name');
    $cat = $row1->getAttribute('category');
    my $row2set = $row1->find('row2s/row2');
    if ($row2set->size > 0) {
        foreach my $row2 ($row2set->get_nodelist) {
            # row2 gives us the type and size info
            $type = $row2->getAttribute('type');
            $size = $row2->getAttribute('size');
            # "say" prints a string and adds a "\n" to the end,
            # so it's very handy for debugging
            say "row2s found: name: $name; category: $cat; type: $type; size: $size";
        }
    } else {
        say "row2s empty: name: $name; category: $cat; type: $type; size: $size";
    }
}

This gives the following output:

row2s found: name: fox; category: mammal; type: 1; size: 10
row2s found: name: fox; category: mammal; type: 2; size: 8
row2s found: name: horse; category: mammal; type: 3; size: 100
row2s empty: name: bee; category: insect; type: 3; size: 100
row2s empty: name: wasp; category: insect; type: 3; size: 100

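This is because $type and $size are scoped to the whole block, so their values are kept between iterations of the row1 loop and the inner row2 loop. The bee and the wasp have no size or type values, so the previous animal's values are used instead.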
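A stripped-down, core-Perl sketch of this effect, with an array of hypothetical [name, children] pairs standing in for the XML. The only difference between the two subs is where my ($type, $size) is declared.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the XML: [ name, [ [type, size], ... ] ]
my @animals = (
    [ 'fox',   [ [ 1, 10 ], [ 2, 8 ] ] ],
    [ 'horse', [ [ 3, 100 ] ] ],
    [ 'bee',   [] ],
    [ 'wasp',  [] ],
);

# $type/$size declared once, outside the row1 loop: values leak between animals
sub empty_rows_outer_scope {
    my @rows;
    my ( $type, $size );
    for my $animal (@animals) {
        my ( $name, $children ) = @$animal;
        for my $child (@$children) {
            ( $type, $size ) = @$child;
        }
        push @rows, [ $name, $type, $size ] unless @$children;
    }
    return @rows;
}

# $type/$size declared inside the row1 loop: fresh (undef) for every animal
sub empty_rows_loop_scope {
    my @rows;
    for my $animal (@animals) {
        my ( $name, $children ) = @$animal;
        my ( $type, $size );
        for my $child (@$children) {
            ( $type, $size ) = @$child;
        }
        push @rows, [ $name, $type, $size ] unless @$children;
    }
    return @rows;
}

for my $row ( empty_rows_outer_scope() ) {
    print 'outer: ', join( ' ', map { $_ // 'undef' } @$row ), "\n";
}
for my $row ( empty_rows_loop_scope() ) {
    print 'inner: ', join( ' ', map { $_ // 'undef' } @$row ), "\n";
}
```

The "outer" lines show bee and wasp with horse's 3 and 100; the "inner" lines show them with undef, which a prepared INSERT would store as NULL.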

There are many ways to fix this, but the most efficient is probably:

use feature 'say';    # say() needs this (or "use v5.10;")

my $db_insert = $dbh->prepare('INSERT INTO animal4 (name, category, type, size) VALUES (?, ?, ?, ?)');

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    my $row2set = $row1->find('row2s/row2');
    if ($row2set->size > 0) {
        foreach my $row2 ($row2set->get_nodelist) {
            # for debugging
            say "row2s found: name: " . $row1->getAttribute('name') .
                "; category: " . $row1->getAttribute('category') .
                "; type: " . $row2->getAttribute('type') .
                "; size: " . $row2->getAttribute('size');

            $db_insert->execute( $row1->getAttribute('name'),
                                 $row1->getAttribute('category'),
                                 $row2->getAttribute('type'),
                                 $row2->getAttribute('size') );
        }
    } else {
        # for debugging
        say "row2s empty: name: " . $row1->getAttribute('name') .
            "; category: " . $row1->getAttribute('category') .
            "; type: NOT SET" .
            "; size: NOT SET";

        # no row2 elements: undef is stored as NULL for type and size
        $db_insert->execute( $row1->getAttribute('name'),
                             $row1->getAttribute('category'),
                             undef,
                             undef );
    }
}

Perl and XPath: missing entries in the database table
