I want to extract data from an XML file and import it into a MariaDB/MySQL database. The XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<database>
<row1s>
<row1 name="fox" category="mammal">
<row2s>
<row2 type="1" size="10"/>
<row2 type="2" size="8"/>
</row2s>
</row1>
<row1 name="horse" category="mammal">
<row2s>
<row2 type="3" size="100"/>
</row2s>
</row1>
<row1 name="bee" category="insect">
<row2s/>
</row1>
<row1 name="wasp" category="insect">
<row2s/>
</row1>
</row1s>
</database>
And the Perl code is:
use strict;
use warnings;
use DBI;
use XML::XPath;
use XML::XPath::XMLParser;
my $xp = XML::XPath->new( filename => "animals4.xml" );
# my $xp = XML::XPath->new( ioref => \*DATA );
my $dbh = DBI->connect( "DBI:mysql:test", "user", "pw", { RaiseError => 1, PrintError => 0 } )
or die "Fehler beim Verbindungsaufbau zum MariaDB-Server: $DBI::err -> $DBI::errstr\n";
for my $row1 ( $xp->findnodes('//row1s/row1') ) {
printf "Level --- row1 \"name\" gives: %s\n", $row1->getAttribute("name");
for my $row2 ( $row1->findnodes('.//row2s/row2') ) {
printf "Level row2 \"type\" gives: %s\n", $row2->getAttribute("type");
printf "Level row2 \"size\" gives: %s\n", $row2->getAttribute("size");
$dbh->do(
"INSERT INTO animal4 (name, category,type,size) VALUES(?,?,?,?)",
undef,
$row1->getAttribute("name"),
$row1->getAttribute("category"),
$row2->getAttribute("type"),
$row2->getAttribute("size")
) or die "Error during execution: " . "$DBI::err -> $DBI::errstr (animal $DBI::state)\n";
}
}
The terminal output is:
Level --- row1 "name" gives: fox
Level row2 "type" gives: 1
Level row2 "size" gives: 10
Level row2 "type" gives: 2
Level row2 "size" gives: 8
Level --- row1 "name" gives: horse
Level row2 "type" gives: 3
Level row2 "size" gives: 100
Level --- row1 "name" gives: bee
Level --- row1 "name" gives: wasp
This is what I expect. But the table contains only these entries:
name category type size
fox mammal 1 10
fox mammal 2 8
horse mammal 3 100
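The bee and the wasp are missing. Can anyone help me with this? I would like to understand why it happens, since the terminal output looks fine.
Thanks for your help.
Here is the code for the table: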
CREATE TABLE test01.animal4 (
name VARCHAR(50) DEFAULT NULL
, category VARCHAR(50) DEFAULT NULL
, type INTEGER DEFAULT NULL
, size INTEGER DEFAULT NULL
);
This is a follow-up to the question "hierarchy problem".
Answer 0 (score: 2)
You already have an explanation and a fix, but I suggest the following changes:
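You should prepare the INSERT INTO SQL statement once and then execute it inside the loop. do prepares the statement afresh on every call, so it has a much bigger overhead.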
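The // (descendant-or-self::node()) XPath construct is expensive, and you should reserve it for when you really don't know where in the document an element is, which is very rare. In this case the row1 elements are at /database/row1s/row1, and the row2 elements are at row2s/row2 relative to each row1.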
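If you want to use quote characters inside a quoted string, it is much clearer to use a different delimiter. For instance, "My name is \"$name\"" is far less readable than qq{My name is "$name"}.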
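To confirm that the two spellings build exactly the same string, here is a tiny standalone script; the variable $name and its value 'fox' are only illustrative.

```perl
use strict;
use warnings;

my $name = 'fox';    # illustrative value

# Double-quoted string: the inner quotes must be escaped
my $escaped = "My name is \"$name\"";

# qq{} interpolates like "..." but uses braces as delimiters,
# so the inner double quotes need no escaping
my $with_qq = qq{My name is "$name"};

print "$escaped\n";
print "$with_qq\n";
print $escaped eq $with_qq ? "identical\n" : "different\n";
```

Both forms interpolate $name; the only difference is readability.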
Here is a version of your program that may help.
use strict;
use warnings;
use XML::XPath;
use DBI;
my $xp = XML::XPath->new( filename => 'animals4.xml' );
my $dbh = DBI->connect(
'DBI:mysql:test', 'user', 'pw',
{ RaiseError => 1, PrintError => 0}
) or die "Fehler beim Verbindungsaufbau zum MariaDB-Server: $DBI::err -> $DBI::errstr\n";
my $insert_animal = $dbh->prepare('INSERT INTO animal4 (name, category, type, size) VALUES (?, ?, ?, ?)');
for my $row1 ( $xp->findnodes('/database/row1s/row1') ) {
my $name = $row1->getAttribute('name');
my $category = $row1->getAttribute('category');
print qq{Level --- row1 "name" gives: $name\n};
my @row2 = $xp->findnodes('row2s/row2', $row1);
if ( @row2 ) {
for my $row2 ( @row2 ) {
my $type = $row2->getAttribute('type');
my $size = $row2->getAttribute('size');
print qq{Level row2 "type" gives: $type\n};
print qq{Level row2 "size" gives: $size\n};
$insert_animal->execute($name, $category, $type, $size);
}
}
else {
$insert_animal->execute($name, $category, undef, undef);
}
}
Output
Level --- row1 "name" gives: fox
Level row2 "type" gives: 1
Level row2 "size" gives: 10
Level row2 "type" gives: 2
Level row2 "size" gives: 8
Level --- row1 "name" gives: horse
Level row2 "type" gives: 3
Level row2 "size" gives: 100
Level --- row1 "name" gives: bee
Level --- row1 "name" gives: wasp
Answer 1 (score: 1)
In your code, the database write only happens if your second query (for the nodes under $row1) returns results:
for my $row1 ( $xp->findnodes('//row1s/row1') ){
for my $row2 ( $row1->findnodes('.//row2s/row2') ) {
$dbh->do("INSERT INTO animal4 (name, category,type,size) VALUES(?,?,?,?)"
[...]
) or die ;
}
}
If there are no $row2 nodes, there is no database write.
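The effect can be reproduced without XML or a database. The sketch below is a hypothetical stand-in: the hard-coded @animals array plays the role of animals4.xml, and pushing onto @inserted plays the role of the INSERT. As in the question, every name is printed, but only animals with row2 children produce rows.

```perl
use strict;
use warnings;

# Hypothetical stand-in for animals4.xml: each row1 has zero or more row2 children
my @animals = (
    { name => 'fox',   category => 'mammal', row2 => [ { type => 1, size => 10 }, { type => 2, size => 8 } ] },
    { name => 'horse', category => 'mammal', row2 => [ { type => 3, size => 100 } ] },
    { name => 'bee',   category => 'insect', row2 => [] },
    { name => 'wasp',  category => 'insect', row2 => [] },
);

my @inserted;    # stands in for the rows written to animal4

for my $row1 (@animals) {
    # This runs for every animal, so all four names are printed...
    print qq{Level --- row1 "name" gives: $row1->{name}\n};

    # ...but the "INSERT" sits inside the inner loop, whose body
    # never runs for an animal with an empty row2 list
    for my $row2 ( @{ $row1->{row2} } ) {
        push @inserted, [ $row1->{name}, $row1->{category}, $row2->{type}, $row2->{size} ];
    }
}

print join( "\t", @$_ ), "\n" for @inserted;
```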
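If you want a database write to happen whether or not $row2 nodes exist, it cannot live only inside the inner for loop; check for row2 nodes first and write in both cases, e.g.: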
for my $row1 ( $xp->findnodes('//row1s/row1') ){
# get name and category here
my $name = $row1->getAttribute('name');
my $cat = $row1->getAttribute('category');
my $row2set = $row1->find('row2s/row2'); ## creates a Nodeset object
if ($row2set->size > 0) {
## we found nodes!!
foreach my $row2 ($row2set->get_nodelist) {
# get size and type here
my $type = $row2->getAttribute('type');
my $size = $row2->getAttribute('size');
# write to db
}
} else {
## no row2 nodes found.
## write to db - just write the row1 values; type and size will be undefined.
}
}
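Here is the same "write in both branches" logic as a runnable pure-Perl sketch. It assumes a hard-coded data structure in place of the XML, and rows_for_row1 is a hypothetical helper name, not part of XML::XPath or DBI. It returns the rows one row1 element should produce, using undef for type and size when there are no row2 children.

```perl
use strict;
use warnings;

# Hard-coded stand-in for animals4.xml (hypothetical data layout)
my @animals = (
    { name => 'fox',   category => 'mammal', row2 => [ { type => 1, size => 10 }, { type => 2, size => 8 } ] },
    { name => 'horse', category => 'mammal', row2 => [ { type => 3, size => 100 } ] },
    { name => 'bee',   category => 'insect', row2 => [] },
    { name => 'wasp',  category => 'insect', row2 => [] },
);

# Hypothetical helper: the rows one row1 element should produce.
# With no row2 children it still yields one row, with undef for type and size.
sub rows_for_row1 {
    my ($row1) = @_;
    my @row2 = @{ $row1->{row2} };
    return [ $row1->{name}, $row1->{category}, undef, undef ] unless @row2;
    return map { [ $row1->{name}, $row1->{category}, $_->{type}, $_->{size} ] } @row2;
}

my @rows = map { rows_for_row1($_) } @animals;

for my $row (@rows) {
    # DBI binds undef as SQL NULL; print it as NULL here
    print join( "\t", map { defined $_ ? $_ : 'NULL' } @$row ), "\n";
}
```

With a real database handle, each element of @rows would be passed to a prepared statement, e.g. $sth->execute(@$row); DBI sends undef as NULL, so bee and wasp end up with NULL type and size.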
NodeSet documentation: http://search.cpan.org/~msergeant/XML-XPath-1.13/XPath/NodeSet.pm
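A quick note on setting variables and scope
Scope refers to where an entity (variable, subroutine, object, etc.) is visible and accessible in Perl code; scoping entities helps to encapsulate them and prevents data or functionality from being available to every part of the program.
Scope is set by code structures such as subroutines, loops, and package blocks; in general, by any block of code delimited by curly braces ({ and }). Standard practice in Perl (and many other languages) is to increase the indentation when entering a block and decrease it when leaving; that way you can see the scope very easily when reading the code.
Declaring a variable with my restricts its scope to the code block in which it is declared, e.g.: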
for my $row1 ( $xp->findnodes('//row1s/row1') ){
# $row1 is available inside this code block
my $row2set = $row1->find('row2s/row2');
# $row2set is now available inside this code block
if ($row2set->size > 0) {
my $size = $row2set->size;
# $size is now available inside this code block
foreach my $row2 ($row2set->get_nodelist) {
# $row2 is available inside this code block
# we can also access $row1, $row2set, $size
}
# we can access $row1, $row2set, $size
# $row2 is out of scope, i.e. we cannot access it
say "The value of row2 is $row2";
# Perl will complain 'Global symbol "$row2" requires explicit package name'
}
# we can access $row1 and $row2set
# $size and $row2 are out of scope
}
# $row1, $row2set, $size, and $row2 are out of scope
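The "Global symbol" complaint in the comment above is a compile-time error under use strict. The sketch below checks this without stopping the script: it compiles a tiny program (a simplified stand-in for the loop above, using plain number ranges instead of XPath nodes) with eval STRING and captures the error in $@.

```perl
use strict;
use warnings;

# A tiny program that uses $row2 after the loop that declared it has ended.
# eval STRING compiles it at run time, so the strict error lands in $@
# instead of aborting this script.
my $source = <<'END_SOURCE';
use strict;
for my $row1 ( 1 .. 2 ) {
    for my $row2 ( 1 .. 2 ) { }
    print "The value of row2 is $row2\n";
}
END_SOURCE

eval $source;
my $error = $@;

if ($error) {
    print "Compile-time error, as expected:\n$error";
}
else {
    print "Compiled without error\n";
}

# For contrast: inside its own block, $row2 is visible as usual
for my $row1 ( 1 .. 2 ) {
    for my $row2 ( 1 .. 2 ) {
        print "row1=$row1 row2=$row2\n";
    }
}
```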
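Back to your code: suppose you decide to use the variables $name, $cat, $type and $size to capture your data and write it to the database. You must make sure the variables are scoped correctly, otherwise they will hold the wrong data. For example: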
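use strict;
use warnings;
use feature 'say';    # say() is used below
use XML::XPath;

my $xp = XML::XPath->new( filename => 'animals4.xml' );

# declare all our variables
my ($name, $cat, $type, $size);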
for my $row1 ( $xp->findnodes('//row1s/row1') ){
# we can set $name and $cat from the data in row1:
$name = $row1->getAttribute('name');
$cat = $row1->getAttribute('category');
my $row2set = $row1->find('row2s/row2');
if ($row2set->size > 0) {
foreach my $row2 ($row2set->get_nodelist) {
# row2 gives us the type and size info
$type = $row2->getAttribute('type');
$size = $row2->getAttribute('size');
# "say" prints a string and adds a "\n" to the end,
# so it's very handy for debugging
say "row2s found: name: $name; category: $cat; type: $type; size: $size";
}
} else {
say "row2s empty: name: $name; category: $cat; type: $type; size: $size";
}
}
这给了我们以下输出:
row2s found: name: fox; category: mammal; type: 1; size: 10
row2s found: name: fox; category: mammal; type: 2; size: 8
row2s found: name: horse; category: mammal; type: 3; size: 100
row2s empty: name: bee; category: insect; type: 3; size: 100
row2s empty: name: wasp; category: insect; type: 3; size: 100
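This happens because $type and $size are declared outside both loops, so they are scoped to the whole block and keep their values from one iteration of the row1 loop and the inner row2 loop to the next. The bee and the wasp have no type or size values, so they get the values left over from the previous animal.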
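The stale-value problem and the usual fix (declaring the variables with my inside the row1 loop) can be compared side by side in pure Perl. This is a sketch under the assumption that a hard-coded @animals array stands in for the XML; @buggy and @fixed collect what would have been written to the table.

```perl
use strict;
use warnings;

# Hard-coded stand-in for animals4.xml (hypothetical data layout)
my @animals = (
    { name => 'fox',   category => 'mammal', row2 => [ { type => 1, size => 10 }, { type => 2, size => 8 } ] },
    { name => 'horse', category => 'mammal', row2 => [ { type => 3, size => 100 } ] },
    { name => 'bee',   category => 'insect', row2 => [] },
    { name => 'wasp',  category => 'insect', row2 => [] },
);

# Buggy version: $type and $size live outside both loops,
# so they keep whatever the previous animal left in them
my ( $type, $size );
my @buggy;
for my $row1 (@animals) {
    my @row2 = @{ $row1->{row2} };
    if (@row2) {
        for my $row2 (@row2) {
            ( $type, $size ) = ( $row2->{type}, $row2->{size} );
            push @buggy, [ $row1->{name}, $type, $size ];
        }
    }
    else {
        push @buggy, [ $row1->{name}, $type, $size ];    # stale values leak in
    }
}

# Fixed version: declare them inside the row1 loop,
# so each animal starts with undef for type and size
my @fixed;
for my $row1 (@animals) {
    my ( $type, $size );
    my @row2 = @{ $row1->{row2} };
    if (@row2) {
        for my $row2 (@row2) {
            ( $type, $size ) = ( $row2->{type}, $row2->{size} );
            push @fixed, [ $row1->{name}, $type, $size ];
        }
    }
    else {
        push @fixed, [ $row1->{name}, $type, $size ];    # both undef here
    }
}

# Show undef as NULL when printing
sub fmt { return map { defined $_ ? $_ : 'NULL' } @_ }
print 'buggy: ', join( ' ', fmt(@$_) ), "\n" for @buggy;
print 'fixed: ', join( ' ', fmt(@$_) ), "\n" for @fixed;
```

In @buggy, bee and wasp pick up horse's type 3 and size 100; in @fixed they get undef, which DBI would store as NULL.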
There are many different ways to fix this, but probably the most efficient is:
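# $dbh and $xp are assumed to be set up as in the question
use feature 'say';    # enables say() for the debugging output below
my $db_insert = $dbh->prepare('INSERT INTO animal4 (name, category, type, size) VALUES (?, ?, ?, ?)');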
for my $row1 ( $xp->findnodes('//row1s/row1') ){
my $row2set = $row1->find('row2s/row2');
if ($row2set->size > 0) {
foreach my $row2 ($row2set->get_nodelist) {
# for debugging
say "row2s found: name: " . $row1->getAttribute('name') .
"; category: " . $row1->getAttribute('category') .
"; type: " . $row2->getAttribute('type') .
"; size: " . $row2->getAttribute('size');
$db_insert->execute( $row1->getAttribute('name'),
$row1->getAttribute('category'),
$row2->getAttribute('type'),
$row2->getAttribute('size') );
}
} else {
# for debugging
say "row2s empty: name: " . $row1->getAttribute('name') .
"; category: " . $row1->getAttribute('category') .
"; type: NOT SET" .
"; size: NOT SET";
$db_insert->execute( $row1->getAttribute('name'),
$row1->getAttribute('category'),
undef,
undef );
}
}