A caller of my sub hands me a value in $new_value. I have selected a value from a MySQL database into the scalar $current_value. I can't figure out how to reliably detect whether the two are "the same". By "the same" I mean: would updating the database record with $new_value change the database state?
Boiled down to its essence:
#!/usr/bin/perl -w
use utf8;
use strict;
use Encode qw(encode);
my $str = 'æøå';
my $latin1 = encode('latin1', $str);
# This in fact doesn't die. They're eq
$str eq $latin1
or die;
If I update a field in the MySQL database with $str and then re-select it, I get one value: a UTF-8 encoded one. With $latin1, the database field ends up with a different value: a latin1 / ISO-8859-1 encoded one.
The original problem I am debugging involves an UPDATE of a field in a table with CHARSET=latin1, but the symptoms are reproduced just as simply with:
my $dbh = DBI->connect(
"DBI:mysql:mysql",
'user',
'pass',
# No, we don't have these options on our DB handles.
# Introducing them now would cause (too) many regressions
# for us, as values elsewhere are also latin1 encoded,
# not UTF-8 encoded.
# mysql_enable_utf8 => 1,
# mysql_enable_utf8mb4 => 1
);
my $sth = $dbh->prepare('SELECT CONCAT(?)')
or die;
$sth->execute($val);
my ($return_val) = $sth->fetchrow_array();
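Since $str and $latin1 yield different values after a MySQL round trip, I would like to detect that they are in fact not equal. So, assuming the current value in the database is a correctly latin1-encoded æøå, and I have SELECTed it into the scalar $current_value, my question boils down to coding:
sub new_value_will_change_database {
my ($current_value, $new_value) = @_;
# How do I write this sub so that it returns true for $str
# and false for $latin1 from above?
...
}
How do I do this? The only difference I have been able to detect is that $str has the UTF8 flag on, while $latin1 does not. However, I seem to recall that my code is broken if I inspect the UTF-8 flag...
The full test script:
#!/usr/bin/perl -w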
use utf8;
use strict;
use feature qw(:5.10);
use Encode qw(encode is_utf8);
use DBI;
use Data::Dumper;
my $str = 'æøå';
my $latin1 = encode('latin1', $str);
my $utf8upgraded = $latin1;
utf8::upgrade($utf8upgraded);
# $str, $latin1 and $utf8upgraded are all eq each other:
$str eq $latin1
or die;
$str eq $utf8upgraded
or die;
$latin1 eq $utf8upgraded
or die;
my $dbh = DBI->connect(
"DBI:mysql:mysql",
'user',
'pass',
);
my $sth = $dbh->prepare('SELECT CONCAT(?)')
or die;
sub mysql_roundtrip {
my ($val) = @_;
$sth->execute($val);
my ($concat) = $sth->fetchrow_array();
return $concat;
};
foreach my $set (
[ 'str', $str ],
[ 'latin', $latin1 ],
[ 'utf8upgraded', $utf8upgraded ],
) {
my ($disp, $val) = @$set;
my $hex = $val;
$hex =~ s/(.)/sprintf "%X", ord($1)/ge;
my $dumper = Data::Dumper->new([substr $val, 0, 1])->Terse(1)->Dump;
chomp $dumper;
printf "%-13s: val:%s mysql:%s is_utf8:%d hex:%s dumper0:%s\n",
$disp,
$val,
mysql_roundtrip($val),
is_utf8($val),
$hex,
$dumper;
}
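Answer 0 (score: 3)
When you try to put a character into an iso-latin-1 field and that character is not in the Windows-1252 character set, a question mark is inserted instead.
So, assuming you are correctly sending text to the database [1], the following will do the trick: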
sub will_change_db_virtual {
my ($current_text, $new_text) = @_;
state $re;
if (!$re) {
my $cp1252_charset = decode('cp1252', (join '', map chr, 0x00..0xFF), sub { "" });
$re = qr/[^\Q$cp1252_charset\E]/;
}
$new_text =~ s/$re/?/g;
return $new_text ne $current_text;
}
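The substitution rule described above can also be probed one string at a time without a database. The sketch below is only an approximation under stated assumptions: fits_cp1252 is a hypothetical helper name, and it relies on Perl's own cp1252 table in Encode, which may disagree with MySQL's latin1 for a handful of code points.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of the answer): returns 1 if every character
# of $text can be encoded as CP1252 according to Perl's Encode tables,
# 0 if at least one character would have to be substituted.
sub fits_cp1252 {
    my ($text) = @_;
    my $copy = $text;    # encode() may modify its argument when CHECK is set
    return eval { encode('cp1252', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

for my $text ("abcd\x{E9}fg", "abcd\x{113}fg", "\x{20AC}") {
    printf "%vX => %s\n", $text,
        fits_cp1252($text) ? 'kept as is' : 'would contain ?';
}
```

U+00E9 and U+20AC both have CP1252 encodings, while U+0113 does not, so only the second string is reported as one that would be altered.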
Test:
#!/usr/bin/perl
use utf8; # Source code encoded using UTF-8.
use open ':std', ':encoding(UTF-8)'; # Terminal uses UTF-8.
use strict;
use warnings;
use 5.010;
use DBI;
use Encode qw( decode encode encode_utf8 );
sub mysql_roundtrip {
my ($val) = @_;
my $dbh = DBI->connect(
'dbi:mysql:...', '...', '...',
{
PrintError => 1,
RaiseError => 1,
mysql_enable_utf8 => 1, # Decodes string received from the DB.
mysql_enable_utf8mb4 => 1, # Sets the encoding used for the connection.
},
);
my $got = $dbh->selectrow_array(
'SELECT CONVERT(? USING LATIN1)',
undef,
encode_utf8($val),
);
return $got;
}
sub will_change_db_real {
my ($current_text, $new_text) = @_;
return mysql_roundtrip($new_text) ne $current_text;
}
sub will_change_db_virtual {
my ($current_text, $new_text) = @_;
state $re;
if (!$re) {
my $cp1252_charset = decode('cp1252', (join '', map chr, 0x00..0xFF), sub { "" });
$re = qr/[^\Q$cp1252_charset\E]/;
}
$new_text =~ s/$re/?/g;
return $new_text ne $current_text;
}
my @tests = (
[ "abcd\x{000E9}fg", "abcd\x{000E9}fg" ],
[ "abcd\x{00113}fg", "abcd\x{00113}fg" ],
[ "abcd?fg", "abcd\x{00113}fg" ],
[ ( decode('cp1252', (join '', map chr, 0x00..0xFF), sub { "" }) ) x 2 ],
);
for (@tests) {
my ($current_text, $new_text) = @$_;
my $got_real = will_change_db_real($current_text, $new_text);
my $got_virtual = will_change_db_virtual($current_text, $new_text);
printf("current:%vX new:%vX changed? real:%d virtual:%d result:%s\n",
$current_text,
$new_text,
$got_real ? 1 : 0,
$got_virtual ? 1 : 0,
($got_real ? 1 : 0) ^ ($got_virtual ? 1 : 0) ? "fail" : "pass"
);
}
Test output:
current:61.62.63.64.E9.66.67 new:61.62.63.64.E9.66.67 changed? real:0 virtual:0 result:pass
current:61.62.63.64.113.66.67 new:61.62.63.64.113.66.67 changed? real:1 virtual:1 result:pass
current:61.62.63.64.3F.66.67 new:61.62.63.64.113.66.67 changed? real:0 virtual:0 result:pass
current:0.1.2.3.4.5.6.7.8.9.A.B.C.D.E.F.10.11.12.13.14.15.16.17.18.19.1A.1B.1C.1D.1E.1F.20.21.22.23.24.25.26.27.28.29.2A.2B.2C.2D.2E.2F.30.31.32.33.34.35.36.37.38.39.3A.3B.3C.3D.3E.3F.40.41.42.43.44.45.46.47.48.49.4A.4B.4C.4D.4E.4F.50.51.52.53.54.55.56.57.58.59.5A.5B.5C.5D.5E.5F.60.61.62.63.64.65.66.67.68.69.6A.6B.6C.6D.6E.6F.70.71.72.73.74.75.76.77.78.79.7A.7B.7C.7D.7E.7F.20AC.FFFD.201A.192.201E.2026.2020.2021.2C6.2030.160.2039.152.FFFD.17D.FFFD.FFFD.2018.2019.201C.201D.2022.2013.2014.2DC.2122.161.203A.153.FFFD.17E.178.A0.A1.A2.A3.A4.A5.A6.A7.A8.A9.AA.AB.AC.AD.AE.AF.B0.B1.B2.B3.B4.B5.B6.B7.B8.B9.BA.BB.BC.BD.BE.BF.C0.C1.C2.C3.C4.C5.C6.C7.C8.C9.CA.CB.CC.CD.CE.CF.D0.D1.D2.D3.D4.D5.D6.D7.D8.D9.DA.DB.DC.DD.DE.DF.E0.E1.E2.E3.E4.E5.E6.E7.E8.E9.EA.EB.EC.ED.EE.EF.F0.F1.F2.F3.F4.F5.F6.F7.F8.F9.FA.FB.FC.FD.FE.FF new:0.1.2.3.4.5.6.7.8.9.A.B.C.D.E.F.10.11.12.13.14.15.16.17.18.19.1A.1B.1C.1D.1E.1F.20.21.22.23.24.25.26.27.28.29.2A.2B.2C.2D.2E.2F.30.31.32.33.34.35.36.37.38.39.3A.3B.3C.3D.3E.3F.40.41.42.43.44.45.46.47.48.49.4A.4B.4C.4D.4E.4F.50.51.52.53.54.55.56.57.58.59.5A.5B.5C.5D.5E.5F.60.61.62.63.64.65.66.67.68.69.6A.6B.6C.6D.6E.6F.70.71.72.73.74.75.76.77.78.79.7A.7B.7C.7D.7E.7F.20AC.FFFD.201A.192.201E.2026.2020.2021.2C6.2030.160.2039.152.FFFD.17D.FFFD.FFFD.2018.2019.201C.201D.2022.2013.2014.2DC.2122.161.203A.153.FFFD.17E.178.A0.A1.A2.A3.A4.A5.A6.A7.A8.A9.AA.AB.AC.AD.AE.AF.B0.B1.B2.B3.B4.B5.B6.B7.B8.B9.BA.BB.BC.BD.BE.BF.C0.C1.C2.C3.C4.C5.C6.C7.C8.C9.CA.CB.CC.CD.CE.CF.D0.D1.D2.D3.D4.D5.D6.D7.D8.D9.DA.DB.DC.DD.DE.DF.E0.E1.E2.E3.E4.E5.E6.E7.E8.E9.EA.EB.EC.ED.EE.EF.F0.F1.F2.F3.F4.F5.F6.F7.F8.F9.FA.FB.FC.FD.FE.FF changed? real:1 virtual:1 result:pass
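Answer 1 (score: 1)
According to the MySQL documentation, MySQL stores the character range of the CP1252 encoding when it is configured for Latin1, i.e. UTF-8 is converted to CP1252. Characters that have no CP1252 equivalent are changed to question marks. The code points [\x81\x8D\x8F\x90\x9D] are stored unchanged.
The easiest way to predict the result is to implement the same behaviour in a subroutine predict(). It helps to detect the following cases: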
my $predict = predict($new_string);
if ($new_string ne $predict) {
print "WARN: $new_string will not store correctly in DB\n";
}
elsif ($existing_db_string ne $predict) {
print "INFO: $new_string will change DB string\n";
}
Round-trip test over a large range of characters:
#!/usr/bin/perl
use utf8;
use open ':std', ':encoding(UTF-8)'; # Terminal uses UTF-8.
use strict;
use warnings;
use 5.010;
use DBI;
use Encode qw( decode encode encode_utf8 );
sub mysql_roundtrip {
my ($val) = @_;
my $dbh = DBI->connect(
'DBI:mysql:database=testlat;host=192.168.1.3;port=3306',
'userid',
'passwd',
{
PrintError => 1,
AutoCommit => 1,
RaiseError => 1,
mysql_enable_utf8 => 1,
mysql_enable_utf8mb4 => 1,
}
) or die $DBI::errstr;
my $sql = 'UPDATE testlat SET name = ? WHERE id = 1;';
my $dbz = $dbh->do($sql, undef, encode_utf8($val));
my ($got) = $dbh->selectrow_array('SELECT name FROM testlat WHERE id=1');
return $got;
}
sub predict {
my $uni_string = shift;
my @chars = split(//,$uni_string);
my @predict;
for my $char (@chars) {
if ($char =~ /[\x81\x8D\x8F\x90\x9D]/) {
push @predict, $char;
}
else {
my $predict = decode('CP1252',encode('CP1252',$char));
if ($predict ne $char) { $predict = '?'; }
push @predict, $predict;
}
}
return join('',@predict);
}
my $fails = 0;
print "*** test via database \n";
for my $number (0x00..0x2122) {
my $uni_char = chr($number);
my $predict = predict($uni_char);
my $got = mysql_roundtrip($uni_char);
if ($predict ne $got) {
$fails++;
printf("FAIL uni:%.4X predict:%.4X got:%.4X\n",
$number,
ord($predict),
ord($got)
);
}
}
print "FAILS: $fails\n";
Output:
$ perl utf8_latin1_mysql_test2.pl
*** test via database
FAILS: 0
This passes the test for code points 0x00..0x2122 and could be applied to the whole Unicode range.
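The round-trip test needs a live database, but the prediction logic itself can be spot-checked without one. The sketch below restates predict() in condensed form and runs it on four sample code points. The samples are chosen here purely for illustration, and nothing in it verifies what MySQL actually stores.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Condensed restatement of predict() from the script above.
sub predict {
    my ($uni_string) = @_;
    my @predict;
    for my $char (split //, $uni_string) {
        if ($char =~ /[\x81\x8D\x8F\x90\x9D]/) {
            push @predict, $char;    # reported above as stored unchanged
            next;
        }
        # Without a CHECK argument, encode() substitutes '?' for characters
        # that CP1252 cannot represent, so the round trip differs for those.
        my $round_trip = decode('CP1252', encode('CP1252', $char));
        push @predict, $round_trip eq $char ? $char : '?';
    }
    return join '', @predict;
}

for my $in ("\x{E9}", "\x{20AC}", "\x{113}", "\x{81}") {
    printf "U+%04X -> U+%04X\n", ord($in), ord(predict($in));
}
```

Only U+0113 is predicted to become a question mark (U+003F); the other three map to themselves.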
Answer 2 (score: 1)
Here is the solution we are going with (at least for now):
sub mysql_value_latin1 {
my ($val) = @_;
# See text - this looks strange - but works!
if (is_utf8($val)) {
$val = encode('utf8', $val);
} else {
$val = encode('latin1', $val);
}
return $val;
}
sub new_value_will_change_database {
my ($current_value, $new_value) = @_;
my $mysql_new_value = mysql_value_latin1($new_value);
return $current_value ne $mysql_new_value;
}
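As a quick check that needs no database, the two subs can be exercised on the values from the question. This is a sketch: the subs are restated in condensed form, and the sample scalars are built with escapes as assumed stand-ins for what the application and the database would supply.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# Condensed restatement of the two subs above.
sub mysql_value_latin1 {
    my ($val) = @_;
    return is_utf8($val) ? encode('utf8', $val) : encode('latin1', $val);
}

sub new_value_will_change_database {
    my ($current_value, $new_value) = @_;
    return $current_value ne mysql_value_latin1($new_value);
}

my $current = "\xE6\xF8\xE5";    # assumed: latin1 bytes for æøå read from the DB
my $latin1  = "\xE6\xF8\xE5";    # byte string, UTF8 flag off
my $str     = "\xE6\xF8\xE5";
utf8::upgrade($str);             # same characters, UTF8 flag on

printf "str changes DB:    %d\n",
    new_value_will_change_database($current, $str) ? 1 : 0;
printf "latin1 changes DB: %d\n",
    new_value_will_change_database($current, $latin1) ? 1 : 0;
```

This reports 1 for $str and 0 for $latin1, which is the distinction the question asked for, even though the two scalars compare equal with eq.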
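Thanks to @ikegami and @HelmutWollmersdorfer for your input on this question. You both suggest setting the following on $dbh:
mysql_enable_utf8 => 1, # Decodes string received from the DB.
mysql_enable_utf8mb4 => 1, # Sets the encoding used for the connection.
As I have already pointed out, that would cause an unpredictable number of regressions in our code base, because the handle is shared by many libraries.
The advantages of mysql_enable_utf8 => 1 are clear: the Perl code sends correctly encoded UTF-8 data to MySQL, which MySQL then converts to Latin1 (CP 1252) and puts into the database. We are guaranteed that the data is stored correctly, and we can work with UTF-8 in Perl without caring about the Latin1-ness of the database.
There are also some disadvantages: any invalid UTF-8 data is rejected by DBI or DBD::mysql (it is unclear to me which), and my tests also show that MySQL refuses to store data in a Latin1 table if it is not valid Latin1 (CP 1252). So we would need to be more explicit about encoding our data before sending it to the database, which may actually be a good thing.
mysql_enable_utf8 => 0 seems to behave oddly. It appears that if the UTF-8 flag on the Perl scalar is set, the data is sent UTF-8 encoded; otherwise the data is left in Perl's internal encoding (ISO-8859-1 / Latin1). This data is then sent to MySQL and stored in the Latin1 table, regardless of whether it is valid CP 1252 data. With mysql_enable_utf8 => 0 I am able to store all characters in 0x00-0xFF without problems, even though some of them are not valid CP 1252 characters.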
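The behaviour described for mysql_enable_utf8 => 0 can be imitated without a database. The sketch below assumes, as observed above rather than verified against the driver source, that the scalar's internal buffer is sent verbatim. internal_bytes is a hypothetical helper introduced only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: hex dump of the bytes that would go over the wire if
# the driver copied the scalar's internal buffer as-is (an assumption).
sub internal_bytes {
    my ($val) = @_;
    my $copy = $val;
    utf8::encode($copy) if utf8::is_utf8($copy);    # flagged: buffer is UTF-8
    return join ' ', map { sprintf '%02X', ord($_) } split //, $copy;
}

my $str = "\xE6\xF8\xE5";               # æøå
utf8::upgrade($str);                    # same characters, UTF8 flag on
my $latin1 = encode('latin1', $str);    # byte string, UTF8 flag off

print "eq:     ", ($str eq $latin1 ? "yes" : "no"), "\n";
print "str:    ", internal_bytes($str), "\n";
print "latin1: ", internal_bytes($latin1), "\n";
```

The two scalars compare equal, yet the flagged one dumps as C3 A6 C3 B8 C3 A5 and the unflagged one as E6 F8 E5, which would explain why they end up as different values in the table.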
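If anyone can find a test case for @tests that makes sub new_value_will_change_database fail, please let me know.
The task in the OP is to predict whether a given scalar will change the database value if it is handed to MySQL for an UPDATE, and sub new_value_will_change_database does just that, without changing the properties of $dbh. That is why I prefer the OP's solution.
I agree that the better technical solution is to go the mysql_enable_utf8 => 1 route, but because of the effort needed to resolve (potential) regressions, it is also the poorer business decision.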
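Answer 3 (score: 0)
I'm not sure you are printing what you intended to print, and I'm not sure what you are trying to achieve. The two strings are the same; they are just represented differently.
If we print more information by changing your for loop: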
foreach my $set (
[ 'str', $str ],
[ 'latin', $latin1 ],
[ 'utf8upgraded', $utf8upgraded ],
) {
my ($disp, $val) = @$set;
my $hex = $val;
$hex =~ s/(.)/sprintf "%X", ord($1)/ge;
my $dumper = Data::Dumper->new([substr $val, 0, 1])->Terse(1)->Dump;
chomp $dumper;
my $mysql = mysql_roundtrip($val);
my $dumper_mysql = Data::Dumper->new([substr $mysql, 0, 1])->Terse(1)->Dump;
(my $hex_mysql = $mysql) =~ s/(.)/sprintf "%X", ord($1)/ge;
chomp $dumper_mysql;
printf "%-13s: val :%s is_utf8:%d hex:%s dumper0:%s\n" .
"%-13s mysql:%s is_utf8:%d hex:%s dumper1:%s\n",
$disp, $val, is_utf8($val), $hex, $dumper,
"", $mysql, is_utf8($mysql), $hex_mysql, $dumper_mysql;
}
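This gives us output showing whether the MySQL CONCAT result is utf8, what its hex values are, and so on. Then, after playing around to get it all working (see elsewhere for the fun of Unicode, or of encodings in general), I made two further changes:
binmode STDOUT, ':utf8'; so that perl outputs the text correctly.
utf8::decode($concat) in the mysql_roundtrip function, so that the text is correctly decoded into perl's internal format.
Once I had done both, val and mysql displayed the same thing, always æøå.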
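The effect of the utf8::decode step can be seen in isolation. In this sketch the byte string is a hand-written stand-in for what the driver returns (an assumption, since no database is involved here).

```perl
use strict;
use warnings;

# Assumed stand-in for the UTF-8 bytes the driver hands back for æøå.
my $from_db = "\xC3\xA6\xC3\xB8\xC3\xA5";

my $concat = $from_db;
utf8::decode($concat) or die "value is not valid UTF-8";

# Six bytes before decoding, three characters afterwards.
printf "length before: %d, after: %d\n", length($from_db), length($concat);

binmode STDOUT, ':utf8';    # encode characters as UTF-8 on output
print "$concat\n";
```

After decoding, the scalar holds the three characters U+00E6, U+00F8 and U+00E5, so it prints as æøå once STDOUT has the :utf8 layer.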