I have a WordPress MySQL database that I'm trying to extract some data from using Perl's DBD::mysql.
If I do this on the command line:
mysql --raw mydb <<EOF
select post_content from wp_posts where ID = 195;
EOF
I get what I expect. Here are the first two sentences:
I guess someone famous enough to have <a
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own
Wikipedia page</a> is worth anyone's consideration. I'm not familiar
with AIMAA, but they appear to have quite a few affiliated school
(particularly in the UK).
But if I do this in Perl:
$dsn = "DBI:mysql:database=$dbname";
$dbh = DBI->connect($dsn, $dbuser, $dbpass);
$sql_page_list = $dbh->prepare("
    SELECT post_title, post_content
    FROM wp_posts
    WHERE post_status = 'publish'
    AND post_type = 'page'
    ORDER BY post_title
");
$sql_page_list->execute();
while ( $prog_row = $sql_page_list->fetchrow_hashref ) {
    print $prog_row->{post_content} . "\n";
...
I get this:
I guess someone famous enough to have <a
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own
Wikipedia page</a>worth anyone's consideration. not familiar with
AIMAA, but they appear to have quite a few affiliated school
(particularly in the UK).
Here is the same text with the missing words marked:
I guess someone famous enough to have <a
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own
Wikipedia page</a> **is** worth anyone's consideration. **I'm** not
familiar with AIMAA, but they appear to have quite a few affiliated
school (particularly in the UK).
Any idea what might be causing this? post_content is longtext. table_collation is utf8_general_ci.
The same pattern of missing words runs through the whole text, and it happens for every post.
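Answer 0 (score: 1)
It turned out there were some embedded octal 240 bytes (0xA0, a non-breaking space in Latin-1), and they garbled the output. I ran od -c on the output and saw them.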
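For anyone who would rather check from inside the script than pipe to od, here is a rough sketch of the same inspection in Perl. The helper name find_odd_bytes and the sample string are made up for illustration, and it assumes the content is being handled as raw bytes rather than as a decoded character string.

```perl
use strict;
use warnings;

# Hypothetical helper: report the offset and octal value of every byte
# outside printable ASCII (tab, LF and CR are allowed through).
sub find_odd_bytes {
    my ($bytes) = @_;
    my @hits;
    while ($bytes =~ /([^\x09\x0a\x0d\x20-\x7e])/g) {
        # pos() points just past the matched byte, so subtract one.
        push @hits, sprintf("offset %d: octal %03o", pos($bytes) - 1, ord($1));
    }
    return @hits;
}

# Sample data invented for this example: \xa0 on either side of "is".
my $sample = "page</a>\xa0is\xa0worth";
print "$_\n" for find_odd_bytes($sample);
```

On the sample string this reports octal 240 at offsets 8 and 11, which is the same value od -c shows.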
Replacing them with ordinary spaces is very simple:
$content =~ s/\xa0/ /g;
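One caveat, offered as a sketch rather than a definitive fix: if the connection hands back UTF-8 encoded bytes, a non-breaking space arrives as the pair \xc2\xa0, and the one-line substitution above would leave a stray \xc2 behind. The hypothetical helper replace_nbsp below handles both forms. It assumes the content is a byte string; on an already decoded character string the first substitution should be dropped, since there \xc2 is a real character.

```perl
use strict;
use warnings;

# Hypothetical helper: turn non-breaking spaces into ordinary spaces.
# Assumes $content holds raw bytes, not decoded characters.
sub replace_nbsp {
    my ($content) = @_;
    $content =~ s/\xc2\xa0/ /g;   # UTF-8 encoded NBSP (two bytes)
    $content =~ s/\xa0/ /g;       # lone Latin-1 NBSP byte (octal 240)
    return $content;
}

# Sample data invented for this example.
print replace_nbsp("page</a>\xa0is worth"), "\n";
```

Inside the fetch loop from the question this could be called as replace_nbsp($prog_row->{post_content}) before printing.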