Perl DBD::mysql drops words from a longtext field

Posted: 2014-04-23 19:06:51

Tags: mysql perl

I have a WordPress MySQL database, and I'm trying to extract some data from it with Perl's DBD::mysql.

If I run this on the command line:

mysql --raw mydb <<EOF

select post_content from wp_posts where ID = 195;
EOF

I get what I expect. Here are the first two sentences:

I guess someone famous enough to have <a
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own
Wikipedia page</a> is worth anyone's consideration.  I'm not familiar
with AIMAA, but they appear to have quite a few affiliated school
(particularly in the UK).

But if I do the same thing in Perl:

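use strict;
use warnings;
use DBI;

my ($dbname, $dbuser, $dbpass) = @ARGV;   # database name, user, password

my $dsn = "DBI:mysql:database=$dbname";
my $dbh = DBI->connect($dsn, $dbuser, $dbpass, { RaiseError => 1 });

my $sql_page_list = $dbh->prepare("
  SELECT post_title, post_content
  FROM wp_posts
  WHERE post_status = 'publish'
  AND post_type = 'page'
  ORDER BY post_title
");
$sql_page_list->execute();
while (my $prog_row = $sql_page_list->fetchrow_hashref) {
  print $prog_row->{post_content} . "\n";
  # ... (rest of the loop body elided)
}
$dbh->disconnect;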

I get this instead:

I guess someone famous enough to have <a
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own
Wikipedia page</a>worth anyone's consideration.  not familiar with
AIMAA, but they appear to have quite a few affiliated school
(particularly in the UK).

Here is the same text with the missing words marked:

I guess someone famous enough to have <a
href="http://en.wikipedia.org/wiki/Hee_Il_Cho" target="_blank">his own
Wikipedia page</a> **is** worth anyone's consideration.  **I'm** not
familiar with AIMAA, but they appear to have quite a few affiliated
school (particularly in the UK).

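Any idea what could cause this? post_content is a longtext column, and table_collation is utf8_general_ci.

The same pattern runs through the whole text: words go missing throughout, and it happens with every post.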

1 Answer

Answer 0 (score: 1)

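It turned out there were some embedded octal 240 characters (0xA0, which is a non-breaking space in Latin-1), and they were garbling the output. I ran the text through od -c and saw them.

Getting rid of them is simple; replace each one with an ordinary space: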
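To spot these characters from Perl instead of piping the output through od -c, a minimal sketch is to scan each fetched value for "\xa0" and report where it occurs. This assumes, as the answer's fix implies, that each non-breaking space arrives as the single character 0xA0 rather than as a two-byte UTF-8 sequence. The helper name find_a0_offsets and the sample string are hypothetical illustrations, not the actual stored bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the character offsets at which
# "\xa0" (octal 240, a Latin-1 non-breaking space) occurs in $text.
sub find_a0_offsets {
    my ($text) = @_;
    my @offsets;
    while ($text =~ /\xa0/g) {
        # pos() points just past the match, so subtract 1 for its start
        push @offsets, pos($text) - 1;
    }
    return @offsets;
}

# Hypothetical sample modelled on the post above
my $sample = "Wikipedia page</a>\xa0is worth anyone's consideration.\xa0 I'm not";
my @hits = find_a0_offsets($sample);
printf "found %d non-breaking space(s) at offset(s): %s\n",
    scalar(@hits), join(', ', @hits);
```

In the question's loop, the same check could be run on $prog_row->{post_content} for each row to see which posts are affected.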

$content =~ s/\xa0/ /g;
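Applied to the fetch loop from the question, the substitution might be wrapped as in the sketch below. The clean_content helper and the sample string are hypothetical, and the sketch again assumes each non-breaking space arrives as the single character 0xA0.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every "\xa0" (octal 240, non-breaking space)
# with an ordinary ASCII space and leave the rest of the text untouched.
sub clean_content {
    my ($content) = @_;
    $content =~ s/\xa0/ /g;    # same substitution as in the answer
    return $content;
}

# In the question's loop this would become, e.g.:
#   print clean_content($prog_row->{post_content}) . "\n";
my $raw = "his own Wikipedia page</a>\xa0is worth anyone's consideration.";
print clean_content($raw), "\n";
# prints: his own Wikipedia page</a> is worth anyone's consideration.
```

Working on a copy inside the helper keeps the original fetched value unchanged, which can help when comparing before and after output.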