我从RSS频道抓取数据,清理它并保存在数据库中。我使用java,tidy,MySQL和JDBC。
步骤:
MySQL方案是
CREATE TABLE IF NOT EXISTS `rss_item_safe_texts` (
`id` int(10) unsigned NOT NULL,
`title` varchar(1000) NOT NULL,
`link` varchar(255) NOT NULL,
`description` mediumtext NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
JDBC连接URL是
connUrl = "jdbc:mysql://" + host + "/" + database + "?user=" + username + "&password=" + password + "&useUnicode=true&characterEncoding=UTF-8";
Java代码
PreparedStatement updateSafeTextSt = conn.prepareStatement("UPDATE `rss_item_safe_texts` SET `title` = ?, `link` = ?, `description` = ? WHERE `id` = ?");
updateSafeTextSt.setString(1, EscapingUtils.escapeXssInjection(title));
updateSafeTextSt.setString(2, link);
updateSafeTextSt.setString(3, EscapingUtils.escapeXssInjection(description));
updateSafeTextSt.setInt(4, itemId);
updateSafeTextSt.execute();
updateSafeTextSt.close();
因此,我在数据库中看到了破碎的字符,例如“所以它 不太可能”。我在网页上看到了输出文字(utf-8页面)。
答案 0 :(得分:5)
不要忘记还有很多其他地方可以设置不同的编码。例如,检查数据库/表/列是否具有正确的编码开头。另外,我通常会在MySQL中为utf8设置所有内容:
mysql> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+