Question

On Windows 7, with PostgreSQL 9.3.9 and pgAdmin as the client, selecting upper() on a column that contains, for example, "ÿÿÿ" returns null. If some values are stored, for example

"ada"
"john"
"mole" 
"ÿÿÿ"

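all of them are returned in upper case except the row containing "ÿÿÿ"; that row returns nothing, null ...

The database encoding is UTF8 / UNICODE. The "client_encoding" setting has the same value, UNICODE.

Is this a configuration problem in the database, an operating system problem, or a bug in the database? Are there any recommended workarounds?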

The result of:

select thecol, upper(thecol), upper(thecol) is null, convert_to(thecol, 'UTF8'), current_setting('server_encoding') from thetable where ...

is:

"Apps";"APPS";f;"Apps";"UTF8"
"All";"ALL";f;"All";"UTF8"
"Test";"TEST";f;"Test";"UTF8"
"ÿÿÿ";"";f;"\303\277\303\277\303\277";"UTF8"

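The convert_to column in the last row is bytea printed in PostgreSQL's escape format, where each \nnn group is one byte in octal. As a quick sanity check outside the database, the following Python sketch (the variable names are illustrative, not part of the question) decodes that output. It suggests the stored bytes are valid UTF-8 for "ÿÿÿ", so the value itself appears to be stored correctly and only the upper() step goes wrong:

```python
# Output of convert_to(thecol, 'UTF8') for the problem row, copied from the result above.
escaped = r"\303\277\303\277\303\277"

# Each \nnn group is one byte written in octal.
raw = bytes(int(group, 8) for group in escaped.split("\\") if group)

# Decode the bytes as UTF-8, the server encoding reported by the query.
decoded = raw.decode("utf-8")

print(raw.hex())                             # c3bfc3bfc3bf
print([f"U+{ord(ch):04X}" for ch in decoded])  # ['U+00FF', 'U+00FF', 'U+00FF']
```
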
The lc_ part of pg_settings is:

"lc_collate";"Swedish_Sweden.1252";"Shows the collation order locale."
"lc_ctype";"Swedish_Sweden.1252";"Shows the character classification and case conversion locale."
"lc_messages";"Swedish_Sweden.1252";"Sets the language in which messages are displayed."
"lc_monetary";"Swedish_Sweden.1252";"Sets the locale for formatting monetary amounts."
"lc_numeric";"Swedish_Sweden.1252";"Sets the locale for formatting numbers."

The output of select * from pg_database is:

"template1";10;6;"Swedish_Sweden.1252";"Swedish_Sweden.1252";t;t;-1;12130;668;1;1663;"{=c/postgres,postgres=CTc/postgres}"
"template0";10;6;"Swedish_Sweden.1252";"Swedish_Sweden.1252";t;f;-1;12130;668;1;1663;"{=c/postgres,postgres=CTc/postgres}"
"postgres";10;6;"Swedish_Sweden.1252";"Swedish_Sweden.1252";f;t;-1;12130;668;1;1663;""

The actual create database statement, for version 9.4.4, is:

CREATE DATABASE postgres
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'Swedish_Sweden.1252'
       LC_CTYPE = 'Swedish_Sweden.1252'
       CONNECTION LIMIT = -1;

Answer 1

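My guess is that the upper function uses the database's LC_CTYPE setting. The upper case of LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF) is LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178), which lies outside Latin-1 (ISO 8859-1). Windows-1252 does encode it, at byte 0x9F, but the case conversion for the Swedish_Sweden.1252 locale apparently fails to produce it.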
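
The code points in this guess can be checked independently of PostgreSQL. The sketch below is a minimal check, assuming Python's "latin-1" and "cp1252" codecs are faithful stand-ins for ISO 8859-1 and the Windows code page; the helper name encodable is made up for the example. It says nothing about what the Windows C runtime actually does for this locale. It suggests that U+0178 is missing from ISO 8859-1 but present in Windows-1252 at byte 0x9F, so the guess above may hold for Latin-1 rather than for the code page itself:

```python
lower = "\u00ff"        # LATIN SMALL LETTER Y WITH DIAERESIS
upper = lower.upper()   # Unicode uppercase mapping, independent of any OS locale

print(f"U+{ord(upper):04X}")   # U+0178


def encodable(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


in_latin1 = encodable(upper, "latin-1")
in_cp1252 = encodable(upper, "cp1252")

print(in_latin1, in_cp1252)           # False True
print(upper.encode("cp1252").hex())   # 9f
```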

If the string is converted to Unicode first, the upper function might work as expected (note that convert_to returns bytea, not text):

SELECT upper(convert_to(thecol, 'UTF8')) ...

You should probably use a different value for LC_CTYPE and LC_COLLATE. On Linux, you would use sv_SE.UTF-8.

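Nevertheless, I think this is a bug in Postgres. If the upper-case version can't be represented in the target character set, it would be better to leave the ÿ unchanged.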
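
The fallback suggested here can be sketched in Python. This only illustrates the proposed behaviour and is not how PostgreSQL implements upper; the function name upper_within_charset is hypothetical, and Python codecs again stand in for the target character set:

```python
def upper_within_charset(text, charset):
    """Upper-case text, keeping any character whose upper-case form
    cannot be encoded in the given charset."""
    result = []
    for ch in text:
        candidate = ch.upper()
        try:
            candidate.encode(charset)
        except UnicodeEncodeError:
            candidate = ch  # fall back to the original character
        result.append(candidate)
    return "".join(result)


kept = upper_within_charset("\u00ff\u00ff\u00ff", "latin-1")
plain = upper_within_charset("ada", "latin-1")

print([f"U+{ord(ch):04X}" for ch in kept])   # ['U+00FF', 'U+00FF', 'U+00FF']
print(plain)                                 # ADA
```

With a character set that can hold U+0178, such as "cp1252" in Python, the same function returns the converted character instead of keeping the original.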

PostgreSQL upper function on ascii 152 character ("ÿ")
