Question

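I'm trying to format the output of some strings in C using width specifiers with printf, but I can't get the behavior I want. It seems that every time printf encounters one of the characters å, ä or ö, the width reserved for the string shrinks by one position.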

A code snippet to illustrate:

#include <stdio.h>

int main(void)
{
  printf(">%-10s<\n", "aoa");
  printf(">%-10s<\n", "aäoa");
  printf(">%-10s<\n", "aäoöa");
  printf(">%-10s<\n", "aäoöaå");

  return 0;
}

Output in my Ubuntu Linux bash shell:

>aoa       <
>aäoa     <
>aäoöa   <
>aäoöaå <

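I'm looking for advice on how to deal with this. What I want is for every string in the snippet above to be printed in a space-padded field 10 characters wide, like this: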

>aoa       <
>aäoa      <
>aäoöa     <
>aäoöaå    <

If this isn't caused by some other setting, I'd also appreciate an explanation of why this happens, or any other feedback.

Answer 1

Use wide strings and wprintf:

#include <wchar.h>
#include <locale.h>

int main(void)
{
  // select the user's locale (e.g. a UTF-8 one) so that wprintf can
  // encode characters such as ä, ö and å in the output
  setlocale(LC_ALL, "");

  wprintf(L">%-10ls<\n", L"aoa");
  wprintf(L">%-10ls<\n", L"aäoa");
  wprintf(L">%-10ls<\n", L"aäoöa");
  wprintf(L">%-10ls<\n", L"aäoöaå");

  return 0;
}

Answer 2

Why does this happen?

Take a look at The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets

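As an alternative to wide characters, under UTF-8 you can use this function to count the continuation bytes (the extra bytes each non-ASCII character occupies beyond its first byte), and then add the result to printf's width specifier: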

#include <stdio.h>

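int func(const char *str)
{
    /* Counts UTF-8 continuation bytes (bit pattern 10xxxxxx): printf counts
       them toward the field width although they add no visible character */
    int len = 0;

    while (*str != '\0') {
        if (((unsigned char)*str & 0xc0) == 0x80) {
            len++;
        }
        str++;
    }
    return len;
}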

int main(void)
{
    printf(">%-*s<\n", 10 + func("aoa"), "aoa");
    printf(">%-*s<\n", 10 + func("aäoa"), "aäoa");
    printf(">%-*s<\n", 10 + func("aäoöa"), "aäoöa");
    printf(">%-*s<\n", 10 + func("aäoöaå"), "aäoöaå");
    return 0;
}

Output:

>aoa       <
>aäoa      <
>aäoöa     <
>aäoöaå    <

Answer 3

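Alter Mann's accepted answer is along the right lines, except that you should not just hard-code a custom function to count the bytes in a multibyte string that do not encode a visible character: you should localize the program with setlocale(LC_ALL, "") or similar, and use strlen(str) - mbstowcs(NULL, str, 0) to count the bytes in the string that do not encode a visible character.

setlocale() is standard C (C89, C99, C11) and is also defined in POSIX.1. mbstowcs() is likewise standard C (C89, C99, C11) and also defined in POSIX.1. Both are implemented in the Microsoft C library, so they are available essentially everywhere.

Consider the following example program, which prints the C strings specified on the command line: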

#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <stdio.h>

/* Counts the number of (visible) characters in a string */
static size_t ms_len(const char *const ms)
{
    size_t n;

    if (!ms)
        return 0;
    n = mbstowcs(NULL, ms, 0);
    /* mbstowcs() returns (size_t)-1 for an invalid multibyte sequence;
       fall back to the byte count so ms_extras() cannot wrap around */
    return (n == (size_t)-1) ? strlen(ms) : n;
}

/* Number of bytes that do not generate a visible character in a string */
static size_t ms_extras(const char *const ms)
{
    if (ms)
        return strlen(ms) - ms_len(ms);
    else
        return 0;
}

int main(int argc, char *argv[])
{
    int arg;

    /* Default locale */
    setlocale(LC_ALL, "");

    for (arg = 1; arg < argc; arg++)
        printf(">%-*s< (%zu bytes; %zu chars; %zu bytes extra in wide chars)\n",
               (int)(10 + ms_extras(argv[arg])), argv[arg],
               strlen(argv[arg]), ms_len(argv[arg]), ms_extras(argv[arg]));

    return EXIT_SUCCESS;
}

If you compile the above to example, then running

./example aaa aaä aää äää aa€ a€€ €€€ a ä € 😈

the program will output

>aaa       < (3 bytes; 3 chars; 0 bytes extra in wide chars)
>aaä       < (4 bytes; 3 chars; 1 bytes extra in wide chars)
>aää       < (5 bytes; 3 chars; 2 bytes extra in wide chars)
>äää       < (6 bytes; 3 chars; 3 bytes extra in wide chars)
>aa€       < (5 bytes; 3 chars; 2 bytes extra in wide chars)
>a€€       < (7 bytes; 3 chars; 4 bytes extra in wide chars)
>€€€       < (9 bytes; 3 chars; 6 bytes extra in wide chars)
>a         < (1 bytes; 1 chars; 0 bytes extra in wide chars)
>ä         < (2 bytes; 1 chars; 1 bytes extra in wide chars)
>€         < (3 bytes; 1 chars; 2 bytes extra in wide chars)
>😈         < (4 bytes; 1 chars; 3 bytes extra in wide chars)

If the last < does not line up with the others, it is because the font in use is not exactly fixed-width: the emoticon is wider than normal characters such as Ä, that's all. Blame the font.

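In case your OS/browser/font cannot display it, the last character is U+1F608 SMILING FACE WITH HORNS, from the Emoticons Unicode block. In Linux, all the > and < above line up correctly in every terminal I have, including the console (the non-graphical system console), even though the console font has no glyph for the emoticon and simply shows it as a diamond.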

Unlike Alter Mann's answer, this approach is portable and makes no assumptions about which character set the current user actually uses.

Width specifier in printf does not work correctly with accented characters
