Question

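I am trying to read a text file line by line in C++ using fgets(), and the "plus-minus" signs show up as "?" characters. Does this have something to do with encoding? I tried switching to Unicode, but the result was even worse. Please help.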

Thanks. EDIT: here is my code:

#define AMINOACIDS "ARNDCQEGHILKMFPSTWYV"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/* Prints lines 0..j_index of blosum50.txt (index and i_index are not used yet). */
int getAmino(char* index, int j_index, int i_index){

    int j = 0;
    char *buffer = (char*)malloc(sizeof(char) * 100);
    FILE *file;

    if(buffer == NULL){
        return 1;
    }
    file = fopen("blosum50.txt", "r");

    if(file == NULL){
        perror("Error at opening the file!");
    }else{
        /* Loop on the result of fgets; testing feof() first is error-prone. */
        while (fgets(buffer , 100 , file) != NULL)
        {
            printf("In while:\n");
            fputs (buffer , stdout);

            if(j == j_index){
                break;
            }
            j++;
        }
        fclose (file);
    }
    free(buffer);
    return 0;
}
int main(void){
    char *aMatrix = (char*)malloc(sizeof(char) * (21));
    if(aMatrix == NULL){
        return 1;
    }
    strcpy(aMatrix, AMINOACIDS);
    getAmino(aMatrix, 0, 1);
    free(aMatrix);
    return 0;
}

Then, when I press Ctrl+S, a message pops up: [image]

If I press No, the signs are displayed as "?" characters: [image]

If I press Yes, they are displayed like this: [image]

Here are the contents of my file:

5 -2 -1 -2 -1 -1 -1 0 -2 -1 -2 -1 -1 -3 -1 -1 -0 -3 -2 0 -2 7 -1 -2 -4 1 0 -3 0 -4 -3 3 -2 -3 -3 -1 -1 -3 -1 -3 -1 -1 7 2 -2 0 0 0 1 -3 -4 0 -2 -4 -2 1 0 -4 -2 -3 -2 -2 2 8 -4 0 2 -1 -1 -4 -4 -1 -4 -5 -1 0 -1 -5 -3 -4 -1 -4 -2 -4 13 -3 -3 -3 -3 -2 -2 -3 -2 -2 -4 -1 -1 -5 -3 -1 -1 1 0 0 -3 7 2 -2 1 -3 -2 2 0 -4 -1 0 -1 -1 -1 -3 -1 0 0 2 -3 2 6 -3 0 -4 -3 1 -2 -3 -1 -1 -1 -3 -2 -3 0 -3 0 -1 -3 -2 -3 8 -2 -4 -4 -2 -3 -4 -2 0 -2 -3 -3 -4 -2 0 1 -1 -3 1 0 -2 10 -4 -3 0 -1 -1 -2 -1 -2 -3 2 -4 -1 -4 -3 -4 -2 -3 -4 -4 -4 5 2 -3 2 0 -3 -3 -1 -3 -1 4 -2 -3 -4 -4 -2 -2 -3 -4 -3 5 5 -3 3 1 -4 -3 -1 -2 -1 1 -1 3 0 -1 -3 2 1 -2 0 -3 -3 6 -2 -4 -1 0 -1 -3 -2 -3 -1 -2 -2 -4 -2 0 -2 -3 -1 2 3 -2 7 0 -3 -2 -1 -1 0 1 -3 -3 -4 -5 -2 -4 -3 -4 -1 0 1 -4 0 8 -4 -3 -2 1 4 -1 -1 -3 -2 -1 -4 -1 -1 -2 -2 -3 -4 -1 -3 -4 10 -1 -1 -4 -3 -3 1 -1 1 0 -1 0 -1 0 -1 -3 -3 0 -2 -3 -1 5 2 -4 -2 -2 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -1 -2 -2 2 5 -3 -2 0 -3 -3 -4 -5 -5 -1 -3 -3 -3 -3 -2 -3 -1 1 -4 -4 -3 15 2 -3 -2 -1 -2 -3 -3 -1 -2 -3 2 -1 -1 -2 0 4 -3 -2 -2 2 8 -1 0 -3 -3 -4 -1 -3 -3 -4 -4 4 1 -3 -1 -1 -3 -2 0 -3 -1 5

Answer 1

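In Visual Studio, "Save as Unicode" saves the file as UTF-8 prefixed with a byte order mark (U+FEFF), which UTF-8 encodes as the three bytes EF BB BF. That is why you see three extra characters before the 5 in your second example.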
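As a minimal sketch of how the reading code could cope with that prefix, the helper below steps over a leading UTF-8 byte order mark. The name skip_utf8_bom is hypothetical and not part of the question's code; it assumes the line is a NUL-terminated string as returned by fgets.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer just past a leading UTF-8
   byte order mark (EF BB BF), or the original pointer if the line
   does not start with one. */
const char *skip_utf8_bom(const char *line)
{
    static const char bom[] = "\xEF\xBB\xBF";

    if (strncmp(line, bom, 3) == 0) {
        return line + 3;
    }
    return line;
}
```

Only the first line of the file can carry the mark, so in the question's loop it would be enough to call this when j is 0, for example fputs(skip_utf8_bom(buffer), stdout).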

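I am guessing the garbled characters are where your "plus-minus" signs are, i.e. they are really ±? They seem to be read correctly; they are just not being interpreted correctly. You are passing the raw bytes to fputs, which writes them out unchanged, and the console interprets them in its own code page, not as UTF-8.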

MultiByteToWideChar can convert the text to UTF-16, which you can then pass to WriteConsoleW. Microsoft C++ makes Unicode output messy, which is odd, because Windows itself handles it fine.
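A rough sketch of that conversion, assuming the buffer holds valid UTF-8: print_utf8 is a hypothetical name, and the fallback to fputs when the output is not a console (or the program is not built for Windows) is a design choice of this sketch rather than something the answer prescribes.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: prints a NUL-terminated UTF-8 string.
   On a Windows console it converts to UTF-16 and uses WriteConsoleW;
   otherwise (redirected output, other platforms) it falls back to fputs.
   Returns 0 on success, -1 on failure. */
int print_utf8(const char *utf8)
{
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode;

    /* GetConsoleMode succeeds only if the handle is a real console. */
    if (out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode)) {
        wchar_t *wide;
        DWORD written = 0;
        BOOL ok;
        /* First call: ask for the required length, including the NUL. */
        int n = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);

        if (n <= 0) {
            return -1;
        }
        wide = (wchar_t *)malloc((size_t)n * sizeof(wchar_t));
        if (wide == NULL) {
            return -1;
        }
        /* Second call: perform the actual conversion. */
        MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, n);
        /* Write everything except the terminating NUL. */
        ok = WriteConsoleW(out, wide, (DWORD)(n - 1), &written, NULL);
        free(wide);
        return ok ? 0 : -1;
    }
#endif
    return fputs(utf8, stdout) == EOF ? -1 : 0;
}
```

In the question's loop, fputs(buffer, stdout) would become print_utf8(buffer), provided the file was saved as UTF-8.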

Answer 2

The plus-minus sign is not part of standard ASCII, which covers only 0-127; values 128-255 belong to the various "extended ASCII" code pages.

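The extended ASCII value of plus-minus is 241 decimal (0xF1) in the OEM code pages 437 and 850; in Windows-1252 and ISO-8859-1 it is 177 (0xB1).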

Its Unicode code point is U+00B1 (hexadecimal), which UTF-8 encodes as the two bytes C2 B1.
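To make the relationship between the code point and the bytes stored in a UTF-8 file concrete, here is a small sketch of a UTF-8 encoder. The function name utf8_encode is an illustration only; it follows the standard UTF-8 bit layout and rejects surrogates and values above U+10FFFF.

```c
#include <stddef.h>

/* Illustrative helper: encodes one Unicode code point as UTF-8 into
   out (which must hold at least 4 bytes). Returns the number of bytes
   written, or 0 if cp is not a valid code point. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        if (cp >= 0xD800UL && cp <= 0xDFFFUL) {
            return 0; /* surrogates are not valid code points */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xB1, bytes) yields the two bytes C2 B1, so a console that treats each byte as a separate character in a single-byte code page will show two wrong characters instead of one ±.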

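When you saved the file as Unicode, it looks like it was encoded as UTF-16. In your code you are reading it as plain single-byte text, which is why the output looks like that.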
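Which encoding the editor actually used can be checked from the first bytes of the file instead of guessing. The sketch below classifies a buffer by its byte order mark; the enum and the function name detect_bom are hypothetical, and a file with no mark at all is simply reported as BOM_NONE.

```c
#include <stddef.h>

enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

/* Hypothetical helper: inspects the first bytes of a file and reports
   which byte order mark, if any, they form. UTF-32 is not handled, so
   a UTF-32 LE file (FF FE 00 00) would be reported as UTF-16 LE. */
enum bom_kind detect_bom(const unsigned char *data, size_t len)
{
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return BOM_UTF8;
    }
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return BOM_UTF16_LE;
    }
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return BOM_UTF16_BE;
    }
    return BOM_NONE;
}
```

The bytes should come from a file opened in binary mode ("rb") so that the C runtime does not translate anything before the check.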

On a Windows console, byte 241 (decimal) should be displayed as ±. So if the byte in the file is 241, it should show up as ±.

So use a hex editor to check the byte values (ASCII or Unicode) of those characters in the file. That will give you a clearer picture.
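If no hex editor is at hand, a few lines of C can do the same job. This is only a sketch: bytes_to_hex is a hypothetical helper that formats a byte buffer as space-separated hex values, which can then be printed for the first bytes of the file.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes the first n bytes of data as
   space-separated two-digit hex values into out.
   Returns 0 on success, -1 if out is too small. */
int bytes_to_hex(const unsigned char *data, size_t n,
                 char *out, size_t out_size)
{
    size_t i;

    /* Each byte needs 3 chars: two hex digits plus a space or the NUL. */
    if (out_size == 0 || (n > 0 && out_size < n * 3)) {
        return -1;
    }
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        sprintf(out + i * 3, "%02X", (unsigned)data[i]);
        if (i + 1 < n) {
            out[i * 3 + 2] = ' '; /* replace the NUL with a separator */
        }
    }
    return 0;
}
```

To inspect the file, open it with fopen("blosum50.txt", "rb"), fread the first 16 or so bytes into an unsigned char array, and print the result of bytes_to_hex. A ± stored as F1 or B1 points to a single-byte code page, while C2 B1 points to UTF-8.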

Unable to read the "plus-minus" sign from a file
