Question

I'm working on a university Prolog project: a text analyzer written in SWI-Prolog that, quite simply, does the following:

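It reads a .txt input file containing some text (I call it dataggare.txt) and puts the text into a list of ASCII characters.
It performs some operations on this raw list of ASCII characters and saves the result in a file named System.txt.
Finally, it compares the newly modified System.txt with another file named oracolo.txt, which shows how System.txt should look if all operations completed successfully. An FMeasure value expresses how similar System.txt is to oracolo.txt, but that is not important right now.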

The problem appears when I compare the new System.txt with oracolo.txt, and only when I run the program on Linux (on Windows there is no problem).

When I run the following query, I get a series of warnings about the encoding of oracolo.txt:

[debug]  ?- tagConfronto('dataggare.txt', 'oracolo.txt', FMeasure).
Warning: oracolo.txt:1:422: Illegal UTF-8 continuation
Warning: oracolo.txt:2:77: Illegal UTF-8 continuation
Warning: oracolo.txt:2:129: Illegal UTF-8 continuation
Warning: oracolo.txt:3:31: Illegal UTF-8 continuation
Warning: oracolo.txt:3:71: Illegal UTF-8 continuation
Warning: oracolo.txt:3:199: Illegal UTF-8 start
Warning: oracolo.txt:3:258: Illegal UTF-8 continuation
............
Warning: oracolo.txt:12:222: Illegal UTF-8 continuation
Warning: oracolo.txt:12:563: Illegal UTF-8 continuation
FMeasure = 0.02564102564102564

The tagConfronto/3 predicate compares the contents of dataggare.txt with oracolo.txt and computes the corresponding FMeasure value.

As you can see, running it reveals encoding problems in oracolo.txt, and these problems change the FMeasure value considerably.

I only get this problem when I run the program on Linux, not on Windows (on Windows I get no warnings and the correct FMeasure value).

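Some colleagues told me I might solve it by re-saving a file with a different encoding, but I don't know whether I should re-save System.txt or oracolo.txt, which encoding I should use, or whether there is a different solution altogether.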
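Re-saving a file in a different encoding, as the colleagues suggest, can also be done from Prolog itself. This is a minimal sketch under an assumption the question does not confirm: that oracolo.txt was saved on Windows as ISO-8859-1 (`iso_latin_1` in SWI-Prolog). The helper name `convert_to_utf8/2` is hypothetical; `read_file_to_codes/3` comes from `library(readutil)` and passes the `encoding/1` option on to `open/4`.

```prolog
:- use_module(library(readutil)).

% Hypothetical helper: re-save InFile as UTF-8 in OutFile.
% ASSUMPTION: InFile is ISO-8859-1 (Latin-1); change iso_latin_1
% below if the file was saved in another encoding.
convert_to_utf8(InFile, OutFile) :-
    % Decode the input bytes as Latin-1 into a list of character codes.
    read_file_to_codes(InFile, Codes, [encoding(iso_latin_1)]),
    % Write the same characters back out, encoded as UTF-8.
    setup_call_cleanup(
        open(OutFile, write, Out, [encoding(utf8)]),
        format(Out, "~s", [Codes]),
        close(Out)).
```

For example, `?- convert_to_utf8('oracolo.txt', 'oracolo_utf8.txt').` (the output name is only illustrative). The converted file can then be read with Linux's default `utf8` encoding.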

Any ideas?

Answer 1

On Unix:

?- current_prolog_flag(encoding,X).
X = utf8.

On Windows:

?- current_prolog_flag(encoding,X).
X = text.

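Perhaps you should set the same value when opening the file, using open/4, or change it globally with set_prolog_flag/2. To change the encoding of a stream that is already open, use set_stream/2.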
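A minimal sketch of these three options, assuming (this is not stated in the question) that oracolo.txt is ISO-8859-1, i.e. `iso_latin_1`. The predicate names are hypothetical; `open/4`, `set_stream/2`, `set_prolog_flag/2` and `read_stream_to_codes/2` are standard SWI-Prolog.

```prolog
:- use_module(library(readutil)).

% ASSUMPTION: the file is ISO-8859-1 (Latin-1). Predicate names are
% hypothetical; replace iso_latin_1 with the file's real encoding.

% 1. Pass the encoding explicitly to open/4.
read_codes_open4(File, Codes) :-
    setup_call_cleanup(
        open(File, read, In, [encoding(iso_latin_1)]),
        read_stream_to_codes(In, Codes),
        close(In)).

% 2. Open with the default encoding, then switch it with set_stream/2
%    before reading anything from the stream.
read_codes_set_stream(File, Codes) :-
    setup_call_cleanup(
        open(File, read, In),
        ( set_stream(In, encoding(iso_latin_1)),
          read_stream_to_codes(In, Codes) ),
        close(In)).

% 3. Change the default encoding for every file opened afterwards.
use_latin1_by_default :-
    set_prolog_flag(encoding, iso_latin_1).
```

Option 1 keeps the change local to the code that opens oracolo.txt. Option 3 is a one-line change, but it also affects how dataggare.txt and System.txt are read and written, so all files would then need to share that encoding.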

I'm not sure encoding(text) is appropriate; see the documentation page for all supported values.
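To check which encoding a file is actually being read with (for instance, to compare the Linux and Windows runs), you can query the stream with `stream_property/2`. `default_file_encoding/2` is a hypothetical helper. When no `encoding/1` option is given, the result reflects the `encoding` flag, unless the file starts with a byte order mark, which SWI-Prolog detects by default in read mode.

```prolog
% Hypothetical helper: report the encoding SWI-Prolog uses to read File
% when it is opened without an explicit encoding/1 option.
default_file_encoding(File, Encoding) :-
    setup_call_cleanup(
        open(File, read, In),
        stream_property(In, encoding(Encoding)),
        close(In)).
```

For example, `?- default_file_encoding('oracolo.txt', E).` run on both systems should show whether they disagree.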

Problem with the encoding of text files in a Prolog program on Linux
