Question

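In the project I'm working on, I process files and check that they exist before continuing. It seems impossible to rename, or even work with, a file whose path contains an en dash.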

std::string _old = "D:\\Folder\\This – by ABC.txt";
std::rename(_old.c_str(), "New.txt");

Here the _old variable is interpreted as D:\Folder\This û by ABC.txt. I have tried

setlocale(LC_ALL, "");
//and
setlocale(LC_ALL, "C");
//or    
setlocale(LC_ALL, "en_US.UTF-8");

But none of them worked. What should I do?

Answer 1

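The Windows ANSI Western encoding has the Unicode n-dash, U+2013, "–", as code point 150 (decimal). When you output that byte to a console whose active codepage is 437, the original IBM PC character set, or a compatible one, it is interpreted as "û". So you do have the right codepage 1252 character in your string literal, either because

you're using Visual C++, which defaults to the Windows ANSI codepage for encoding narrow string literals, or
you're using an old version of g++ that doesn't do the standard-mandated conversion and checking, but just passes narrow character bytes straight through its machinery, and your source code is encoded as Windows ANSI Western (or compatible), or
something I didn't think of.

With either of the first two possibilities, the rename call will work.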

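I tested that it does work with Visual C++. I don't have an old version of g++ around, but I tested that it works with version 5.1. That is, I tested that the file really gets renamed to New.txt.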

// Source encoding: UTF-8
// Execution character set: Windows ANSI Western a.k.a. codepage 1252.
#include <stdio.h>      // rename
#include <stdlib.h>     // EXIT_SUCCESS, EXIT_FAILURE
#include <string>       // std::string
using namespace std;

auto main()
    -> int
{
    string const a = ".\\This – by ABC.txt";    // Literal encoded as CP 1252.
    return rename( a.c_str(), "New.txt" ) == 0? EXIT_SUCCESS : EXIT_FAILURE;
}

Example:

[C:\my\forums\so\265]
> dir /b *.txt
File Not Found

[C:\my\forums\so\265]
> g++ r.cpp -fexec-charset=cp1252

[C:\my\forums\so\265]
> type nul >"This – by ABC.txt"

[C:\my\forums\so\265]
> run a
Exit code 0

[C:\my\forums\so\265]
> dir /b *.txt
New.txt

[C:\my\forums\so\265]
> _

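...where run is just a batch file that reports the exit code.

If your Windows ANSI codepage is not codepage 1252, then you need to use your specific Windows ANSI codepage instead.

You can check the Windows ANSI codepage via the GetACP API function, or, for example, via this command: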

[C:\my\forums\so\265]
> wmic os get codeset /value | find "="
CodeSet=1252

[C:\my\forums\so\265]
> _

If that codepage supports the n-dash character, the code will work.
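The code page dependence described above can be illustrated without a console. The following is only a minimal sketch, not part of the original answer: the CodePage enum and both function names are hypothetical, and only the two code pages discussed here are covered.

```cpp
// Hypothetical names throughout; covers only the two code pages discussed here.
enum class CodePage { cp1252, cp437 };

// Unicode code point denoted by the single byte 150 (0x96) in each code page.
constexpr char32_t decode_byte_150(CodePage cp)
{
    return cp == CodePage::cp1252 ? U'\u2013'    // en dash
                                  : U'\u00FB';   // 'û'
}

// Whether the code page has any byte for U+2013, i.e. whether a narrow
// file name containing the en dash can be expressed in it at all.
constexpr bool supports_en_dash(CodePage cp)
{
    return cp == CodePage::cp1252;
}
```

The same byte is therefore a correct en dash for the file API (codepage 1252) and a "û" when echoed by a codepage 437 console, which is why the displayed name looks wrong even though the rename can succeed.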

This coding model is based on having one version of the executable for each relevant main locale (including its character encoding).

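An alternative is to do everything in Unicode. This can be done portably via Boost filesystem, which will be adopted into the standard library in C++17. Or you can use the Windows API, or the de facto standard extensions to the standard library on Windows, i.e. the wide-character function _wrename.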

Example using the experimental filesystem module of Visual C++ 2015:

// Source encoding: UTF-8
// Execution character set: irrelevant (everything's done in Unicode).
#include <stdlib.h>     // EXIT_SUCCESS, EXIT_FAILURE

#include <filesystem>   // In C++17 and later, or Visual C++ 2015 and later.
using namespace std::tr2::sys;

auto main()
    -> int
{
    path const old_path = L".\\This – by ABC.txt";    // Literal encoded as wide string.
    path const new_path = L"New.txt";
    try
    {
        rename( old_path, new_path );
        return EXIT_SUCCESS;
    }
    catch( ... )
    {}
    return EXIT_FAILURE;
}

To do this properly for portable code you can use Boost, or you can create a wrapper header that uses whatever implementation is available.

Answer 2

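It depends on the operating system. On Linux, a file name is a simple array of bytes: forget about encodings and just rename the file.

But it seems you are using Windows, where a file name is actually a null-terminated string of 16-bit characters. In that case, the best approach is to use wstring instead of messing with encodings.

Don't try to write platform-independent code to solve a platform-specific problem. Windows uses Unicode for file names, so you have to write platform-specific code instead of using the standard function rename.

Just write L"D:\\Folder\\This \u2013 by ABC.txt" and call _wrename.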

Answer 3

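It really is platform dependent, and Unicode is a headache. It also depends on which compiler you use. For older Microsoft compilers (VS2010 or earlier) you need to use the API described in MSDN. This test example creates a file with the name you are having trouble with and then renames it: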

// #define _UNICODE // might be defined in project
#include <string>

#include <tchar.h>
#include <windows.h>

using namespace std;

// Convert a wide Unicode string to an UTF8 string
std::string utf8_encode(const std::wstring &wstr)
{
    if( wstr.empty() ) return std::string();
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
    std::string strTo( size_needed, 0 );
    WideCharToMultiByte                  (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
    return strTo;
}

// Convert an UTF8 string to a wide Unicode String
std::wstring utf8_decode(const std::string &str)
{
    if( str.empty() ) return std::wstring();
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo( size_needed, 0 );
    MultiByteToWideChar                  (CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}

int _tmain(int argc, _TCHAR* argv[] ) {
    std::string pFileName = "C:\\This \xe2\x80\x93 by ABC.txt";    // \xe2\x80\x93 is U+2013 in UTF-8
    std::wstring pwsFileName = utf8_decode(pFileName);

    // Create the file with the problematic name.
    HANDLE hf = CreateFileW( pwsFileName.c_str() ,
                      GENERIC_READ | GENERIC_WRITE,
                      0,
                      0,
                      CREATE_NEW,
                      FILE_ATTRIBUTE_NORMAL,
                      0);
    if( hf == INVALID_HANDLE_VALUE ) return 1;    // e.g. the file already exists
    CloseHandle(hf);

    // Rename it, again passing UTF-16 names to the wide API.
    BOOL ok = MoveFileW(pwsFileName.c_str(), utf8_decode("C:\\This \xe2\x80\x93 by ABC 2.txt").c_str());
    return ok ? 0 : 1;
}

There is still a problem with those helpers, though; variants that work on null-terminated strings look like this:

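std::string utf8_encode(const std::wstring &wstr)
{
    // With -1 as the input length the source is treated as null-terminated,
    // and the returned size includes the terminating null.
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), -1, NULL, 0, NULL, NULL);
    if( size_needed <= 0 ) return std::string();
    char *szTo = new char[size_needed];
    WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), -1, szTo, size_needed, NULL, NULL);
    std::string strTo = szTo;
    delete[] szTo;
    return strTo;
}


// Convert an UTF8 string to a wide Unicode String
std::wstring utf8_decode(const std::string &str)
{
    // Same two-call pattern: ask for the size first, then convert.
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, NULL, 0);
    if( size_needed <= 0 ) return std::wstring();
    wchar_t *wszTo = new wchar_t[size_needed];
    MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, wszTo, size_needed);
    std::wstring wstrTo = wszTo;
    delete[] wszTo;
    return wstrTo;
}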

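The problem is the size of the converted string. Calling WideCharToMultiByte with 0 as the size of the target buffer converts nothing and instead returns the number of bytes the target buffer needs. All this juggling explains why frameworks like Qt have such intricate code to support Unicode-based file systems. In practice, the most cost-effective way to get rid of all these possible bugs is to use such a framework.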
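The measure-then-convert pattern can also be shown portably. The sketch below does not use the Windows API at all: it is a hand-written stand-in with hypothetical function names (next_code_point, utf8_length, utf8_size, to_utf8) that converts UTF-16 to UTF-8 in two passes, and it shows why a buffer sized to the number of wide characters is too small (the en dash is one UTF-16 unit but three UTF-8 bytes).

```cpp
#include <cstddef>
#include <string>

// Reads one code point starting at s[i] and advances i.
// Unpaired surrogates are replaced by U+FFFD.
static char32_t next_code_point(const std::u16string& s, std::size_t& i)
{
    const char16_t u = s[i++];
    if (u >= 0xD800 && u <= 0xDBFF && i < s.size()) {
        const char16_t lo = s[i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++i;
            return 0x10000 + ((static_cast<char32_t>(u) - 0xD800) << 10)
                           + (static_cast<char32_t>(lo) - 0xDC00);
        }
    }
    if (u >= 0xD800 && u <= 0xDFFF) return 0xFFFD;
    return u;
}

// Number of UTF-8 bytes needed for one code point.
static std::size_t utf8_length(char32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Pass 1: measure only, like calling the API with a zero-sized buffer.
std::size_t utf8_size(const std::u16string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ) n += utf8_length(next_code_point(s, i));
    return n;
}

// Pass 2: allocate exactly the measured size, then fill it.
std::string to_utf8(const std::u16string& s)
{
    std::string out(utf8_size(s), '\0');
    std::size_t o = 0;
    for (std::size_t i = 0; i < s.size(); ) {
        const char32_t cp = next_code_point(s, i);
        const std::size_t len = utf8_length(cp);
        if (len == 1) {
            out[o++] = static_cast<char>(cp);
            continue;
        }
        // Lead byte: length marker plus the top bits of the code point.
        const unsigned lead = len == 2 ? 0xC0u : len == 3 ? 0xE0u : 0xF0u;
        out[o++] = static_cast<char>(lead | (cp >> (6 * (len - 1))));
        // Continuation bytes: six bits each, most significant first.
        for (std::size_t k = len - 1; k-- > 0; )
            out[o++] = static_cast<char>(0x80 | ((cp >> (6 * k)) & 0x3F));
    }
    return out;
}
```

For the file name in question, utf8_size reports two bytes more than the number of UTF-16 units, which is exactly the space a length()+1 buffer would be missing.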

For VS2015:

std::string _old = u8"D:\\Folder\\This \xe2\x80\x93 by ABC.txt"s;

That is according to their documentation; I can't check that one.

For MinGW:

std::string _old = u8"D:\\Folder\\This \xe2\x80\x93 by ABC.txt";
std::cout << _old.data();

The output contains the correct file name... but for the file API you still need to do a proper conversion.

Renaming a file with a dash in its name in C++
