Question

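I recently tried to parse a subtitle file so that I could adjust the timing myself. The format is very simple; a valid entry looks like this: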

<arbitrary lines might include comments, blanks, random stuff>
<consecutively numbered ID here>
01:23:45,678 --> 01:23:47,910
<arbitrary lines might include comments, blanks, random stuff>

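How can this be done elegantly in C++? I have only come up with very ugly solutions. For example: read the file line by line, search each line for '-->', and then walk through the line with a series of find(':'), find(',') and substr() calls.

I feel there must be a better way, for example splitting the line by tokens somehow. Ideally, a line like this would still be parsed correctly:

01 : 23    :45,678   -->  01:23:   45, 910

The end result should be each part (hh, mm, ss, ms) in its own variable. I am not necessarily asking for a complete implementation. A general idea and pointers to suitable utility functions would be enough.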

Answer 1

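You can simply use std::regex for this. You define the token you want to extract (here, each run of digits) and the regex does the rest for you. You can of course change the input string; it will still work. Afterwards you can keep working with the data in the vector. Fairly simple.

See this basic code example: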

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>   // std::ostream_iterator
#include <vector>
#include <regex>

// Our test data as a raw string literal, so characters such as " need no escaping
std::string testData(R"#(01 : 23    :45,678   -->  01:23:   45, 910  ?")#");

std::regex re(R"#((\b\d+\b))#");

int main(void)
{
    // Define id as a vector of strings; its range constructor runs the token iterator over the test data and stores every match of capture group 1
    std::vector<std::string> id{ std::sregex_token_iterator(testData.begin(), testData.end(), re, 1), std::sregex_token_iterator() };

    // For debug output. Print complete vector to std::cout
    std::copy(id.begin(), id.end(), std::ostream_iterator<std::string>(std::cout, " "));

    return 0;
}
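
To get from the token vector to the separate hh, mm, ss and ms variables the question asks for, one possible follow-up is a single regex with eight capture groups that also rejects lines that are not timing lines. This is only a sketch under some assumptions: it needs C++17 for std::optional, the names Timestamp, Cue and parseTimingLine are made up for illustration, and it assumes at most two digits for hh, mm and ss and at most three for ms.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper types for this sketch; not part of the answer above.
struct Timestamp {
    int hh, mm, ss, ms;
};

struct Cue {
    Timestamp start, end;
};

// Returns the parsed timing if `line` is a timing line, std::nullopt otherwise.
std::optional<Cue> parseTimingLine(const std::string& line)
{
    // Eight digit groups; \s* tolerates stray blanks around ':', ',' and '-->'.
    // The digit limits are an assumption and keep std::stoi within int range.
    static const std::regex re(
        R"#(\s*(\d{1,2})\s*:\s*(\d{1,2})\s*:\s*(\d{1,2})\s*,\s*(\d{1,3})\s*-->)#"
        R"#(\s*(\d{1,2})\s*:\s*(\d{1,2})\s*:\s*(\d{1,2})\s*,\s*(\d{1,3})\s*)#");

    std::smatch m;
    // regex_match succeeds only if the whole line fits the pattern,
    // so comments, blank lines and the numeric ID lines are rejected.
    if (!std::regex_match(line, m, re)) {
        return std::nullopt;
    }

    std::array<int, 8> v{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = std::stoi(m[i + 1].str());   // m[0] is the whole match
    }
    return Cue{ {v[0], v[1], v[2], v[3]}, {v[4], v[5], v[6], v[7]} };
}
```

Calling parseTimingLine on every line obtained from std::getline and skipping the std::nullopt results would take care of the arbitrary lines around each entry. Unlike the plain digit tokenizer, this variant does not accept trailing characters after the second timestamp.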
