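I have a text file containing nested objects, and I need to preserve the relationships between them. How should I go about reading them? I think I need a tree-like data structure whose nodes can have an arbitrary number of children (something like an n-ary tree without the limit of 'n'). Parsing the data and building the tree in memory is what trips me up.
The data in the text file is structured as follows: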
{
Element_A (3)
Element_B (3,4)
{
Element_B (6,24)
Element_A (1)
}
{
Element_A (3)
{
Element_A (4)
Element_B (12,6)
}
Element_B (1,4)
}
}
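EDIT: To clarify, an opening/closing pair of curly braces encloses a single object and all of its children. The Element_A and Element_B at the top of the sample above are parts of the same object.
So far, I read the whole file into a vector of strings, like this: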
vector<string> lines;
ifstream file("input.txt");
string s;
while (getline(file, s))
    lines.push_back(s);
and read the data from each line with something like the following:
std::regex re(R"(Element_A \(\s*(\d+)\))");
std::smatch m;
if (std::regex_search(line, m, re) )
{
    // extract data from 'm'
}
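As a minimal sketch of the extraction step, the capture groups of the std::smatch can be converted with std::stoi. The helper names parseElementA and parseElementB are hypothetical, the Element_B pattern is assumed to follow the same shape as the Element_A pattern above, and std::optional requires C++17.

```cpp
#include <array>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns n for a line containing "Element_A (n)".
std::optional<int> parseElementA(const std::string &line)
{
    static const std::regex re(R"(Element_A \(\s*(\d+)\))");
    std::smatch m;
    if (!std::regex_search(line, m, re))
        return std::nullopt;
    // m[0] is the whole match, m[1] the first capture group
    return std::stoi(m[1].str());
}

// Hypothetical helper: returns {x, y} for a line containing "Element_B (x,y)".
std::optional<std::array<int, 2>> parseElementB(const std::string &line)
{
    static const std::regex re(R"(Element_B \(\s*(\d+)\s*,\s*(\d+)\s*\))");
    std::smatch m;
    if (!std::regex_search(line, m, re))
        return std::nullopt;
    return std::array<int, 2>{std::stoi(m[1].str()), std::stoi(m[2].str())};
}
```

Declaring the std::regex objects static const means each pattern is compiled once rather than on every call.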
EDIT 2: Scheff's solution works for my program.
// Node is defined somewhere at the top of the file
struct Node
{
    int a = 0;
    int b[2] = {0};
    std::vector<Node> children;
};
// this code is inside some function that does the parsing
Node root;
stack<Node*> nodeStack;
nodeStack.push(&root);
for(string line; getline(fin, line);)
{
    line = trim(line); // custom function to remove leading/trailing spaces/tabs (not included in this post for brevity)
    if (line.size() == 0) // empty line (data file might have empty lines for readability)
        continue;
    else if (line.size() == 1) // only one character
    {
        if (line[0] == '{')
        {
            nodeStack.top()->children.push_back(Node());
            nodeStack.push(&nodeStack.top()->children.back());
        }
        else if (line[0] == '}')
        {
            nodeStack.pop();
        }
        else
            cerr << "Error: Invalid character detected.\n";
    }
    else // at least two characters
    {
        regex reEl_A(R"(Element_A \(\s*(\d+)\))");
        regex reEl_B(R"(Element_B \(\s*(\d+),\s*(\d+)\))");
        smatch m;
        if (std::regex_search(line, m, reEl_A))
        {
            nodeStack.top()->a = std::stoi(m[1]);
            continue;
        }
        if (std::regex_search(line, m, reEl_B))
        {
            nodeStack.top()->b[0] = std::stoi(m[1]);
            nodeStack.top()->b[1] = std::stoi(m[2]);
            continue;
        }
    }
}
if (nodeStack.empty() || nodeStack.top() != &root)
{
    std::cerr << "ERROR! Data not well balanced.\n";
}
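Once the tree is built, the parent/child relationships can be followed with plain recursion over children. The following is only a sketch using the same Node layout as above; countNodes and depth are hypothetical helper names, not part of the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Same layout as the Node in the question.
struct Node
{
    int a = 0;
    int b[2] = {0};
    std::vector<Node> children;
};

// Counts the node itself plus all of its descendants.
std::size_t countNodes(const Node &node)
{
    std::size_t n = 1;
    for (const Node &child : node.children)
        n += countNodes(child);
    return n;
}

// Returns the number of levels from this node down to its deepest descendant.
std::size_t depth(const Node &node)
{
    std::size_t deepest = 0;
    for (const Node &child : node.children)
        deepest = std::max(deepest, depth(child));
    return deepest + 1;
}
```

For the sample file, the root would hold one object with two children, the second of which has one child of its own.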
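Answer 0 (score: 1)
This is how it could work:
"{": push a new node onto the current node's children and make it the current node
"}": pop the current node and make its parent the current node again
"Element_A": parse the value(s) of a
"Element_B": parse the value(s) of b
A node could store a pointer to its parent. Alternatively, the file reader could use a std::stack internally to remember the parents (which is what I did in the sample code below).
A sample program to sketch this: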
#include <cstring>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <stack>
#include <string>
#include <utility>
#include <vector>
struct Node {
    std::pair<int, int> a;
    int b;
    std::vector<Node> children;
    Node(): a(0, 0), b(0) { }
};
std::ostream& operator<<(std::ostream &out, const Node &node)
{
    static unsigned indent = 0;
    out << std::setw(indent) << ""
        << "Node:"
        << " a(" << node.a.first << ", " << node.a.second << "),"
        << " b(" << node.b << ") {\n";
    indent += 2;
    for (const Node &child : node.children) out << child;
    indent -= 2;
    out << std::setw(indent) << ""
        << "}\n";
    return out;
}
void read(std::istream &in, Node &node)
{
    std::stack<Node*> nodeStack;
    nodeStack.push(&node);
    // nodeStack.top() is the (pointer to) current node
    for (std::string line; std::getline(in, line);) {
        if (line.compare(0, std::strlen("{"), "{") == 0) {
            // open a new child of the current node and descend into it
            nodeStack.top()->children.push_back(Node());
            nodeStack.push(&nodeStack.top()->children.back());
        } else if (line.compare(0, std::strlen("}"), "}") == 0) {
            // return to the parent; the root itself is never popped
            if (nodeStack.size() > 1) nodeStack.pop();
            else std::cerr << "ERROR! Unexpected '}'.\n";
        } else if (line.compare(0, std::strlen("Element_A"), "Element_A") == 0) {
            std::istringstream parser(line.substr(std::strlen("Element_A")));
            parser >> nodeStack.top()->a.first >> nodeStack.top()->a.second;
        } else if (line.compare(0, std::strlen("Element_B"), "Element_B") == 0) {
            std::istringstream parser(line.substr(std::strlen("Element_B")));
            parser >> nodeStack.top()->b;
        } // else ERROR!
    }
    if (nodeStack.empty() || nodeStack.top() != &node) {
        std::cerr << "ERROR! Data not well balanced.\n";
    }
}
const char *const sample =
    "{\n"
    "Element_A 3\n"
    "Element_B 3 4\n"
    "{\n"
    "Element_B 6 24\n"
    "Element_A 1\n"
    "}\n"
    "{\n"
    "Element_A 3\n"
    "{\n"
    "Element_A 4\n"
    "Element_B 12 6\n"
    "}\n"
    "Element_B 1 4\n"
    "}\n"
    "}\n";
int main()
{
    std::istringstream in(sample);
    Node root;
    read(in, root);
    std::cout << root;
    return 0;
}
Output:
Node: a(0, 0), b(0) {
  Node: a(3, 0), b(3) {
    Node: a(1, 0), b(6) {
    }
    Node: a(3, 0), b(1) {
      Node: a(4, 0), b(12) {
      }
    }
  }
}
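Note:
The parsing is done in a very simple, ugly way. I considered that sufficient for sketching the node management.
A different approach to the parser can be found, for example, in the Live Demo on coliru, or by using the OP's std::regex approach.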