Question

I have a large number of data points from a .dat file that look like this:

 + (  0.00000000E+00   0.00000000E+00     //this '(' happens once per block of data
 +    0.99999997E-04   0.00000000E+00
 +    0.19999999E-03   0.00000000E+00
 +    ...

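I have no control over the program that uses this data, so I can't make the format friendlier to work with.

So far I have each line in a vector, and I want to parse the lines so that I have only the numbers to work with. I still need to keep the .dat file intact, though, because another program uses the .dat file unchanged.

I was thinking of splitting each string on whitespace (the runs of spaces have different widths, unless that doesn't matter), putting the pieces in a vector, and taking only the data I need. However, the first line of data has 4 strings and the remaining lines have 3.

Any help is greatly appreciated.

Edit: I am taking the original .dat file, going through it, and discarding any block of data that does not meet my threshold. Any block that does meet it is written to a new file. Of course, everything that uses this new file needs it to be exactly the same as the original, minus the data I don't need.

[JD] Edit per comments:

How do I parse these lines, keeping everything about each line the same and without removing any information from it, and get at the numbers so I can decide what I need to keep and what I don't?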

Answer 1

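I would create a ctype facet that classifies + and ( [edit: and ), per the comments] as whitespace, and then just read the numbers. Let's assume your criterion for keeping a number is that it is at least 1.0e-4. To copy the data to a new file while removing the smaller numbers, you could do something like this: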

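#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <locale>

// A ctype facet that treats '(', '+' and ')' as whitespace, so formatted
// input skips them the same way it skips spaces. Only the skipping between
// values is affected; the '+' in an exponent such as E+00 is still read as
// part of the number.
class my_ctype : public std::ctype<char>
{
    mask my_table[table_size];
public:
    my_ctype(std::size_t refs = 0)
        : std::ctype<char>(&my_table[0], false, refs)
    {
        // Start from the classic "C" table, then reclassify the three markers.
        std::copy_n(classic_table(), table_size, my_table);
        my_table['('] = (mask)space;
        my_table['+'] = (mask)space;
        my_table[')'] = (mask)space;
    }
};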

int main() {
    std::locale x(std::locale::classic(), new my_ctype);
    std::cin.imbue(x);

    std::remove_copy_if(std::istream_iterator<double>(std::cin), 
        std::istream_iterator<double>(), 
        std::ostream_iterator<double>(std::cout, "\n"), 
        [](double in){return in < 1.0e-4; }); // criterion for removing a number
    return 0;
}

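I'm guessing (but don't actually know) that your criterion for removing a number may be a bit more complex than a simple comparison. If it gets much more complex, you may want to use a manually defined functor instead of a lambda to define it. The rest of the code (especially the part that reads the data) would probably stay the same.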
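As a rough sketch of what such a functor could look like: the names BelowThreshold and keep_large are made up for illustration, and the criterion (magnitude below a configurable limit) is only an assumption, since the real criterion is not stated in the question.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <vector>

// Hypothetical criterion: remove a number when its magnitude is below a limit.
// Unlike a short lambda, a functor has a name and can grow more state or logic.
class BelowThreshold {
    double limit_;
public:
    explicit BelowThreshold(double limit) : limit_(limit) {}
    bool operator()(double value) const { return std::fabs(value) < limit_; }
};

// Copies every value that the functor does not reject (illustrative helper).
std::vector<double> keep_large(const std::vector<double>& in, double limit) {
    std::vector<double> out;
    std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out),
                        BelowThreshold(limit));
    return out;
}
```

An object such as BelowThreshold(1.0e-4) can be passed to std::remove_copy_if in place of the lambda; the reading code does not need to change.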

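Also note that my code just writes one number per line to the output. I don't know whether you need to keep something closer to the original format, so for now I've kept it simple.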
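If the original formatting does have to survive, one possible approach is to keep every line as an untouched string and parse only a copy of it. The sketch below is not the code from this answer. It assumes that a line containing ( starts a new block and that a block is kept when any of its numbers has a magnitude of at least the threshold; both are guesses about the real data and criterion, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Extracts the numbers from one line without modifying the line itself.
std::vector<double> numbers_in(const std::string& line) {
    std::string copy = line;
    // Parentheses never occur inside a number, so blank them out in the copy.
    std::replace(copy.begin(), copy.end(), '(', ' ');
    std::replace(copy.begin(), copy.end(), ')', ' ');
    std::istringstream tokens(copy);
    std::vector<double> values;
    std::string tok;
    while (tokens >> tok) {
        char* end = nullptr;
        double v = std::strtod(tok.c_str(), &end);
        // Accept only tokens that are numbers in full; a lone "+" is rejected.
        if (end != tok.c_str() && *end == '\0') values.push_back(v);
    }
    return values;
}

// Writes a buffered block verbatim if it passed the (assumed) criterion.
void flush_block(std::vector<std::string>& block, bool keep, std::ostream& out) {
    if (keep)
        for (const std::string& l : block) out << l << '\n';
    block.clear();
}

// Copies to out only the blocks containing a value with magnitude >= threshold.
void filter_blocks(std::istream& in, std::ostream& out, double threshold) {
    std::vector<std::string> block;
    bool keep = false;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find('(') != std::string::npos) {  // assumed start-of-block marker
            flush_block(block, keep, out);
            keep = false;
        }
        for (double v : numbers_in(line))
            if (std::fabs(v) >= threshold) keep = true;
        block.push_back(line);
    }
    flush_block(block, keep, out);  // the last block has no following marker
}
```

Called with a std::ifstream for the original file and a std::ofstream for the new one, this writes the surviving lines exactly as they were read, including their spacing. Any lines before the first ( are treated as a block of their own, which may or may not suit a file with a header.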

Answer 2

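You should use a string tokenizer to get at each datum. Depending on the libraries you are already using, this could be very simple.

Otherwise, you can do it easily with strtok.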
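A minimal sketch of the strtok route, assuming the tokens are separated by spaces or tabs and that the + and ( markers appear as separate tokens, as in the sample data (number_tokens is a made-up name):

```cpp
#include <cstring>
#include <string>
#include <vector>

// Splits a line on runs of spaces/tabs and drops the "+", "(" and ")" tokens.
std::vector<std::string> number_tokens(const std::string& line) {
    // strtok modifies its input, so work on a writable, null-terminated copy.
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), " \t\r"); tok != nullptr;
         tok = std::strtok(nullptr, " \t\r")) {
        std::string s(tok);
        if (s != "+" && s != "(" && s != ")") tokens.push_back(s);
    }
    return tokens;
}
```

Because strtok treats a run of delimiters as a single separator, the varying widths of the gaps do not matter, and both the 4-string and the 3-string lines end up as two number tokens. Keep in mind that strtok holds internal state, so it is not safe to use from several threads or in nested tokenizing loops.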

If you are using the MS CString class, you can write the code yourself, for example:

// Splits str at every occurrence of sep and appends the pieces to elements.
// CStringArray cannot be copied, so the result goes into a reference parameter.
// Note: consecutive separators (e.g. a run of spaces) produce empty items.
void TokenizeString(const CString& str, const CString& sep, CStringArray& elements)
{
    CString strCpy = str;
    int sepPos = strCpy.Find(sep);

    while (sepPos != -1)
    {
        // extract the item in front of the separator and add it to the list
        elements.Add(strCpy.Left(sepPos));
        // prepare next loop: keep the right part of the string (after the found separator)
        strCpy = strCpy.Right(strCpy.GetLength() - sepPos - sep.GetLength());
        sepPos = strCpy.Find(sep);
    }

    // add last item if needed (remaining part of the string)
    if (!strCpy.IsEmpty()) elements.Add(strCpy);
}

Hope this helps!

Answer 3

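You can use the file stream's operator>> to get the items one at a time; it skips whitespace. When you reach the column that holds either '(' or blanks (e.g. spaces), check what you read and branch on it. If you got '(', call operator>> again to get the actual datum. If you did not get '(', then what you got is the datum, because operator>> skipped the whitespace.

Here is a hopefully complete example: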

#include <string>
#include <iostream>
#include <vector>
#include <fstream>
#include <algorithm>
using namespace std;

struct Inbound
{
    std::string  a_, b_;
};

int main()
{
    ifstream f("c:\\dev\\hacks\\data.txt");

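    string s;
    // Loop on the extraction itself, so a failed read at end of file
    // ends the loop instead of printing a stale record.
    while( f >> s ) // should be '+' -- discard
    {
        f >> s; // either '(' or first datum
        if( s == "(" )
            f >> s; // get the first datum
        Inbound in;
        in.a_ = s;
        if( !(f >> in.b_) )
            break; // incomplete record at end of input

        cout << "Got: " << in.a_ << "\t" << in.b_ << endl;
    }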

}

Output:

Got: 0.00000000E+00     0.00000000E+00
Got: 0.99999997E-04     0.00000000E+00
Got: 0.19999999E-03     0.00000000E+00

Parsing a strange string in C++
