How do I parse a text file into variables using regex in C++?

Asked: 2014-10-24 00:15:59

Tags: c++ arrays regex parsing text

Please help me realize my dream of turning this sequence into meaningful output. :)

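See the regex, it works!: http://regex101.com/r/iM4yN2/1 Now I need to know how to use it. If I could get it into a multidimensional array, e.g. configFile[0][0] = [Tuner,], that would work. Or, if I could convert it into a comma-separated list, I could parse that again, put it into an array, and finally convert it into individual variables. Either way, you don't need to explain how to actually assign the variables; if I really need help with that, I'll create another question. Mainly I need help using the regex functions and getting the data into SOME variable where I can access the various pieces of text on either side of the = sign on each line.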

Regex:

^[\t ]*(.*?)\s*=[\t ]*(.*?)(#.*)?$

Test string:

    ### MODULES ###
Tuner         =  
 PitchDetector = 0
PhaseLocker   = 0
FileOutput    = 1

### FILE MANAGER ###
RenameFile_AvgFreq  =  dfgsdfg dsf gdfs g #gdrgk
RenameFile_NoteName = 0
    RenameFile_Prefix   = "The String Is Good"
RenameFile_Suffix   = ""
OutputFolder        = "..\Folder\String\"

### PITCH DETECTOR ###
AnalysisChannel = 1  #int starting from 1
BlockSize             = 8  #power of 2
Overlap               = 16 #power of 2
NormalizeForDetection = 0

### TUNER ###
Smoothing = 0.68
Envelope  = 0.45

### PHASELOCKER ###
FFTSize    = 1024 #powert of 2
FFTOverlap = 54687
WindowType = 0
MaxFreq    = 5000

My variables:

//Modules
bool Tuner;
bool PitchDetector;
bool PhaseLocker;
bool FileOutput;

//File Manager
bool RenameFile_AvgFreq;
bool RenameFile_NoteName;
std::string RenameFile_Prefix;
std::string RenameFile_Suffix;
std::string OutputFolder;

//Pitch Detector
int AnalysisChannel;
int BlockSize;
int Overlap;
bool NormalizeForDetection;

//Tuner
float Smoothing;
float Envelope;

//Phaselocker
int FFTSize;
int FFTOverlap;
int FFTWindowType;
float FFTMaxFreq;

Final note: I spent a long time looking at the C++ regex functions... very confusing stuff. I know how to do this in Python without thinking twice.

2 answers:

Answer 0 (score: 2)

Include the following:

#include <string>
#include <regex>

Declare the string and the regex:

std::string s;
std::regex e;

In your main function, assign the string and regex variables and call the regex function (you can also assign the variables where you declare them):

int main()
{
    s = "i will only 349 output 853 the numbers 666";
    e = "(\\d+)";
    // format_no_copy drops the unmatched text, so s becomes "349\n853\n666\n"
    s = std::regex_replace(s, e, "$1\n", std::regex_constants::format_no_copy);

    return 0;
}

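Notice how I put the result right back into the string. Of course, you can use a different string to store the result. std::regex_constants::format_no_copy is a flag that tells the regex function to output only the replacement for each match (here the captured group) and to drop the text that did not match. Also notice how I use a double backslash in "\\d+": the backslash is the escape character in C++ string literals, so it has to be doubled to reach the regex engine. If a regex pattern doesn't work, try doubling the backslashes.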
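As a small sketch of the same call, the pattern can also be written as a C++11 raw string literal, which avoids the doubled backslash. The helper name extract_numbers is made up for this illustration and is not part of the answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the answer): returns every run of digits
// found in `text`, each followed by a newline.
std::string extract_numbers(const std::string& text)
{
    // R"(...)" is a raw string literal, so \d needs no extra backslash.
    static const std::regex digits(R"((\d+))");
    // format_no_copy drops everything that is not part of a match.
    return std::regex_replace(text, digits, "$1\n",
                              std::regex_constants::format_no_copy);
}
```

Calling extract_numbers on the sentence from the example above should leave only the three numbers, one per line.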

To find a key/value pair with a regex, for example "BlockSize = 1024", you can create a pattern like:

BlockSize\s*=\s*((?:[\d.]+)|(?:".*"))

In C++ you can create that regex pattern with:

expr = key+"\\s*=\\s*((?:[\\d.]+)|(?:\".*\"))";

and return the match:

config = std::regex_replace(config, expr, "$1", std::regex_constants::format_no_copy);

and put them in a function that can return a default value:

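std::string Config_GetValue(std::string key, std::string config, std::string defval)
{
    // \b stops a key such as "Overlap" from also matching inside "FFTOverlap"
    std::regex expr("\\b" + key + "\\s*=\\s*((?:[\\d.]+)|(?:\".*\"))");
    // format_no_copy discards everything except the captured value
    config = std::regex_replace(config, expr, "$1", std::regex_constants::format_no_copy);
    return config.empty() ? defval : config;
}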
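One possible variation, sketched here under the assumption that only the first occurrence of a key matters: std::regex_search with a std::smatch stops at the first match instead of concatenating every match the way regex_replace does. The name config_get_first is hypothetical, and the sketch assumes the key contains no regex metacharacters.

```cpp
#include <regex>
#include <string>

// Hypothetical variant of Config_GetValue: returns the first value found for
// `key`, or `defval` when the key is missing or has no numeric/quoted value.
// Assumes `key` contains no regex metacharacters.
std::string config_get_first(const std::string& key,
                             const std::string& config,
                             const std::string& defval)
{
    // \b keeps "Overlap" from also matching inside "FFTOverlap".
    const std::regex expr("\\b" + key + R"(\s*=\s*([\d.]+|".*"))");
    std::smatch m;
    if (std::regex_search(config, m, expr))
        return m[1].str();   // group 1 holds the value
    return defval;
}
```

The returned text can then be passed to std::stoi or std::stof in the same way as the result of Config_GetValue.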

FULL CODE (uses std::stoi and std::stof to convert the strings to numbers where needed, and the auto type because the right-hand side (RHS) makes the type clear):

#include "stdafx.h" // MSVC precompiled header; remove when not building with Visual Studio
#include <string>
#include <regex>
#include <iostream>

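std::string Config_GetValue(std::string key, std::string config, std::string defval)
{
    // \b stops a key such as "Overlap" from also matching inside "FFTOverlap"
    std::regex expr("\\b" + key + "\\s*=\\s*((?:[\\d.]+)|(?:\".*\"))");
    // format_no_copy discards everything except the captured value
    config = std::regex_replace(config, expr, "$1", std::regex_constants::format_no_copy);
    return config.empty() ? defval : config;
}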


int main()
{
    //test string
    std::string s = "    ### MODULES ###\nTuner         =  \n PitchDetector = 1\n PhaseLocker = 0 \nFileOutput    = 1\n\n### FILE MANAGER ###\nRenameFile_AvgFreq  =  dfgsdfg dsf gdfs g #gdrgk\nRenameFile_NoteName = 0\n    RenameFile_Prefix   = \"The String Is Good\"\nRenameFile_Suffix   = \"\"\nOutputFolder        = \"..\\Folder\\String\\\"\n\n### PITCH DETECTOR ###\nAnalysisChannel = 1  #int starting from 1\nBlockSize             = 1024  #power of 2\nOverlap               = 16 #power of 2\nNormalizeForDetection = 0\n\n### TUNER ###\nSmoothing = 0.68\nEnvelope  = 0.45\n\n### PHASELOCKER ###\nFFTSize    = 1024 #powert of 2\nFFTOverlap = 54687\nWindowType = 0\nMaxFreq    = 5000";

    //Modules   
    auto FileOutput    = stoi(Config_GetValue("FileOutput", s, "0"));
    auto PitchDetector = stoi(Config_GetValue("PitchDetector", s, "0"));
    auto Tuner         = stoi(Config_GetValue("Tuner", s, "0"));
    auto PhaseLocker   = stoi(Config_GetValue("PhaseLocker", s, "0"));

    //File Manager
    auto RenameFile_AvgFreq  = stoi(Config_GetValue("RenameFile_AvgFreq", s, "0"));
    auto RenameFile_NoteName = stoi(Config_GetValue("RenameFile_NoteName", s, "0"));
    auto RenameFile_Prefix   = Config_GetValue("RenameFile_Prefix", s, "");
    auto RenameFile_Suffix   = Config_GetValue("RenameFile_Suffix", s, "");
    auto OutputFolder        = Config_GetValue("OutputFolder", s, "");

    //Pitch Detector
    auto AnalysisChannel       = stoi(Config_GetValue("AnalysisChannel", s, "1"));
    auto BlockSize             = stoi(Config_GetValue("BlockSize", s, "4096"));
    auto Overlap               = stoi(Config_GetValue("Overlap", s, "8"));
    auto NormalizeForDetection = stoi(Config_GetValue("NormalizeForDetection", s, "0"));

    //Tuner 
    auto Smoothing     = stof(Config_GetValue("Smoothing", s, ".5"));
    auto Envelope      = stof(Config_GetValue("Envelope", s, ".3"));
    auto TransientTime = stof(Config_GetValue("TransientTime", s, "0"));

    //Phaselocker   
    auto FFTSize       = stoi(Config_GetValue("FFTSize", s, "1"));
    auto FFTOverlap    = stoi(Config_GetValue("FFTOverlap", s, "1"));
    auto FFTWindowType = stoi(Config_GetValue("FFTWindowType", s, "1"));
    auto FFTMaxFreq    = stof(Config_GetValue("FFTMaxFreq", s, "0.0"));

    std::cout << "complete";
    return 0;
}
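The question also asked for a container holding the text on either side of each = sign. A minimal sketch of that idea, assuming a std::map is acceptable in place of the multidimensional array: split the text into lines and apply a pattern adapted from the question's regex to each one. The name parse_config and the adapted pattern are my own assumptions. Matching one line at a time with std::regex_match also avoids relying on how ^ and $ treat line breaks, which I believe differs between standard library implementations.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: splits `text` into lines and returns every
// "key = value  #comment" line as a key -> raw value entry.
// Lines without a key (blank lines, ### headers) are skipped.
std::map<std::string, std::string> parse_config(const std::string& text)
{
    // Adapted from the question's pattern: the key is restricted to word
    // characters, and whitespace before a trailing comment is not captured.
    // Limitation: a '#' inside a quoted value is treated as a comment start.
    static const std::regex line_re(R"([\t ]*(\w+)\s*=[\t ]*(.*?)\s*(#.*)?)");
    std::map<std::string, std::string> values;
    std::istringstream in(text);
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        // regex_match requires the whole line to match, so no ^ or $ needed.
        if (std::regex_match(line, m, line_re))
            values[m[1].str()] = m[2].str();
    }
    return values;
}
```

Each entry can then be converted as needed, for example with std::stoi(values.at("BlockSize")). A key with an empty value, such as Tuner in the test string, maps to an empty string, so it needs a default before conversion.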

Answer 1 (score: 1)

Another approach is to use regex_iterator:

#include <iostream>
#include <regex>
#include <string>
using std::cout;
using std::endl;
using std::regex;
using std::sregex_iterator;
using std::string;

void CreateConfig(string config)
{
    //group 1,2,3,4,5 = key,float,int,string,bool
    regex expr("^[\\t ]*(\\w+)[\\t ]*=[\\t ]*(?:(\\d+\\.+\\d+|\\.\\d+|\\d+\\.)|(\\d+)|(\"[^\\r\\n:]*\")|(TRUE|FALSE))[^\\r\\n]*$", std::regex_constants::icase);
    for (sregex_iterator it(config.begin(), config.end(), expr), itEnd; it != itEnd; ++it)
    {
        if ((*it)[2] != "") cout << "FLOAT -> " << (*it)[1] << " = " << (*it)[2] << endl;
        else if ((*it)[3] != "") cout << "INT -> " << (*it)[1] << " = " << (*it)[3] << endl;
        else if ((*it)[4] != "") cout << "STRING -> " << (*it)[1] << " = " << (*it)[4] << endl;
        else if ((*it)[5] != "") cout << "BOOL -> " << (*it)[1] << " = " << (*it)[5] << endl;
    }
}

int main()
{   
    string s = "what = 1\n: MODULES\nFileOutput = \"on\" :bool\nPitchDetector = TRuE :bool\nTuner = on:bool\nHarmSplitter = off:bool\nPhaseLocker = on\n\nyes\n junk output = \"yes\"\n\n: FILE MANAGER\nRenameFile  AvgFreq  = 1 \nRenameFile_NoteName = 0 :bool\nRenameFile_Prefix   = \"The Strin:g Is Good\" :string\nRenameFile_Suffix   = \"\":string\nOutputFolder        = \"..\\Folder\\String\\\" :relative path\n\n: PITCH DETECTOR\nAnalysisChannel       = 1  :integer starting from 1\nBlockSize             = 8  :power of 2\nOverlap               = 16 :power of 2\nNormalizeForDetection = 0  :bool\n\n: TUNER\nSmoothing = 0.68 :float\nEnvelope  = 0.45 :float\n\n: PHASE LOCKER\nFFTSize    = 1024  :power of 2\nFFTOverlap = 54687 :power of 2\nWindowType = 0     :always set to 0\nMaxFreq    = 5000  :float";
    CreateConfig(s);

    return 0;
}

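Let's break this down. The regex I created uses the form ^ regexy stuff here $ so that each line of text is considered on its own: ^ = start of line, $ = end of line. The regex looks for: variable_name = decimal OR integer OR string OR (true or false). Because each type is captured in its own group, we know the type of each match.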

To explain the for loop, I'll write the code in a few different ways:

//You can declare more than one variable of the same type:
for (sregex_iterator var1(str.begin(), str.end(), regexExpr), var2; var1 != var2; var1++)

//Or you can declare them outside the for loop:
sregex_iterator var1(str.begin(), str.end(), regexExpr);
sregex_iterator var2;
for (; var1 != var2; var1++)

//Or the more classic way:
sregex_iterator var1(str.begin(), str.end(), regexExpr);
for (sregex_iterator var2; var1 != var2; var1++)

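Now for the body of the for loop. It says: "If group 2 is not blank, print that group 2 is a float. If group 3 is not blank, print that group 3 is an int. If group 4 is not blank, print that group 4 is a string. If group 5 is not blank, print that group 5 is a bool." Inside the loop, the syntax is: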

//group0 is the entire text matched by the regex (here, the whole line).
//group1 is my key group
//group2/3/4/5 are my value groups float/int/string/bool.
theString = (*iteratorVariableName)[groupNumber]
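To illustrate the group access described above, here is a rough sketch that collects "key:type" strings instead of printing them. Dereferencing the iterator yields a std::smatch, and each group's matched flag tells whether that alternative took part in the match. The name list_types and the simplified, unanchored pattern are assumptions made for this example, not the answer's exact regex.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: lists "key:type" for every "key = value" pair found.
std::vector<std::string> list_types(const std::string& config)
{
    // Groups: 1 = key, 2 = float, 3 = int, 4 = quoted string, 5 = bool.
    // Simplified from the answer's pattern and left unanchored on purpose.
    static const std::regex expr(
        R"re((\w+)[\t ]*=[\t ]*(?:(\d*\.\d+|\d+\.)|(\d+)|("[^"\r\n]*")|(true|false)\b))re",
        std::regex_constants::icase);
    std::vector<std::string> out;
    for (std::sregex_iterator it(config.begin(), config.end(), expr), end;
         it != end; ++it) {
        const std::smatch& m = *it;  // each element is a std::smatch
        // sub_match::matched is true only for groups that participated.
        const char* type = m[2].matched ? "float"
                         : m[3].matched ? "int"
                         : m[4].matched ? "string"
                                        : "bool";
        out.push_back(m[1].str() + ":" + type);
    }
    return out;
}
```

Checking matched rather than comparing against "" is a design choice of this sketch: it keeps working even if a group is later changed so that it can legitimately capture an empty string.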