Skipping lines with dashes in a text file using a regular expression in C#

Date: 2016-05-03 08:57:11

Tags: c# parsing text text-files

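I have a text file with SQL commands. I have written some code that "ignores" comments and whitespace in order to get only the commands (I will post the code below, along with a sample of the text file and of the output). That part works fine, but the text file also contains lines like "----------------------------------" that I need to ignore. I wrote code to ignore them, but I can't figure out why it isn't working properly. Code: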

 public string[] Parser(string caminho)
 {
            string text = File.ReadAllText(caminho);
            var Linha = Regex.Replace(text, @"\/\**?\*\/", " ");
            var Commands = Linha.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
               .Where(line => !string.IsNullOrWhiteSpace(line))
               .Where(line => !Regex.IsMatch(line, @"^[\s\-]+$")) 
               .ToArray();
            return Commands;
 }

This is the line I added to "ignore" the dashed lines:

  

.Where(line => !Regex.IsMatch(line, @"^[\s\-]+$"))

Sample of the text with dashes:

/

---------------------------------------------------------------------

UPDATE CDPREPORTSQL
SET COMANDOSQL_FROM =
'SELECT DESCONTO,EMPCOD,EMPDSC,LINVER,NOMESISTEMA,OBS,ORCCOD,ORCVER,PEDCOD,PEDDSC,
ROUND(PRCUNIT*#CAMBIO#,5) PRCUNIT,
ROUND(PRCUNITSEMDESC*#CAMBIO#,5) PRCUNITSEMDESC,
PROPCHECK,QTDGLOB,QTDPROP,REFCOD,REFDSC,EMPCODVER, COEFGERAL_PLT FROM #OWNER#.VW_PROPOSTAS', 
COMANDOSQL_WHERE = 
'WHERE ORCCOD=#ORCCOD# AND ORCVER=#ORCVER# AND NOMESISTEMA=#NOMESISTEMA# AND PEDCOD=#MYCOD#'
WHERE REPID = 'CDP0000057'
/

---------------------------------------------------------------------

Sample output:

---------------------------------------------------------------------

UPDATE CDPREPORTSQL
SET COMANDOSQL_FROM =
'SELECT DESCONTO,EMPCOD,EMPDSC,LINVER,NOMESISTEMA,OBS,ORCCOD,ORCVER,PEDCOD,PEDDSC,
ROUND(PRCUNIT*#CAMBIO#,5) PRCUNIT,
ROUND(PRCUNITSEMDESC*#CAMBIO#,5) PRCUNITSEMDESC,
PROPCHECK,QTDGLOB,QTDPROP,REFCOD,REFDSC,EMPCODVER, COEFGERAL_PLT FROM #OWNER#.VW_PROPOSTAS', 
COMANDOSQL_WHERE = 
'WHERE ORCCOD=#ORCCOD# AND ORCVER=#ORCVER# AND NOMESISTEMA=#NOMESISTEMA# AND PEDCOD=#MYCOD#'
WHERE REPID = 'CDP0000057'


---------------------------------------------------------------------

These are examples of statements that may occur and that I need to handle:

/*    */
            UPDATE Orc 
/*UPDATE comando */
set MercadoInt = 'N', Coef_KrMo = 1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/

Another one:

/*    */
---- comment
            UPDATE Orc set MercadoInt = 'N', Coef_KrMo = 
             -1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/

And one more:

/*    */
            UPDATE Orc set MercadoInt = 'N', Coef_KrMo = 1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/
  

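Note that I need to handle the statements even when there is a comment in the middle of one. Note also that everything else works fine (the code "ignores" comments and whitespace).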

     

The '/' only separates the commands in the text file.

5 answers:

Answer 0 (score: 0)

As I understand it, you have a text file containing several SQL commands, separated by:

/

---------------------------------------------------------------------

and you only want the text between those dashed lines. If so, why not split the text with Regex.Split and take all the resulting elements?

This regular expression seems to work:

\/\n\n-+

Based on the Regex.Split documentation, the code would be:

string input = File.ReadAllText(caminho);
// Verbatim string: "\/" is not a valid escape sequence in a regular C# string literal.
string pattern = @"\/\n\n-+";

string[] substrings = Regex.Split(input, pattern);
foreach (string match in substrings)
{
   //do cool stuff with your cool query
}
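
A variation on the split above, as a minimal sketch. It assumes that every separator is a '/' followed, after any amount of whitespace, by a run of at least four dashes, and it tolerates Windows line endings (\r\n), which the literal \n\n in the pattern above would not match. SeparatorSplitDemo and SplitCommands are names made up for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparatorSplitDemo
{
    // Hypothetical helper, not part of the question's code.
    public static string[] SplitCommands(string input)
    {
        // '/' + any whitespace (including \r\n) + four or more dashes.
        const string pattern = @"/\s*-{4,}";

        return Regex.Split(input, pattern)
            .Select(part => part.Trim())          // drop surrounding blank lines
            .Where(part => part.Length > 0)       // drop empty pieces
            .ToArray();
    }

    public static void Main()
    {
        string sample = "/\r\n\r\n-----\r\n\r\nUPDATE A SET X = 1\r\n/\r\n\r\n-----\r\n";
        foreach (string command in SplitCommands(sample))
        {
            Console.WriteLine(command);
        }
    }
}
```

Commands that end with a bare '/' and no dashed line after it (as in the question's later examples) are not separated by this pattern, so it only fits files that use the '/' plus dashed-line layout throughout.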

Answer 1 (score: 0)

If you don't want to use a regular expression, you could also use !line.TrimStart().StartsWith("-"); I think it would be faster, too.
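
For what it's worth, the filter in the question is applied to whole chunks, not to lines: after splitting on '/', each element holds the dashed line plus the SQL statement, and ^[\s\-]+$ without RegexOptions.Multiline has to match the entire chunk, so a chunk that also contains SQL is never rejected. A minimal sketch of a per-line filter follows. It only drops lines made up entirely of dashes, because a line such as "-1, Coef_KrMt = 1" in the question's second example also starts with a dash after trimming. DashLineFilter and RemoveDashOnlyLines are hypothetical names.

```csharp
using System;
using System.Linq;

public static class DashLineFilter
{
    // Hypothetical helper: removes lines that consist solely of dashes
    // (ignoring surrounding whitespace) from one command chunk.
    public static string RemoveDashOnlyLines(string chunk)
    {
        var kept = chunk
            .Split('\n')
            .Select(line => line.TrimEnd('\r'))   // tolerate \r\n line endings
            .Where(line => !IsDashOnly(line));
        return string.Join("\n", kept);
    }

    private static bool IsDashOnly(string line)
    {
        string trimmed = line.Trim();
        return trimmed.Length > 0 && trimmed.All(c => c == '-');
    }

    public static void Main()
    {
        string chunk = "\n-----------\n\nUPDATE T\nSET A =\n -1\n";
        Console.WriteLine(RemoveDashOnlyLines(chunk));
    }
}
```

Lines like "---- comment" are left in place by this sketch; they are SQL line comments and need separate handling.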

Answer 2 (score: 0)

The following code works for the examples you provided.

    private const string DashComment = @"(^|\s+)--.*(\n|$)";
    private const string SlashStarComment = @"\/\*.*?\*\/";
    private string[] CommandSplitter(string text)
    {
        // strip /* ... */ comments
        var strip1 = Regex.Replace(text, SlashStarComment, " ", RegexOptions.Multiline);
        var strip2 = Regex.Replace(strip1, DashComment, "\n", RegexOptions.Multiline);
        // split into individual commands separated by '/'
        var commands = strip2.Split(new[] {'/'}, StringSplitOptions.RemoveEmptyEntries);

        return commands.Where(line => !String.IsNullOrWhiteSpace(line))
            .ToArray();
    }

I put the three examples you posted in the question into a single string. It looks like this (yes, it's ugly):

        private const string Test1 = @"/*    */
            UPDATE Orc 
/*UPDATE comando */
set MercadoInt = 'N', Coef_KrMo = 1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/
/*    */
---- comment
            UPDATE Orc set MercadoInt = 'N', Coef_KrMo = 
             -1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/
/*    */
            UPDATE Orc set MercadoInt = 'N', Coef_KrMo = 1, Coef_KrMt = 1, Coef_KrEq = 1, Coef_KrSb = 1, Coef_KrGb = 1, Coef_MDEmp = 1, Coef_MDLoc = 1, Abrv_MDLoc = '', Dsc_MDLoc = '', Arred_MDLoc = 'N', Arred_NDecs = 0 WHERE MercadoInt IS NULL
/";

Then I call CommandSplitter:

var result = CommandSplitter(Test1);

and print the result:

foreach (var t in result)
{
    Console.WriteLine(t);
    Console.WriteLine("////////////////////////");
}

Both the /* ... */ comments and the -- ... comments are removed.

It also works for this example:

    private const string Test2 =
        "Update Orc set /* this is a comment */ MercadoInt = 'N' -- this is another comment\n" +
        "Where MercadoInt is NULL --another comment";

Output:

Update Orc set   MercadoInt = 'N'
Where MercadoInt is NULL

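Update: The code above returns an array of commands, and each command consists of multiple lines. If you want to remove the extra whitespace at the start of each line and eliminate empty lines, you have to process each command individually. So you would extend CommandSplitter like this: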

private string[] CommandSplitter(string text)
{
    // strip /* ... */ comments
    var strip1 = Regex.Replace(text, SlashStarComment, " ", RegexOptions.Multiline);
    var strip2 = Regex.Replace(strip1, DashComment, "\n", RegexOptions.Multiline);
    // split into individual commands separated by '/'
    var commands = strip2.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

    return commands.Select(cmd => cmd.Split(new[] {'\n'})
        .Select(l => l.Trim()))
        .Select(lines => string.Join("\n", lines.Where(l => !string.IsNullOrWhiteSpace(l))))
        .ToArray();
}
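
One thing to keep in mind with the patterns above: RegexOptions.Multiline only changes where ^ and $ match. It does not let '.' cross a line break, so a /* ... */ comment that spans several lines would survive. None of the question's samples contain one, so the following is a sketch under the assumption that such comments can occur. It uses RegexOptions.Singleline for block comments and a line-bound pattern for -- comments. CommentStripper and StripComments are names invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentStripper
{
    // Hypothetical helper: removes /* ... */ and -- ... comments from SQL text.
    public static string StripComments(string text)
    {
        // Singleline lets '.' match '\n', so a block comment may span lines.
        string noBlock = Regex.Replace(text, @"/\*.*?\*/", " ", RegexOptions.Singleline);

        // Remove from "--" up to, but not including, the end of the line.
        return Regex.Replace(noBlock, @"--[^\r\n]*", "");
    }

    public static void Main()
    {
        string sql = "UPDATE T /* multi\nline */ SET A = 1 -- tail\nWHERE B = 2";
        Console.WriteLine(StripComments(sql));
    }
}
```

Like the code above, this treats "--" and "/*" inside a quoted SQL string as comment starts, so it is only safe when string literals in the file never contain those sequences.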

Answer 3 (score: 0)

This all seems rather complicated and slow. If you only want to find and reject the dashed lines, why not use:

if (line.StartsWith("----"))

(assuming 4 dashes are enough to detect these lines unambiguously)

If there may be whitespace at the start of the line, then:

if (line.Trim().StartsWith("----"))

Not only is this approach more readable than a regular expression, it is most likely faster as well.

Answer 4 (score: -1)

I ended up with the following code, and so far it works well.

 public string[] Parser(string caminho)
        {
            List<string> Commands2 = new List<string>();
            string text = File.ReadAllText(caminho);
            var Linha = Regex.Replace(text, @"\/\**?\*\/", " ");
            var Commands = Linha.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
               .Where(line => !string.IsNullOrWhiteSpace(line))
               .Where(line => !Regex.IsMatch(line, @"^[\s\-]+$")) 
               .ToArray();


            Commands2 = Commands.ToList();


          for(int idx = 0; idx < Commands2.Count; idx ++)
            {

                if (Commands2[idx].TrimStart().StartsWith("-"))
                {
                    string linha = Commands2[idx];
                    string linha2 = linha.Remove(linha.IndexOf('-'), linha.LastIndexOf('-') - 1);
                    Commands2[idx] = linha2;
                }



            }
          //test the output to a .txt file
            StreamWriter Comandos = new StreamWriter(Directory.GetParent(caminho).ToString() + "Out.txt", false);
            foreach (string linha in Commands2)
            {
                Comandos.Write(linha);
            }
            Comandos.Close();
            return Commands2.ToArray();
        }
  

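After my code was reviewed, I was told that I can't use it (as mentioned above), because it doesn't work in some cases, such as comments in the middle of a statement. I am now trying to use Tsql120Parser.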