Question

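Before I develop a trembling case of carpal tunnel syndrome (roughly a cross between a cuckoo and Atlas), I need to find a way to automatically parse a large file of SQL statements and their parameter values.

I have a file that contains SQL statements in this format: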

select Animal#, RacketThreshold, PeakOil as Oil
from OilAnimalPlatypus2
where OilAnimalPlatypusID = :ID
  and Animal# = :Animal
  and TelecasterAccessType = 'D'
UNION
select Animal, RacketThreshold, PeakOil as Oil
from OilRequestPlatypus
where PlatypusID = :ID
  and Animal = :Animal
order by RacketThreshold

-->ID(VARCHAR[0])=<NULL> 
:Animal(INTEGER)=2

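...that is, a multi-line SQL statement, followed by a blank line, followed by two dashes and an arrow with the parameter names, data types, and values, followed by more of the same ad infinitum, ad nauseam (except where a SQL statement takes no parameters).

From this great gob of goo I'd like to create a separate string for each unique query (many of them are identical, although they are usually passed different parameter values). If possible, I'd also like to track all the parameter values passed to a particular query. For example, if a given parameter is passed "1" the first time the query is called, "42" the next time, and "3.14" the time after that, I'd want the collection for that arg name to be 1, 42, 3.14.

There are more than 400 queries, and I'd rather not do all of this "by hand", especially comparing the queries for matches.

Updated

OK, using Jon's code, after adding this: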

private void buttonOpenAndParseSQLMonFile_Click(object sender, EventArgs e)
{
    var queriesAndArgs = (Dictionary<string, List<string>>)ParseFile("SQLMonTraceLog.txt");
    foreach(var pair in queriesAndArgs)
    {
        richTextBoxParsedResults.AppendText(pair.Key);
        richTextBoxParsedResults.AppendText(Environment.NewLine);
        foreach (String s in pair.Value)
        {
            richTextBoxParsedResults.AppendText(s);
            richTextBoxParsedResults.AppendText(Environment.NewLine);
        }
        richTextBoxParsedResults.AppendText(Environment.NewLine);
    }
}

...I get results like these in my richTextBox:

select ABCID from ABCWorker where lower(loginid) = lower(user) 


select r.roleid from abcrole r, abcworker w where lower(w.loginid)=lower(user)   and r.abcid=w.abcid   and r.status='A'


select Tier#, BenGrimm, PeakRate as Ratefrom RageAnimalGreenBayPackers2 where RageAnimalGreenBayPackersID = :ID and Tier# = 

:Tier and FlyingVAccessType = 'D' UNION select Tier, BenGrimm, PeakRate as Rate from CaliforniaCondorGreenBayPackers where 

GreenBayPackersID = :ID and Tier = :Tier order by BenGrimm 
-->   :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=1 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=1 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=1 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=4 


select Tier#, BenGrimm, PeakRate as Rate from RageAnimalGreenBayPackers2 where RageAnimalGreenBayPackersID = :ID and Tier# = 

:Tier and FlyingVAccessType = 'D' UNION select Tier, BenGrimm, PeakRate as Rate from CaliforniaCondorGreenBayPackers where 

GreenBayPackersID = :ID and Tier = :Tier order by BenGrimm 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=2 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=5 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=1 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=2 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=3 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=4 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=2 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=3 
-->  :ID(VARCHAR[0])=<NULL> :Tier(INTEGER)=4 
(etc.)

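...So, this is very enlightening, but I see it's not quite what I need, and it also depends on my lame hand-tweaking of the file. So I think I need to take a step back and parse the file as it is actually given to me, with an incrementing number for each "interesting" event: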

. . .
6       11:30:46  SQL Execute: select ABCID
from ABCWorker
where lower(loginid) = lower(user)
7       11:30:46  SQL Prepare: select r.roleid from abcrole r, abcworker w where lower(w.loginid)=lower(user)   and     
r.abcid=w.abcid   and r.status='A'
8       11:30:46  SQL Execute: select r.roleid from abcrole r, abcworker w where lower(w.loginid)=lower(user)   and     
r.abcid=w.abcid   and r.status='A'
9       11:30:46  SQL Execute: select Tier#, BenGrimm, PeakRate as Rate
from RageAnimalGreenBayPackers2
where RageAnimalGreenBayPackersID = :ID
  and Tier# = :Tier
  and FlyingVAccessType = 'D'
UNION
select Tier, BenGrimm, PeakRate as Rate
from CaliforniaCondorGreenBayPackers
where GreenBayPackersID = :ID
  and Tier = :Tier
order by BenGrimm
10      11:30:46  :ID(VARCHAR[0])=<NULL> 
:Tier(INTEGER)=1
11      11:30:46  SQL Execute: select Tier#, BenGrimm, PeakRate as Rate
from RageAnimalGreenBayPackers2
where RageAnimalGreenBayPackersID = :ID
  and Tier# = :Tier
  and FlyingVAccessType = 'D'
UNION
select Tier, BenGrimm, PeakRate as Rate
from CaliforniaCondorGreenBayPackers
where GreenBayPackersID = :ID
  and Tier = :Tier
order by BenGrimm
12      11:30:46  :ID(VARCHAR[0])=<NULL> 
:Tier(INTEGER)=2
. . .

Answer 1

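What you really need is a lexer. Take a look at ANTLR - http://www.antlr.org/

You define a "grammar", i.e., the characteristics of each element of the language (in this case, your SQL file). ANTLR then generates a lexer and parser from that grammar, and those process your file and produce results according to the grammar you defined.

It's just a tokenizing and parsing process.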
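
To make the tokenizing idea concrete without pulling in ANTLR, here is a minimal hand-written sketch (plain regular expressions, not an ANTLR grammar). It assumes every event in the raw trace starts with a header line shaped like "<number>  <hh:mm:ss>  <text>", as in the numbered sample in the question, and that any other non-blank line continues the previous event. The names TraceLexer, TraceEvent, and TraceEventKind are hypothetical, introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum TraceEventKind { Prepare, Execute, Parameters, Other }

public sealed class TraceEvent
{
    public int Number;
    public TraceEventKind Kind;
    public string Text = "";
}

public static class TraceLexer
{
    // Assumed header shape: event number, whitespace, hh:mm:ss, whitespace, text.
    private static readonly Regex Header =
        new Regex(@"^(\d+)\s+\d{2}:\d{2}:\d{2}\s+(.*)$");

    public static List<TraceEvent> Tokenize(IEnumerable<string> lines)
    {
        var events = new List<TraceEvent>();
        TraceEvent current = null;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            Match m = Header.Match(line);
            if (m.Success)
            {
                // A header line opens a new event.
                current = Classify(int.Parse(m.Groups[1].Value), m.Groups[2].Value);
                events.Add(current);
            }
            else if (current != null && line.Length > 0)
            {
                // Continuation line: join it to the open event with a single space.
                current.Text += " " + line;
            }
        }
        return events;
    }

    private static TraceEvent Classify(int number, string text)
    {
        const string prepare = "SQL Prepare: ";
        const string execute = "SQL Execute: ";
        var ev = new TraceEvent { Number = number, Kind = TraceEventKind.Other, Text = text };
        if (text.StartsWith(execute, StringComparison.Ordinal))
        {
            ev.Kind = TraceEventKind.Execute;
            ev.Text = text.Substring(execute.Length);
        }
        else if (text.StartsWith(prepare, StringComparison.Ordinal))
        {
            ev.Kind = TraceEventKind.Prepare;
            ev.Text = text.Substring(prepare.Length);
        }
        else if (text.StartsWith(":", StringComparison.Ordinal))
        {
            // Parameter events in the sample begin with ":Name(Type)=Value".
            ev.Kind = TraceEventKind.Parameters;
        }
        return ev;
    }
}
```

With the events classified like this, a Parameters event can be attached to the Execute event immediately before it, which removes the need to hand-edit the file first. A SQL continuation line that happens to look like a header would be misread, so treat this as a starting point rather than a complete lexer.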

Answer 2

Here is a concrete example of what I described in my comment; you can do this by using a StreamReader and collecting each block into a List; for example:

// Requires: using System; using System.Collections.Generic; using System.IO; using System.Text;
List<string> statementBlocks = new List<string>();
StringBuilder blockCollector = new StringBuilder();

// Read the file a line at a time; the using block closes the reader for us
using (StreamReader file = new StreamReader("C:\\temp\\annoying_text_file.sql"))
{
    string line;
    while ((line = file.ReadLine()) != null)
    {
        // If the line has content, then we append it to our string builder
        if (!String.IsNullOrWhiteSpace(line)) // String.IsNullOrWhiteSpace is new in .NET 4 and also matches empty lines
        {
            blockCollector.AppendLine(line);
        }
        else
        {
            // We've hit a blank line - dump the block to the list and reinitialize the string builder
            statementBlocks.Add(blockCollector.ToString());
            blockCollector = new StringBuilder();
        }
    }
}

// The file may not end with a blank line, so keep whatever is left over
if (blockCollector.Length > 0)
{
    statementBlocks.Add(blockCollector.ToString());
}

foreach (string statementBlock in statementBlocks)
{
    if (!String.IsNullOrEmpty(statementBlock))
    {
        if (statementBlock.StartsWith("-->"))
        {
            // Code to split out the arguments; if they are delimited with : then you can just string.Split this block
            // string[] paramsAndValues = statementBlock.Replace("-->", String.Empty).Split(':');
            // then each string in here is paramName(DataType)=Value, which is also splittable.
        }
        else
        {
            // Do whatever you want with this valid block (including writing it to another file!)
            // To keep only the unique ones, store each block in a list, then look to see if a block already exists in the list each time; if it does, just skip this block.
            // Given you also know that the next block will be a parameter block, you can also collect the parameters here too.
        }
    }
}

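I can't check that this compiles right now, but it should give you a general idea of a possible approach.

This assumes that the only blank lines are the ones between statement blocks.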
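
The comments in the parameter branch above only hint at how to split a parameter block, so here is one possible sketch of that step. It uses a regular expression instead of string.Split, and it assumes that each entry looks like Name(DataType)=Value with an optional leading colon, that data types contain no closing parenthesis, and that values contain no whitespace. ParamValue and ParamBlockParser are hypothetical names, not part of the code above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class ParamValue
{
    public string Name = "";
    public string DataType = "";
    public string Value = "";
}

public static class ParamBlockParser
{
    // Assumed entry shape: optional ':', name, "(type)", '=', value without spaces.
    private static readonly Regex Param =
        new Regex(@":?(\w+)\(([^)]*)\)=(\S*)");

    public static List<ParamValue> Parse(string block)
    {
        var result = new List<ParamValue>();
        foreach (Match m in Param.Matches(block))
        {
            result.Add(new ParamValue
            {
                Name = m.Groups[1].Value,
                DataType = m.Groups[2].Value,
                Value = m.Groups[3].Value
            });
        }
        return result;
    }
}
```

Because the pattern scans the whole block, it does not matter whether the entries sit on one line or several, or whether the block still carries its leading "-->". String values that contain spaces would be cut short, so the value pattern would need adjusting if the trace contains any.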

Answer 3

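Assuming your queries are separated from one another by another blank line, you could try something like the following to parse the file. The code reads the file until the end. Each call to parseQuery reads lines until it finds a blank line and joins them together as the query. It then checks the next line: if it is not the start of a parameter block, it saves the query without parameters and starts over, assuming it is at the start of another query. If the line is the start of a parameter block, the code reads until it reaches another blank line, saves the query together with its parameters, and returns. The while (parseQuery) loop ensures the whole file gets parsed.

In the end, the code produces a dictionary with the query string as the key and, as the value, a list of strings holding the different parameter sets supplied. Error checking is omitted for simplicity. In a real scenario you would need to add handling for things like the file not existing.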

// Requires: using System.Collections.Generic; using System.IO; using System.Text;
static IDictionary<string, List<string>> ParseFile(string path)
{
    Dictionary<string, List<string>> queries = new Dictionary<string, List<string>>();
    using (var reader = File.OpenText(path))
    {
        while (parseQuery(reader, queries)) { }
    }
    return queries;
}

private static bool parseQuery(StreamReader reader, Dictionary<string, List<string>> queries)
{
    StringBuilder sbQuery = new StringBuilder();
    StringBuilder sbArgs = new StringBuilder();
    // Read in query
    bool moreLines = ParseBlock(reader, sbQuery);
    if (moreLines)
    {
        while (moreLines)
        {
            string line = reader.ReadLine();
            // Check for the beginning of an args block.
            if (line != null && line.StartsWith("-->"))
            {
                // Read in args
                sbArgs.Append(line.Trim() + " "); // Trailing space keeps this line from fusing with the next one
                moreLines = ParseBlock(reader, sbArgs);
                break;
            }
            // If this is not an args block, it is a new query
            // Save the last query and start over
            else
            {
                AddQuery(queries, sbQuery.ToString(), sbArgs.ToString());
                sbQuery = new StringBuilder();
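                // Make sure we capture the first line of the new query; the trailing space keeps it
                // from fusing with the next line (otherwise "as Rate" + "from" becomes "as Ratefrom")
                if (!string.IsNullOrWhiteSpace(line)) sbQuery.Append(line.Trim() + " ");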
                moreLines = ParseBlock(reader, sbQuery);
            }
        }
    }
    AddQuery(queries, sbQuery.ToString(), sbArgs.ToString());
    return moreLines;
}

private static bool ParseBlock(StreamReader reader, StringBuilder builder)
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        line = line.Trim();
        if (string.IsNullOrWhiteSpace(line)) break;

        builder.Append(line + " ");
    }
    return line != null;
}

private static void AddQuery(Dictionary<string, List<string>> queries, string query, string args)
{
    if (query.Length > 0)
    {
        List<string> lstParams;
        if (!queries.TryGetValue(query, out lstParams))
        {
            lstParams = new List<string>();
        }
        lstParams.Add(args);
        queries[query] = lstParams;
    }
}
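
The dictionary above keeps each argument block as one raw string, while the question also asks for the set of values seen per parameter name. The following is a rough sketch of that extra step under the same assumptions about the argument format (entries shaped like :Name(Type)=Value, values without whitespace). ParamValueCollector is a hypothetical helper, not part of the code above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParamValueCollector
{
    // Assumed entry shape: optional ':', name, "(type)", '=', value without spaces.
    private static readonly Regex Param =
        new Regex(@":?(\w+)\([^)]*\)=(\S*)");

    // Input:  query text -> raw argument strings (the shape ParseFile returns).
    // Output: query text -> parameter name -> distinct values in first-seen order.
    public static Dictionary<string, Dictionary<string, List<string>>> Collect(
        IDictionary<string, List<string>> queriesAndArgs)
    {
        var result = new Dictionary<string, Dictionary<string, List<string>>>();
        foreach (KeyValuePair<string, List<string>> pair in queriesAndArgs)
        {
            var byName = new Dictionary<string, List<string>>();
            foreach (string args in pair.Value)
            {
                foreach (Match m in Param.Matches(args))
                {
                    string name = m.Groups[1].Value;
                    string value = m.Groups[2].Value;
                    List<string> values;
                    if (!byName.TryGetValue(name, out values))
                    {
                        values = new List<string>();
                        byName[name] = values;
                    }
                    // Keep each value once, preserving the order of first appearance.
                    if (!values.Contains(value)) values.Add(value);
                }
            }
            result[pair.Key] = byName;
        }
        return result;
    }
}
```

Calling ParamValueCollector.Collect(ParseFile("SQLMonTraceLog.txt")) would then give, for each unique query, a list such as 1, 4 for :Tier. A List with a Contains check keeps first-seen order; a HashSet would be faster for very large traces if order does not matter.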

Is this text file parsable, or can the problem be solved some other way?
