Find the text between two words in a string in C#

Asked: 2015-02-22 16:31:05

Tags: .net c#-4.0

My file contains messages. Each message starts with "A1" and ends with "Z1".

Here are the file contents:

A1 s = 10 y = 10 z = 120 Z1 VV CCZ

A1 55 77 88 99 Z1 qq KK A1

uuu Z1 A1 LL KK ZZ Z1 SS

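As you can see, a message can be split across multiple lines. I need to extract all the messages from the file by reading the .txt file line by line.

The output should be a list of message strings: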

A1 s = 10 y = 10 z = 120 Z1

A1 55 77 88 99 Z1

A1 uuu Z1

A1 LL KK ZZ Z1

Solution

    public void ProcessFile()
    {

        string _startingWord = "A1";
        string _endingWord = "Z1";
        bool _waitForlastWord = false;

        StringBuilder msg = new StringBuilder();

        string line;

        // Read the file and display it line by line.
        System.IO.StreamReader file = new System.IO.StreamReader(@"G:\CS Session\Test.txt");
        while ((line = file.ReadLine()) != null)
        {
            var message = line.Split(' ').ToList();
            if (message.Count(x => x == _startingWord) > 0 || message.Count(x => x == _endingWord) > 0 || _waitForlastWord)
            {

               bool startingFound = false;
               if (_waitForlastWord)
                {
                    startingFound = true;
                }


               foreach (var wrd in message)
               {
                   if (!startingFound)
                   {
                       if (wrd == _startingWord)
                       {
                           startingFound = true;
                       }
                   }

                   if (startingFound)
                   {
                       // Separate words with a single space, without a leading space.
                       if (msg.Length > 0) msg.Append(' ');
                       msg.Append(wrd);

                       if (wrd == _endingWord)
                       {
                           startingFound = false;


                           Console.WriteLine(msg.ToString());
                           msg = new StringBuilder();
                       }
                   }
               }
               // Carry an unfinished message over to the next line,
               // and clear the flag again once the message has been completed.
               _waitForlastWord = msg.Length > 0;
            }



        }

        file.Close();


        System.Console.ReadLine();
    }

2 Answers:

Answer 0 (score: 1)

You can solve this with the string.IndexOf method, which returns the position of the keyword you are searching for within the string.

foreach (var line in lines)
{
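    int start = line.IndexOf("A1");
    int lastEnd = line.LastIndexOf("Z1");
    // Both keywords must be present, with Z1 after A1 (IndexOf returns -1 when a keyword is missing).
    if (start >= 0 && lastEnd > start)
    {
        int end = lastEnd + 2; // add the length of the keyword
        int length = end - start; // length of the span from A1 through Z1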
        var result = line.Substring(start, length);
        msg.AppendLine(result);
    }                
}

Edit: I missed that the start/end can span two lines, which calls for a different solution:

// process all lines first, flatten structure.
string startingWord = "A1";
string endingWord = "Z1";
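// Verbatim string so the backslashes in the path are not treated as escape sequences;
// line breaks become spaces so words on adjacent lines are not glued together.
var contents = File.ReadAllText(@"path\to\somefile.txt").Replace(Environment.NewLine, " ");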
var result = contents.Split(new string[] { startingWord }, StringSplitOptions.RemoveEmptyEntries).ToList();

foreach (var line in result)
{
    int position = line.LastIndexOf(endingWord);
    if (position > -1)
    {
        int end = position + endingWord.Length;
        Console.WriteLine("{0}{1}", startingWord, line.Substring(0, end));
    }
}

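File.ReadAllText reads the entire file into a single string, and Replace substitutes every occurrence of Environment.NewLine (\r\n on Windows). The string is then split on the starting keyword (A1), which yields a list of fragments. Because splitting on the keyword removes it from each fragment, it has to be added back when writing the output.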
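As an alternative to splitting, the same idea can be sketched as a scan with IndexOf over the flattened text, which avoids having to re-add the keyword and ends each message at the first Z1 after its A1. This is only a minimal sketch: the MessageScanner class and its Extract method are hypothetical names, line breaks are assumed to be \r\n or \n, and the match is substring-based, so a token such as "BA1" would also count as a start.

```csharp
using System;
using System.Collections.Generic;

public static class MessageScanner
{
    // Hypothetical helper: returns every span that starts with startWord and
    // ends with the nearest following endWord, searching the whole text at once.
    public static List<string> Extract(string text, string startWord, string endWord)
    {
        var messages = new List<string>();

        // Flatten line breaks to spaces so a message may span several lines.
        string flat = text.Replace("\r\n", " ").Replace('\n', ' ');

        int pos = 0;
        while (true)
        {
            int start = flat.IndexOf(startWord, pos, StringComparison.Ordinal);
            if (start < 0) break; // no further message starts

            int end = flat.IndexOf(endWord, start + startWord.Length, StringComparison.Ordinal);
            if (end < 0) break; // last message is never terminated: ignore it

            end += endWord.Length; // include the ending keyword itself
            messages.Add(flat.Substring(start, end - start));
            pos = end; // continue scanning after this message
        }
        return messages;
    }
}
```

Calling MessageScanner.Extract(File.ReadAllText(path), "A1", "Z1") on the sample file should yield the four messages listed in the question.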

Answer 1 (score: 0)

A regular expression is useful here: the pattern A1(.*?)Z1 lazily matches a string that starts with A1 and ends with Z1, with any number of characters in between.

  var regExp = new Regex("A1(.*?)Z1");

  foreach(var match in regExp.Matches(File.ReadAllText("test.txt")
                       .Replace("\r\n", " ")))
  {
      Console.WriteLine(match);
  }

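This assumes that a newline is the \r\n combination. It is replaced with a space to account for the space after A1 in the expected output.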
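If the newline style is not known in advance, one possible variation is to let the pattern itself cross line breaks and normalise whitespace afterwards. The sketch below is an assumption-laden illustration rather than the answerer's code: RegexMessageExtractor and Extract are hypothetical names, \b word boundaries are added so that A1 and Z1 only match as whole words, and RegexOptions.Singleline lets '.' match newline characters.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexMessageExtractor
{
    // Hypothetical helper: extracts every A1 ... Z1 message from the text.
    public static List<string> Extract(string text)
    {
        // \b keeps A1/Z1 from matching inside longer tokens.
        // Singleline lets '.' cross line breaks; the lazy .*? stops at the first Z1.
        var pattern = new Regex(@"\bA1\b.*?\bZ1\b", RegexOptions.Singleline);

        var results = new List<string>();
        foreach (Match m in pattern.Matches(text))
        {
            // Collapse any run of whitespace (including \r\n or \n) into one space.
            results.Add(Regex.Replace(m.Value, @"\s+", " "));
        }
        return results;
    }
}
```

Because whitespace is collapsed after matching, the same call works for files with \r\n, \n, or blank lines between the data lines.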