What regex can strip labels such as "note:" and "firstName:" from the left of a string?

Time: 2010-06-01 16:07:14

Tags: c# regex string

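I need to strip the "label" off the front of strings, e.g.

note: this is a note

needs to return:

note

this is a note

I've produced the following code example, but I'm having trouble with the regex.

What code do I need in the two ??????????????? areas below so that I get the desired results shown in the comments?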

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace TestRegex8822
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> lines = new List<string>();
            lines.Add("note: this is a note");
            lines.Add("test:    just a test");
            lines.Add("test:\t\t\tjust a test");
            lines.Add("firstName: Jim"); //"firstName" IS a label because it does NOT contain a space
            lines.Add("She said this to him: follow me."); //this is NOT a label since there is a space before the colon
            lines.Add("description: this is the first description");
            lines.Add("description:this is the second description"); //no space after colon
            lines.Add("this is a line with no label");

            foreach (var line in lines)
            {
                Console.WriteLine(StringHelpers.GetLabelFromLine(line));
                Console.WriteLine(StringHelpers.StripLabelFromLine(line));
                Console.WriteLine("--");
                //note
                //this is a note
                //--
                //test
                //just a test
                //--
                //test
                //just a test
                //--
                //firstName
                //Jim
                //--
                //
                //She said this to him: follow me.
                //--
                //description
                //this is the first description
                //--
                //description
                //this is the second description
                //--
                //
                //this is a line with no label
                //--

            }
            Console.ReadLine();
        }
    }

    public static class StringHelpers
    {
        public static string GetLabelFromLine(this string line)
        {
            string label = line.GetMatch(@"^?:(\s)"); //???????????????
            if (!label.IsNullOrEmpty())
                return label;
            else
                return "";
        }

        public static string StripLabelFromLine(this string line)
        {
            return ...//???????????????
        }

        public static bool IsNullOrEmpty(this string line)
        {
            return String.IsNullOrEmpty(line);
        }
    }

    public static class RegexHelpers
    {
        public static string GetMatch(this string text, string regex)
        {
            Match match = Regex.Match(text, regex);
            if (match.Success)
            {
                string theMatch = match.Groups[0].Value;
                return theMatch;
            }
            else
            {
                return null;
            }
        }
    }
}

@Keltex, I incorporated your idea as follows, but it doesn't match any text (all entries come out blank). What do I need to tweak in the regex?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace TestRegex8822
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> lines = new List<string>();
            lines.Add("note: this is a note");
            lines.Add("test:    just a test");
            lines.Add("test:\t\t\tjust a test");
            lines.Add("firstName: Jim"); //"firstName" IS a label because it does NOT contain a space
            lines.Add("first name: Jim"); //"first name" is not a label because it contains a space
            lines.Add("description: this is the first description");
            lines.Add("description:this is the second description"); //no space after colon
            lines.Add("this is a line with no label");

            foreach (var line in lines)
            {
                LabelLinePair llp = line.GetLabelLinePair();
                Console.WriteLine(llp.Label);
                Console.WriteLine(llp.Line);
                Console.WriteLine("--");
            }
            Console.ReadLine();
        }
    }

    public static class StringHelpers
    {
        public static LabelLinePair GetLabelLinePair(this string line)
        {
            Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
            Match match = regex.Match(line); 
            LabelLinePair labelLinePair = new LabelLinePair();
            labelLinePair.Label = match.Groups["label"].ToString();
            labelLinePair.Line = match.Groups["line"].ToString();
            return labelLinePair;
        }
    }

    public class LabelLinePair
    {
        public string Label { get; set; }
        public string Line { get; set; }
    }

}
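
One likely cause of the blank lines, shown as a minimal sketch: the pattern names its second group `text`, but the code reads `match.Groups["line"]`. In .NET, asking for a group name that the pattern does not define does not throw; it returns a group whose `Success` is false and whose `Value` is an empty string. The class name `GroupLookupDemo` is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupLookupDemo
{
    public static void Main()
    {
        Match match = Regex.Match("note: this is a note", @"(?<label>.+):\s*(?<text>.+)");

        // "text" is defined in the pattern, so it holds the captured value.
        Console.WriteLine("[" + match.Groups["text"].Value + "]");

        // "line" is not defined in the pattern: no exception, just an empty, unsuccessful group.
        Console.WriteLine("[" + match.Groups["line"].Value + "]");
        Console.WriteLine(match.Groups["line"].Success);
    }
}
```

This prints `[this is a note]`, `[]` and `False`, which suggests the fix is to read the group under the name used in the pattern.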

Solved:

OK, I got it working, plus a little hack to handle labels that contain spaces. It does exactly what I wanted, thanks!

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace TestRegex8822
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> lines = new List<string>();
            lines.Add("note: this is a note");
            lines.Add("test:    just a test");
            lines.Add("test:\t\t\tjust a test");
            lines.Add("firstName: Jim"); //"firstName" IS a label because it does NOT contain a space
            lines.Add("first name: Jim"); //"first name" is not a label because it contains a space
            lines.Add("description: this is the first description");
            lines.Add("description:this is the second description"); //no space after colon
            lines.Add("this is a line with no label");
            lines.Add("she said to him: follow me");

            foreach (var line in lines)
            {
                LabelLinePair llp = line.GetLabelLinePair();
                Console.WriteLine(llp.Label);
                Console.WriteLine(llp.Line);
                Console.WriteLine("--");
            }
            Console.ReadLine();
        }
    }

    public static class StringHelpers
    {
        public static LabelLinePair GetLabelLinePair(this string line)
        {
            Regex regex = new Regex(@"(?<label>.+):\s*(?<text>.+)");
            Match match = regex.Match(line); 
            LabelLinePair llp = new LabelLinePair();
            llp.Label = match.Groups["label"].ToString();
            llp.Line = match.Groups["text"].ToString();

            if (llp.Label.IsNullOrEmpty() || llp.Label.Contains(" "))
            {
                llp.Label = "";
                llp.Line = line;
            }

            return llp;
        }

        public static bool IsNullOrEmpty(this string line)
        {
            return String.IsNullOrEmpty(line);
        }
    }

    public class LabelLinePair
    {
        public string Label { get; set; }
        public string Line { get; set; }
    }

}
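
As an alternative sketch, the space check can be moved into the pattern itself. Anchoring at the start and allowing only non-colon, non-whitespace characters in the label means a line such as "she said to him: follow me" simply fails to match. It also sidesteps a side effect of the greedy `.+`, which backtracks to the last colon: a line such as "note: meet at 10:30" would get the label "note: meet at 10" and then be rejected by the space check. The names `LabelParser` and `Parse` are made up for illustration, and `LabelLinePair` is repeated only to keep the block self-contained.

```csharp
using System;
using System.Text.RegularExpressions;

public class LabelLinePair
{
    public string Label { get; set; }
    public string Line { get; set; }
}

public static class LabelParser
{
    // ^            start of the line
    // [^:\s]+      label: one or more characters that are neither a colon nor whitespace
    // :\s*         the colon and any spaces or tabs after it
    // (?<text>.*)  the rest of the line
    private static readonly Regex LabelPattern =
        new Regex(@"^(?<label>[^:\s]+):\s*(?<text>.*)$");

    public static LabelLinePair Parse(string line)
    {
        Match match = LabelPattern.Match(line);
        if (!match.Success)
        {
            // No label: return the line unchanged.
            return new LabelLinePair { Label = "", Line = line };
        }
        return new LabelLinePair
        {
            Label = match.Groups["label"].Value,
            Line = match.Groups["text"].Value
        };
    }

    public static void Main()
    {
        string[] samples =
        {
            "test:\t\t\tjust a test",
            "description:this is the second description",
            "she said to him: follow me",
            "note: meet at 10:30"
        };
        foreach (string sample in samples)
        {
            LabelLinePair pair = Parse(sample);
            Console.WriteLine("[" + pair.Label + "] [" + pair.Line + "]");
        }
    }
}
```

For the four samples this prints `[test] [just a test]`, `[description] [this is the second description]`, `[] [she said to him: follow me]` and `[note] [meet at 10:30]`.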

3 answers:

Answer 0 (score: 5)

Couldn't you simply split the string on the first colon, and treat a line with no colon as having no label?

public static class StringHelpers 
{ 
    public static string GetLabelFromLine(this string line) 
    { 
        int separatorIndex = line.IndexOf(':');
        if (separatorIndex > 0)
        {
            string possibleLabel = line.Substring(0, separatorIndex).Trim();
            // Text before the colon only counts as a label if it contains no space.
            if (possibleLabel.IndexOf(' ') < 0) 
            {
                return possibleLabel;
            }
        }
        // No colon, or the text before it is not a label.
        return string.Empty;
    } 

    public static string StripLabelFromLine(this string line) 
    { 
        int separatorIndex = line.IndexOf(':');
        // Only strip when the text before the colon is a label (contains no space).
        if (separatorIndex > 0
            && line.Substring(0, separatorIndex).Trim().IndexOf(' ') < 0)
        {
            return line.Substring(separatorIndex + 1).Trim();
        }
        else
        {
            return line;
        }      
    } 

    public static bool IsNullOrEmpty(this string line) 
    { 
        return String.IsNullOrEmpty(line); 
    } 
} 

Answer 1 (score: 3)

It could look something like this:

Regex myreg = new Regex(@"(?<label>.+):\s*(?<text>.+)");

Match mymatch = myreg.Match(text); 

if (mymatch.Success) 
{ 
    Console.WriteLine("label: "+mymatch.Groups["label"]); 
    Console.WriteLine("text: "+mymatch.Groups["text"]); 
}

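I used named groups above, but you can do without them. Also, I think this is more efficient than making two method calls: a single regex gets both the text and the label.

Answer 2 (score: 1)

This regex works (see it in action on rubular):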

(?: *([^:\s]+) *: *)?(.+)

This captures the label (if any) into \1 and the body into \2.

It tolerates extra spaces, so the label can be indented, and so on.
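
In .NET terms, \1 and \2 correspond to `match.Groups[1]` and `match.Groups[2]`. A minimal sketch follows. It swaps the literal spaces for `\s*` so that tabs after the colon are skipped too, and it anchors the pattern with `^` and `$`; both are adjustments assumed here rather than part of the pattern above. `NumberedGroupsDemo` is a made-up name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberedGroupsDemo
{
    public static void Main()
    {
        // Assumed adjustment: \s* instead of " *", plus ^ and $ anchors.
        Regex regex = new Regex(@"^(?:\s*([^:\s]+)\s*:\s*)?(.+)$");

        string[] samples =
        {
            "  note :  this is a note",
            "test:\t\t\tjust a test",
            "She said this to him: follow me."
        };
        foreach (string sample in samples)
        {
            Match match = regex.Match(sample);
            // Groups[1] is the label (empty when the optional group did not take part),
            // Groups[2] is the body.
            Console.WriteLine("[" + match.Groups[1].Value + "] [" + match.Groups[2].Value + "]");
        }
    }
}
```

The first two samples yield the labels `note` and `test` with trimmed bodies; the third has no label, so group 1 is empty and group 2 holds the whole line.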