C#是否内置支持解析页码字符串?

时间:2008-09-02 17:53:54

标签: c# parsing

C#是否内置支持解析页码字符串?按页码,我的意思是你可以输入到逗号和破折号分隔的打印对话框中的格式。

这样的事情:

1,3,5-10,12

什么是非常好的解决方案,它给了我一些由字符串表示的所有页码的列表。在上面的例子中,像这样获得一个列表会很好:

1,3,5,6,7,8,9,10,12

如果有一个简单的方法,我只想避免自己动手。

11 个答案:

答案 0 :(得分:22)

应该很简单:

foreach( string s in "1,3,5-10,12".Split(',') ) 
{
    // try and get the number
    int num;
    if( int.TryParse( s, out num ) )
    {
        yield return num;
        continue; // skip the rest
    }

    // otherwise we might have a range
    // split on the range delimiter
    string[] subs = s.Split('-');
    int start, end;

    // now see if we can parse a start and end
    if( subs.Length > 1 &&
        int.TryParse(subs[0], out start) &&
        int.TryParse(subs[1], out end) &&
        end >= start )
    {
        // create a range between the two values
        int rangeLength = end - start + 1;
        foreach(int i in Enumerable.Range(start, rangeLength))
        {
            yield return i;
        }
    }
}

修改:感谢您的修复;-)

答案 1 :(得分:8)

它没有内置的方法来执行此操作,但使用String.Split会很简单。

只需拆分','即可获得一系列代表页码或范围的字符串。迭代该系列并执行String.Split' - '。如果没有结果,那么它是一个普通页码,所以请将其粘贴在您的页面列表中。如果有结果,请将' - '的左侧和右侧作为边界,并使用一个简单的for循环将每个页码添加到该范围内的最终列表中。

不能花5分钟做,然后可能另外10个添加一些健全性检查以在用户尝试输入无效数据时抛出错误(如“1-2-3”或其他东西。)

答案 2 :(得分:5)

凯斯的方法看起来不错。我使用列表组合了一种更天真的方法。这有错误检查,所以希望能解决大多数问题: -

public List<int> parsePageNumbers(string input) {
  if (string.IsNullOrEmpty(input))
    throw new InvalidOperationException("Input string is empty.");

  var pageNos = input.Split(',');

  var ret = new List<int>();
  foreach(string pageString in pageNos) {
    if (pageString.Contains("-")) {
      parsePageRange(ret, pageString);
    } else {
      ret.Add(parsePageNumber(pageString));
    }
  }

  ret.Sort();
  return ret.Distinct().ToList();
}

private int parsePageNumber(string pageString) {
  int ret;

  if (!int.TryParse(pageString, out ret)) {
    throw new InvalidOperationException(
      string.Format("Page number '{0}' is not valid.", pageString));
  }

  return ret;
}

private void parsePageRange(List<int> pageNumbers, string pageNo) {
  var pageRange = pageNo.Split('-');

  if (pageRange.Length != 2)
    throw new InvalidOperationException(
      string.Format("Page range '{0}' is not valid.", pageNo));

  int startPage = parsePageNumber(pageRange[0]),
    endPage = parsePageNumber(pageRange[1]);

  if (startPage > endPage) {
    throw new InvalidOperationException(
      string.Format("Page number {0} is greater than page number {1}" +
      " in page range '{2}'", startPage, endPage, pageNo));
  }

  pageNumbers.AddRange(Enumerable.Range(startPage, endPage - startPage + 1));
}

答案 3 :(得分:4)

下面是我刚刚拼凑起来的代码..你可以输入的格式如... 1-2,5abcd,6,7,20-15 ,,,,,,

易于添加其他格式

    private int[] ParseRange(string ranges)
    { 
        string[] groups = ranges.Split(',');
        return groups.SelectMany(t => GetRangeNumbers(t)).ToArray();
    }

    private int[] GetRangeNumbers(string range)
    {
        //string justNumbers = new String(text.Where(Char.IsDigit).ToArray());

        int[] RangeNums = range
            .Split('-')
            .Select(t => new String(t.Where(Char.IsDigit).ToArray())) // Digits Only
            .Where(t => !string.IsNullOrWhiteSpace(t)) // Only if has a value
            .Select(t => int.Parse(t)) // digit to int
            .ToArray();
        return RangeNums.Length.Equals(2) ? Enumerable.Range(RangeNums.Min(), (RangeNums.Max() + 1) - RangeNums.Min()).ToArray() : RangeNums;
    }

答案 4 :(得分:2)

这是我为类似的东西做的事情。

它处理以下类型的范围:

1        single number
1-5      range
-5       range from (firstpage) up to 5
5-       range from 5 up to (lastpage)
..       can use .. instead of -
;,       can use both semicolon, comma, and space, as separators

它不检查重复值,因此集合 1,5,-10 将生成序列 1,5,1,2,3,4,5,6, 7,8,9,10

public class RangeParser
{
    public static IEnumerable<Int32> Parse(String s, Int32 firstPage, Int32 lastPage)
    {
        String[] parts = s.Split(' ', ';', ',');
        Regex reRange = new Regex(@"^\s*((?<from>\d+)|(?<from>\d+)(?<sep>(-|\.\.))(?<to>\d+)|(?<sep>(-|\.\.))(?<to>\d+)|(?<from>\d+)(?<sep>(-|\.\.)))\s*$");
        foreach (String part in parts)
        {
            Match maRange = reRange.Match(part);
            if (maRange.Success)
            {
                Group gFrom = maRange.Groups["from"];
                Group gTo = maRange.Groups["to"];
                Group gSep = maRange.Groups["sep"];

                if (gSep.Success)
                {
                    Int32 from = firstPage;
                    Int32 to = lastPage;
                    if (gFrom.Success)
                        from = Int32.Parse(gFrom.Value);
                    if (gTo.Success)
                        to = Int32.Parse(gTo.Value);
                    for (Int32 page = from; page <= to; page++)
                        yield return page;
                }
                else
                    yield return Int32.Parse(gFrom.Value);
            }
        }
    }
}

答案 5 :(得分:1)

在有测试用例之前,您无法确定。在我的情况下,我宁愿是空格分隔而不是逗号分隔。它使解析变得更复杂。

    [Fact]
    public void ShouldBeAbleToParseRanges()
    {
        RangeParser.Parse( "1" ).Should().BeEquivalentTo( 1 );
        RangeParser.Parse( "-1..2" ).Should().BeEquivalentTo( -1,0,1,2 );

        RangeParser.Parse( "-1..2 " ).Should().BeEquivalentTo( -1,0,1,2 );
        RangeParser.Parse( "-1..2 5" ).Should().BeEquivalentTo( -1,0,1,2,5 );
        RangeParser.Parse( " -1  ..  2 5" ).Should().BeEquivalentTo( -1,0,1,2,5 );
    }

请注意,在范围令牌之间存在空格的最后一次测试中,Keith的答案(或小变化)将失败。这需要一个标记器和一个具有前瞻性的适当解析器。

namespace Utils
{
    public class RangeParser
    {

        public class RangeToken
        {
            public string Name;
            public string Value;
        }

        public static IEnumerable<RangeToken> Tokenize(string v)
        {
            var pattern =
                @"(?<number>-?[1-9]+[0-9]*)|" +
                @"(?<range>\.\.)";

            var regex = new Regex( pattern );
            var matches = regex.Matches( v );
            foreach (Match match in matches)
            {
                var numberGroup = match.Groups["number"];
                if (numberGroup.Success)
                {
                    yield return new RangeToken {Name = "number", Value = numberGroup.Value};
                    continue;
                }
                var rangeGroup = match.Groups["range"];
                if (rangeGroup.Success)
                {
                    yield return new RangeToken {Name = "range", Value = rangeGroup.Value};
                }

            }
        }

        public enum State { Start, Unknown, InRange}

        public static IEnumerable<int> Parse(string v)
        {

            var tokens = Tokenize( v );
            var state = State.Start;
            var number = 0;

            foreach (var token in tokens)
            {
                switch (token.Name)
                {
                    case "number":
                        var nextNumber = int.Parse( token.Value );
                        switch (state)
                        {
                            case State.Start:
                                number = nextNumber;
                                state = State.Unknown;
                                break;
                            case State.Unknown:
                                yield return number;
                                number = nextNumber;
                                break;
                            case State.InRange:
                                int rangeLength = nextNumber - number+ 1;
                                foreach (int i in Enumerable.Range( number, rangeLength ))
                                {
                                    yield return i;
                                }
                                state = State.Start;
                                break;
                            default:
                                throw new ArgumentOutOfRangeException();
                        }
                        break;
                    case "range":
                        switch (state)
                        {
                            case State.Start:
                                throw new ArgumentOutOfRangeException();
                                break;
                            case State.Unknown:
                                state = State.InRange;
                                break;
                            case State.InRange:
                                throw new ArgumentOutOfRangeException();
                                break;
                            default:
                                throw new ArgumentOutOfRangeException();
                        }
                        break;
                    default:
                        throw new ArgumentOutOfRangeException( nameof( token ) );
                }
            }
            switch (state)
            {
                case State.Start:
                    break;
                case State.Unknown:
                    yield return number;
                    break;
                case State.InRange:
                    break;
                default:
                    throw new ArgumentOutOfRangeException();
            }
        }
    }
}

答案 6 :(得分:1)

SplitLinq

的一线方法
string input = "1,3,5-10,12";
IEnumerable<int> result = input.Split(',').SelectMany(x => x.Contains('-') ? Enumerable.Range(int.Parse(x.Split('-')[0]), int.Parse(x.Split('-')[1]) - int.Parse(x.Split('-')[0]) + 1) : new int[] { int.Parse(x) });

答案 7 :(得分:0)

这是一个稍微修改过的lassevk代码版本,用于处理Regex匹配中的string.Split操作。它是作为扩展方法编写的,您可以使用LINQ的Disinct()扩展轻松处理重复问题。

    /// <summary>
    /// Parses a string representing a range of values into a sequence of integers.
    /// </summary>
    /// <param name="s">String to parse</param>
    /// <param name="minValue">Minimum value for open range specifier</param>
    /// <param name="maxValue">Maximum value for open range specifier</param>
    /// <returns>An enumerable sequence of integers</returns>
    /// <remarks>
    /// The range is specified as a string in the following forms or combination thereof:
    /// 5           single value
    /// 1,2,3,4,5   sequence of values
    /// 1-5         closed range
    /// -5          open range (converted to a sequence from minValue to 5)
    /// 1-          open range (converted to a sequence from 1 to maxValue)
    /// 
    /// The value delimiter can be either ',' or ';' and the range separator can be
    /// either '-' or ':'. Whitespace is permitted at any point in the input.
    /// 
    /// Any elements of the sequence that contain non-digit, non-whitespace, or non-separator
    /// characters or that are empty are ignored and not returned in the output sequence.
    /// </remarks>
    public static IEnumerable<int> ParseRange2(this string s, int minValue, int maxValue) {
        const string pattern = @"(?:^|(?<=[,;]))                      # match must begin with start of string or delim, where delim is , or ;
                                 \s*(                                 # leading whitespace
                                 (?<from>\d*)\s*(?:-|:)\s*(?<to>\d+)  # capture 'from <sep> to' or '<sep> to', where <sep> is - or :
                                 |                                    # or
                                 (?<from>\d+)\s*(?:-|:)\s*(?<to>\d*)  # capture 'from <sep> to' or 'from <sep>', where <sep> is - or :
                                 |                                    # or
                                 (?<num>\d+)                          # capture lone number
                                 )\s*                                 # trailing whitespace
                                 (?:(?=[,;\b])|$)                     # match must end with end of string or delim, where delim is , or ;";

        Regex regx = new Regex(pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

        foreach (Match m in regx.Matches(s)) {
            Group gpNum = m.Groups["num"];
            if (gpNum.Success) {
                yield return int.Parse(gpNum.Value);

            } else {
                Group gpFrom = m.Groups["from"];
                Group gpTo = m.Groups["to"];
                if (gpFrom.Success || gpTo.Success) {
                    int from = (gpFrom.Success && gpFrom.Value.Length > 0 ? int.Parse(gpFrom.Value) : minValue);
                    int to = (gpTo.Success && gpTo.Value.Length > 0 ? int.Parse(gpTo.Value) : maxValue);

                    for (int i = from; i <= to; i++) {
                        yield return i;
                    }
                }
            }
        }
    }

答案 8 :(得分:0)

我想出的答案:

static IEnumerable<string> ParseRange(string str)
{
    var numbers = str.Split(',');

    foreach (var n in numbers)
    {
       if (!n.Contains("-")) 
           yield return n;
       else
       {
           string startStr = String.Join("", n.TakeWhile(c => c != '-'));
           int startInt = Int32.Parse(startStr);

           string endStr = String.Join("", n.Reverse().TakeWhile(c => c != '-').Reverse());
           int endInt = Int32.Parse(endStr);

           var range = Enumerable.Range(startInt, endInt - startInt + 1)
                                 .Select(num => num.ToString());

           foreach (var s in range)
               yield return s;
        }
    }
}

答案 9 :(得分:0)

正则表达式效率不如以下代码。字符串方法比Regex更有效,应尽可能使用。

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] inputs = {
                                 "001-005/015",
                                 "009/015"
                             };

            foreach (string input in inputs)
            {
                List<int> numbers = new List<int>();
                string[] strNums = input.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
                foreach (string strNum in strNums)
                {
                    if (strNum.Contains("-"))
                    {
                        int startNum = int.Parse(strNum.Substring(0, strNum.IndexOf("-")));
                        int endNum = int.Parse(strNum.Substring(strNum.IndexOf("-") + 1));
                        for (int i = startNum; i <= endNum; i++)
                        {
                            numbers.Add(i);
                        }
                    }
                    else
                        numbers.Add(int.Parse(strNum));
                }
                Console.WriteLine(string.Join(",", numbers.Select(x => x.ToString())));
            }
            Console.ReadLine();

        }
    }
}

答案 10 :(得分:0)

我的解决方案:

  • 返回整数列表
  • 倒转/错字/重复可能:1,-3,5-,7-10,12-9 => 1,3,5,7,8,9,10,12,11,10,9(二手当您要提取时,重复页面)
  • 用于设置页面总数的选项:1,-3,5-,7-10,12-9(Nmax = 9)=> 1,3,5,7,8,9,9
  • 自动完成:1,-3,5-,8(Nmax = 9)=> 1,3,5,6,7,8,9,8

    DEBIAN_FRONTEND=noninteractive apt-get install -y pkgs....