Searching for specific data in a file

Asked: 2009-05-02 14:47:24

Tags: c#

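I have a file that contains some text and some numbers. I just want to extract the numbers from it. How can I do that?

I have tried everything I could with Split, but no luck so far. My file looks like this: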

AT+CMGL="ALL" +CMGL: 5566,"REC READ","Ufone" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,"REC READ","+923466666666"

Please tell me how to extract numbers such as +923466666666 from this file, so that I can put them into another file or a text box.

Thanks

3 Answers:

Answer 0 (score: 2)

If the numbers are always at the end of a line, you can use code like the following:

foreach ( string line in File.ReadAllLines(@"c:\path\to\file.txt") ) {
  Match result = Regex.Match(line, @"\+(\d+)""$");
  if ( result.Success ) { 
    // Groups[1] holds the digits only; prepend "+" if the leading plus sign is needed.
    var number = result.Groups[1].Value;
    // do what you want with the number
  }
}
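
If a number can also appear in the middle of a line, or the leading "+" should be kept, a variation on the regex above can collect every quoted number from the whole text. This is only a minimal sketch, not the answerer's code: the names `QuotedNumberFinder`, `FindAll` and `CopyNumbers` are hypothetical, and it assumes each number appears as `"+digits"` inside straight double quotes, as in the sample from the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class QuotedNumberFinder
{
    // Matches a double-quoted run of digits with a leading '+', e.g. "+923466666666".
    // Group 1 captures the '+' together with the digits, without the quotes.
    private static readonly Regex QuotedNumber = new Regex("\"(\\+\\d+)\"");

    // Returns every quoted +number found anywhere in the text, in order of appearance.
    public static List<string> FindAll(string text)
    {
        var numbers = new List<string>();
        foreach (Match m in QuotedNumber.Matches(text))
        {
            numbers.Add(m.Groups[1].Value);
        }
        return numbers;
    }

    // Reads the input file, extracts the numbers and writes them
    // to the output file, one number per line.
    public static void CopyNumbers(string inputPath, string outputPath)
    {
        List<string> numbers = FindAll(File.ReadAllText(inputPath));
        File.WriteAllLines(outputPath, numbers.ToArray());
    }
}
```

The list returned by `FindAll` could equally be joined with `string.Join` and assigned to a text box instead of being written to a file.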

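Answer 1 (score: 2)

Here is an example using String.Split. The "number" contains a "+", so it should be treated as a string rather than as a number. I assume this is a phone number, with the "+" used for international dialing? If it is a phone number, you will need to watch out for dashes, spaces within the number, and extensions added to the end, e.g. "+9234 666-66666 ext 235", and so on...

Anyway, I hope this example is useful for getting to grips with Split.

The code includes unit tests written with NUnit v2.4.8.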

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using NUnit.Framework;
using System.Text.RegularExpressions;

namespace SO.NumberExtractor.Test
{
    public class NumberExtracter
    {
        public List<string> ExtractNumbers(string lines)
        {
            List<string> numbers = new List<string>();
            string[] seperator = { System.Environment.NewLine };
            string[] seperatedLines = lines.Split(seperator, StringSplitOptions.RemoveEmptyEntries);

            foreach (string line in seperatedLines)
            {
                string s = ExtractNumber(line);
                numbers.Add(s);
            }

            return numbers;
        }

        public string ExtractNumber(string line)
        {
            string s = line.Split(',').Last<string>().Trim('"');
            return s;
        }

        public string ExtractNumberWithoutLinq(string line)
        {
            string[] fields = line.Split(',');
            string s = fields[fields.Length - 1];
            s = s.Trim('"');

            return s;
        }
    }

    [TestFixture]
    public class NumberExtracterTest
    {
        private readonly string LINE1 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666666\"";
        private readonly string LINE2 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666667\"";
        private readonly string LINE3 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666668\"";

        [Test]
        public void ExtractOneLineWithoutLinq()
        {            
            string expected = "+923466666666";

            NumberExtracter c = new NumberExtracter();
            string result = c.ExtractNumberWithoutLinq(LINE1);

            Assert.AreEqual(expected, result);            
        }

        [Test]
        public void ExtractOneLineUsingLinq()
        {
            string expected = "+923466666666";

            NumberExtracter c = new NumberExtracter();
            string result = c.ExtractNumber(LINE1);

            Assert.AreEqual(expected, result);
        }

        [Test]
        public void ExtractMultipleLines()
        {
            StringBuilder sb = new StringBuilder();
            sb.AppendLine(LINE1);
            sb.AppendLine(LINE2);
            sb.AppendLine(LINE3);

            NumberExtracter ne = new NumberExtracter();
            List<string> extractedNumbers = ne.ExtractNumbers(sb.ToString());

            string expectedFirst = "+923466666666";
            string expectedSecond = "+923466666667";
            string expectedThird = "+923466666668";

            Assert.AreEqual(expectedFirst, extractedNumbers[0]);
            Assert.AreEqual(expectedSecond, extractedNumbers[1]);
            Assert.AreEqual(expectedThird, extractedNumbers[2]);
        }
    } 
}
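
The caveat above about dashes, spaces and extensions can be handled by normalizing each extracted value before storing or comparing it. A rough sketch, assuming an extension is introduced by the literal text "ext" (as in the example "+9234 666-66666 ext 235") and can simply be dropped. `PhoneNormalizer` and `Normalize` are hypothetical names, and real phone formats vary more than this covers.

```csharp
using System;
using System.Text;

public static class PhoneNormalizer
{
    // Keeps a leading '+' and the ASCII digits; drops spaces, dashes and
    // other punctuation. Anything from "ext" onward is treated as an
    // extension and discarded (an assumption made for this sketch).
    public static string Normalize(string raw)
    {
        if (raw == null) throw new ArgumentNullException("raw");

        int extIndex = raw.IndexOf("ext", StringComparison.OrdinalIgnoreCase);
        string main = extIndex >= 0 ? raw.Substring(0, extIndex) : raw;

        var sb = new StringBuilder();
        foreach (char c in main.Trim())
        {
            if (c == '+' && sb.Length == 0)
                sb.Append(c);          // only a '+' in first position is kept
            else if (c >= '0' && c <= '9')
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this sketch, both "+9234 666-66666 ext 235" and "+923466666666" normalize to the same string, so the values returned by ExtractNumber can be compared regardless of how they were formatted.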

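Answer 2 (score: 1)

How big is the file? If it is only a few megabytes, I would suggest loading the file contents into a string and using a compiled regular expression to extract the matches.

Here is a simple example: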

    Regex NumberExtractor = new Regex("[0-9]{7,16}",RegexOptions.Compiled);

    /// <summary>
    /// Extracts numbers between seven and sixteen digits long from the target file.
    /// Example: from +923466666666 the digits 923466666666 are extracted (the "+" is not captured).
    /// </summary>
    /// <param name="TargetFilePath"></param>
    /// <returns>List of the matching numbers</returns>
    private IEnumerable<ulong> ExtractLongNumbersFromFile(string TargetFilePath)
    {

        if (String.IsNullOrEmpty(TargetFilePath))
            throw new ArgumentException("TargetFilePath is null or empty.", "TargetFilePath");

        if (File.Exists(TargetFilePath) == false)
            throw new FileNotFoundException("Target file does not exist!", TargetFilePath);

        List<ulong> ReturnList = new List<ulong>();

        // The using blocks close and dispose the reader and the stream,
        // even when an exception is thrown while reading.
        using (FileStream TargetFileStream = new FileStream(TargetFilePath, FileMode.Open, FileAccess.Read))
        using (StreamReader TargetFileStreamReader = new StreamReader(TargetFileStream))
        {
            string FileContents = TargetFileStreamReader.ReadToEnd();

            foreach (Match CurrentMatch in NumberExtractor.Matches(FileContents))
            {
                // Safe for ulong: the pattern yields at most 16 digits per match.
                ReturnList.Add(System.Convert.ToUInt64(CurrentMatch.Value));
            }
        }

        // I/O exceptions propagate to the caller; add your logging there.
        return ReturnList;


    }

Sample usage:

List<ulong> Numbers = (List<ulong>)ExtractLongNumbersFromFile(@"v:\TestExtract.txt");
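
For files too large to load comfortably into a single string, the same compiled regex can be applied one line at a time. This is a sketch under assumptions rather than part of the answer: `StreamingNumberExtractor`, `ExtractFromLines` and `ExtractFromFile` are hypothetical names, matches are kept as strings, and no number is assumed to span a line break.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class StreamingNumberExtractor
{
    // Same pattern as in the answer above: runs of 7 to 16 digits.
    private static readonly Regex NumberExtractor =
        new Regex("[0-9]{7,16}", RegexOptions.Compiled);

    // Lazily yields each match, reading one line at a time from the reader,
    // so only the current line is held in memory.
    public static IEnumerable<string> ExtractFromLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match m in NumberExtractor.Matches(line))
            {
                yield return m.Value;
            }
        }
    }

    // Convenience wrapper: opens the file, collects all matches into a list,
    // and disposes the reader when done.
    public static List<string> ExtractFromFile(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return new List<string>(ExtractFromLines(reader));
        }
    }
}
```

Taking a `TextReader` rather than a path keeps the core method easy to exercise with a `StringReader` in a unit test, in the spirit of the NUnit tests in the previous answer.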