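I have a file that contains a little text and a few numbers. I just want to extract the numbers from it. How can I do that?
I have tried everything I could with Split, but no luck so far. My file looks like this:
AT+CMGL="ALL" +CMGL: 5566,"REC READ","Ufone" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,"REC READ","+923466666666"
Please tell me how to extract numbers such as +923466666666 from this file, so that I can put them into another file or a text box.
Thanks.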
Answer 0 (score: 2)
If the numbers are all located at the end of a line, you can use code like the following:
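    // Requires: using System.IO; using System.Text.RegularExpressions;
    foreach (string line in File.ReadAllLines(@"c:\path\to\file.txt"))
    {
        // Matches "+digits" followed by a closing double quote at the end of the line.
        Match result = Regex.Match(line, @"\+(\d+)""$");
        if (result.Success)
        {
            // Groups[1] holds the digits only; prepend "+" if you need the sign.
            var number = result.Groups[1].Value;
            // do what you want with the number
        }
    }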
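If the numbers are not guaranteed to sit at the very end of a line, a looser pattern can pick up every quoted number wherever it appears. The following is only a minimal sketch, assuming each number is written as a "+" followed by 7 to 15 digits inside straight double quotes, as in the sample. The class name PhoneNumberScanner, its methods, and the digit range are hypothetical and not part of the answer above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class PhoneNumberScanner
{
    // Assumption: a number is '+' followed by 7 to 15 digits, wrapped in double quotes.
    private static readonly Regex QuotedNumber = new Regex("\"(\\+\\d{7,15})\"");

    // Returns every quoted "+digits" token found in the text, plus sign included.
    public static List<string> Extract(string text)
    {
        var numbers = new List<string>();
        foreach (Match m in QuotedNumber.Matches(text))
        {
            numbers.Add(m.Groups[1].Value);
        }
        return numbers;
    }

    // Reads the input file and writes one number per line to the output file.
    public static void CopyNumbers(string inputPath, string outputPath)
    {
        List<string> numbers = Extract(File.ReadAllText(inputPath));
        File.WriteAllLines(outputPath, numbers.ToArray());
    }
}
```

Calling PhoneNumberScanner.CopyNumbers with an input path and an output path of your choice would cover the "put them into another file" part of the question. For a text box, the list returned by Extract could be joined with Environment.NewLine.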
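Answer 1 (score: 2)
Here is an example that uses String.Split. The "number" contains a "+", so it should be treated as a string rather than a number. I assume the "+" marks a phone number in international format? If these are phone numbers, you need to watch out for dashes, spaces inside the number, and extensions appended to the end, e.g. "+9234 666-66666 ext 235", and so on...
Anyway, I hope this example is useful for getting to grips with Split.
The code includes unit tests written with NUnit v2.4.8.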
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using NUnit.Framework;
    using System.Text.RegularExpressions;
    namespace SO.NumberExtractor.Test
    {
        public class NumberExtracter
        {
            // Splits the input into lines and extracts the last field of each one.
            public List<string> ExtractNumbers(string lines)
            {
                List<string> numbers = new List<string>();
                string[] separator = { System.Environment.NewLine };
                string[] separatedLines = lines.Split(separator, StringSplitOptions.RemoveEmptyEntries);
                foreach (string line in separatedLines)
                {
                    string s = ExtractNumber(line);
                    numbers.Add(s);
                }
                return numbers;
            }
            // Takes the last comma-separated field and strips the surrounding quotes.
            public string ExtractNumber(string line)
            {
                string s = line.Split(',').Last().Trim('"');
                return s;
            }
            // Same as ExtractNumber, but without the LINQ Last() extension method.
            public string ExtractNumberWithoutLinq(string line)
            {
                string[] fields = line.Split(',');
                string s = fields[fields.Length - 1];
                s = s.Trim('"');
                return s;
            }
        }
        [TestFixture]
        public class NumberExtracterTest
        {
            private readonly string LINE1 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666666\"";
            private readonly string LINE2 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666667\"";
            private readonly string LINE3 = "AT+CMGL=\"ALL\" +CMGL: 5566,\"REC READ\",\"Ufone\" Dear customer, your DAY_BUCKET subscription will expire on 02/05/09 +CMGL: 5565,\"REC READ\",\"+923466666668\"";
            [Test]
            public void ExtractOneLineWithoutLinq()
            {
                string expected = "+923466666666";
                NumberExtracter c = new NumberExtracter();
                string result = c.ExtractNumberWithoutLinq(LINE1);
                Assert.AreEqual(expected, result);
            }
            [Test]
            public void ExtractOneLineUsingLinq()
            {
                string expected = "+923466666666";
                NumberExtracter c = new NumberExtracter();
                string result = c.ExtractNumber(LINE1);
                Assert.AreEqual(expected, result);
            }
            [Test]
            public void ExtractMultipleLines()
            {
                StringBuilder sb = new StringBuilder();
                sb.AppendLine(LINE1);
                sb.AppendLine(LINE2);
                sb.AppendLine(LINE3);
                NumberExtracter ne = new NumberExtracter();
                List<string> extractedNumbers = ne.ExtractNumbers(sb.ToString());
                string expectedFirst = "+923466666666";
                string expectedSecond = "+923466666667";
                string expectedThird = "+923466666668";
                Assert.AreEqual(expectedFirst, extractedNumbers[0]);
                Assert.AreEqual(expectedSecond, extractedNumbers[1]);
                Assert.AreEqual(expectedThird, extractedNumbers[2]);
            }
        }
    }
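The caveat in this answer about dashes, spaces and extensions could be handled with a small normalization step after the split. What follows is only a sketch under stated assumptions: an extension is taken to be a trailing "ext", "ext." or "x" followed by digits, and everything except a leading "+" and ASCII digits is dropped. The class PhoneNumberNormalizer and its Normalize method are hypothetical names introduced for illustration.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class PhoneNumberNormalizer
{
    // Assumption: an extension is "ext", "ext." or "x" followed by digits at the end of the string.
    private static readonly Regex Extension =
        new Regex(@"\s*(?:ext\.?|x)\s*\d+\s*$", RegexOptions.IgnoreCase);

    // Drops a trailing extension, then keeps only a leading '+' and the ASCII digits.
    public static string Normalize(string raw)
    {
        string withoutExtension = Extension.Replace(raw.Trim(), "");
        var sb = new StringBuilder();
        foreach (char c in withoutExtension)
        {
            bool isDigit = c >= '0' && c <= '9';
            bool isLeadingPlus = c == '+' && sb.Length == 0;
            if (isDigit || isLeadingPlus)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With these assumptions, PhoneNumberNormalizer.Normalize("+9234 666-66666 ext 235") returns "+923466666666". Real-world phone formats vary a lot more than this, so treat the rules as a starting point rather than a complete parser.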
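Answer 2 (score: 1)
How big is the file? If it is only a few megabytes, I would suggest loading the file contents into a string and using a compiled regular expression to extract the matches.
Here is a simple example: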
    // Requires: using System; using System.Collections.Generic; using System.IO; using System.Text.RegularExpressions;
    private static readonly Regex NumberExtractor = new Regex("[0-9]{7,16}", RegexOptions.Compiled);
    /// <summary>
    /// Extracts numbers between seven and sixteen digits long from the target file.
    /// Example: "+923466666666" in the file yields 923466666666 (the "+" is not part of the match).
    /// </summary>
    /// <param name="TargetFilePath">Path of the file to scan.</param>
    /// <returns>List of the matching numbers</returns>
    private IEnumerable<ulong> ExtractLongNumbersFromFile(string TargetFilePath)
    {
        if (String.IsNullOrEmpty(TargetFilePath))
            throw new ArgumentException("TargetFilePath is null or empty.", "TargetFilePath");
        if (File.Exists(TargetFilePath) == false)
            throw new FileNotFoundException("Target file does not exist!", TargetFilePath);
        List<ulong> ReturnList = new List<ulong>();
        try
        {
            // Disposing the reader also closes the underlying file stream.
            using (StreamReader TargetFileStreamReader = new StreamReader(TargetFilePath))
            {
                string FileContents = TargetFileStreamReader.ReadToEnd();
                foreach (Match CurrentMatch in NumberExtractor.Matches(FileContents))
                {
                    ReturnList.Add(System.Convert.ToUInt64(CurrentMatch.Value));
                }
            }
        }
        catch (Exception ex)
        {
            //Your logging, etc... (as written, errors are swallowed and a partial list is returned)
        }
        return ReturnList;
    }
Sample usage:
    List<ulong> Numbers = (List<ulong>)ExtractLongNumbersFromFile(@"v:\TestExtract.txt");
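For files too large to load into a single string comfortably, the same pattern could be applied one line at a time. This is a hedged sketch of that variation, not part of the answer above. The class StreamingNumberExtractor is a hypothetical name, and it assumes no number is split across a line break.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class StreamingNumberExtractor
{
    // Same pattern as above: runs of seven to sixteen digits.
    private static readonly Regex LongNumber = new Regex("[0-9]{7,16}", RegexOptions.Compiled);

    // Lazily yields each matching number, reading one line at a time from the reader.
    public static IEnumerable<ulong> Extract(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match m in LongNumber.Matches(line))
            {
                yield return ulong.Parse(m.Value);
            }
        }
    }
}
```

A caller would wrap a StreamReader for the file in a using block and iterate over Extract(reader) inside it. Because the method is an iterator, the enumeration has to finish before the reader is disposed.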