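I'm trying to find the best way to validate an input document. I need to check every line of the document, and any line may contain one or more invalid characters. The result of the search (validation) should be: 'the index of each line that contains invalid characters, plus the index of every invalid character within that line'.
I know how to do this the standard way (open the file -> read all lines -> check the characters one by one), but I don't think that approach is the most efficient. In my opinion, a better solution would be to use "MatchCollection".
But how do I do this correctly in C#?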
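Example:
"Some Înput text here,\nÎs another lÎne of thÎs text."
In the first line [0], an invalid character is found at index [5]; in the second line [1], invalid characters are found at indices [0, 12, 21] (all indices 0-based).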
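For comparison, the "standard way" described above (check every character of every line) can be sketched without any regex. This is a minimal sketch, not a benchmarked implementation: it assumes the whole document is already in a string, that lines are separated by '\n', and that 'Î' is the single precomposed character U+00CE; the names PlainScan and FindInvalid are hypothetical. A HashSet<char> lookup is constant-time on average, so the scan is linear in the length of the text.

```csharp
using System;
using System.Collections.Generic;

static class PlainScan
{
    // Returns, for every line that contains at least one invalid character,
    // the 0-based line index and the 0-based column of each invalid character.
    public static List<(int Line, List<int> Columns)> FindInvalid(string text, ISet<char> invalid)
    {
        var result = new List<(int Line, List<int> Columns)>();
        // Split on '\n' and keep empty lines, so array indices equal line numbers.
        string[] lines = text.Split('\n');
        for (int line = 0; line < lines.Length; line++)
        {
            List<int> columns = null;
            for (int col = 0; col < lines[line].Length; col++)
            {
                if (invalid.Contains(lines[line][col]))
                {
                    if (columns == null) columns = new List<int>();
                    columns.Add(col);
                }
            }
            if (columns != null)
                result.Add((line, columns));
        }
        return result;
    }

    static void Main()
    {
        string text = "Some Înput text here,\nÎs another lÎne of thÎs text.";
        var invalid = new HashSet<char> { 'Î' };
        foreach (var (line, columns) in FindInvalid(text, invalid))
            Console.WriteLine($"Line {line}: [{string.Join(", ", columns)}]");
        // Line 0: [5]
        // Line 1: [0, 12, 21]
    }
}
```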
using System;
using System.Text.RegularExpressions;
namespace RegularExpresion
{
    class Program
    {
        private static Regex regex = null;
        static void Main(string[] args)
        {
            // Same text as the example above, including the line break
            string input_text = "Some Înput text here,\nÎs another lÎne of thÎs text.";
            string line_pattern = "\n";
            string invalid_character = "Î";
            regex = new Regex(line_pattern);
            // Check whether the document has multiple lines or a single line
            if (IsMultipleLine(input_text))
            {
                // ---> How to do this correctly for each line? <---
            }
            else
            {
                Console.WriteLine("Is a single line file");
                regex = new Regex(invalid_character);
                MatchCollection mc = regex.Matches(input_text);
                Console.WriteLine($"How many matches: {mc.Count}");
                foreach (Match match in mc)
                    Console.WriteLine($"Index: {match.Index}");
            }
            Console.ReadKey();
        }
        public static bool IsMultipleLine(string input) => regex.IsMatch(input);
    }
}
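One possible way to fill in the placeholder without splitting the text is to run a single MatchCollection over the whole document and translate each match's absolute Index into a line and a column. This is a sketch under stated assumptions: lines end in '\n', all indices are 0-based, and the names SinglePassRegex and FindInvalid are hypothetical. The start offset of every line is recorded once, and List<T>.BinarySearch then finds the line that contains each match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SinglePassRegex
{
    // Runs one regex over the whole document and converts each match's absolute
    // index into a (line, column) pair, both 0-based.
    public static List<(int Line, int Column)> FindInvalid(string text, Regex invalid)
    {
        // lineStarts[i] is the absolute index at which line i begins.
        var lineStarts = new List<int> { 0 };
        for (int i = 0; i < text.Length; i++)
            if (text[i] == '\n')
                lineStarts.Add(i + 1);

        var result = new List<(int Line, int Column)>();
        foreach (Match m in invalid.Matches(text))
        {
            // When the value is not found, BinarySearch returns the bitwise complement
            // of the index of the next larger element, so ~idx - 1 is the containing line.
            int idx = lineStarts.BinarySearch(m.Index);
            int line = idx >= 0 ? idx : ~idx - 1;
            result.Add((line, m.Index - lineStarts[line]));
        }
        return result;
    }

    static void Main()
    {
        string text = "Some Înput text here,\nÎs another lÎne of thÎs text.";
        // Regex.Escape makes any regex metacharacters in the invalid character match literally.
        var invalid = new Regex(Regex.Escape("Î"));
        foreach (var (line, column) in FindInvalid(text, invalid))
            Console.WriteLine($"Line {line}, index {column}");
        // Line 0, index 5
        // Line 1, index 0
        // Line 1, index 12
        // Line 1, index 21
    }
}
```

Because the regex runs once over the whole text, empty lines need no special handling; each one simply contributes an entry to lineStarts.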
Answer 0 (score: 0)
Link: http://www.dotnetperls.com/regexoptions-multiline
Solution:
using System;
using System.Text.RegularExpressions;
namespace RegularExpresion
{
    class Program
    {
        // Used only to detect line breaks; kept separate so RegexpLine cannot overwrite it.
        private static readonly Regex lineBreak = new Regex("\n");
        static void Main(string[] args)
        {
            // Verbatim string: the second line must start at column 0,
            // otherwise the indentation becomes part of the text.
            string input_text = @"Some Înput text here,
Îs another lÎne of thÎs text.";
            string invalid_character = "Î";
            // Check whether the document has multiple lines or a single line
            if (IsMultipleLine(input_text))
            {
                Console.WriteLine("Is a multiple line file");
                // With RegexOptions.Multiline, ^ and $ match at the start and end of every line.
                // Note: (.+) requires at least one character, so empty lines are skipped.
                MatchCollection matches = Regex.Matches(input_text, "^(.+)$", RegexOptions.Multiline);
                int line = 0;
                foreach (Match match in matches)
                {
                    line++;
                    Console.WriteLine($"Line: {line}");
                    RegexpLine(match.Value, invalid_character);
                }
            }
            else
            {
                Console.WriteLine("Is a single line file");
                RegexpLine(input_text, invalid_character);
            }
            Pause();
        }
        public static bool IsMultipleLine(string input) => lineBreak.IsMatch(input);
        // Prints the 0-based index of every match in one line.
        // "characters" is treated as a regex pattern, e.g. "Î" or "[ÎÉ]".
        public static void RegexpLine(string line, string characters)
        {
            Regex regex = new Regex(characters);
            MatchCollection mc = regex.Matches(line);
            Console.WriteLine($"How many matches: {mc.Count}");
            foreach (Match match in mc)
                Console.WriteLine($"Index: {match.Index}");
        }
        public static ConsoleKeyInfo Pause(string message = "please press ANY key to continue...")
        {
            Console.WriteLine(message);
            return Console.ReadKey();
        }
    }
}
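One caveat about the pattern used in this answer: with RegexOptions.Multiline, ^(.+)$ needs at least one character per line, so an empty line produces no match and the line counter drifts away from the document's real line numbers. A small sketch comparing it with ^(.*)$, assuming '\n' line endings; the MultilineDemo class and CountLines method are hypothetical names used only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // Counts how many lines a RegexOptions.Multiline pattern matches in the text.
    public static int CountLines(string text, string pattern) =>
        Regex.Matches(text, pattern, RegexOptions.Multiline).Count;

    static void Main()
    {
        string text = "first line\n\nthird line";
        // (.+) needs at least one character, so the empty second line is skipped.
        Console.WriteLine(CountLines(text, "^(.+)$")); // 2
        // (.*) also matches the empty line.
        Console.WriteLine(CountLines(text, "^(.*)$")); // 3
    }
}
```

Note that ^(.*)$ also reports an empty final "line" when the text ends with '\n', the same way string.Split does.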
Thanks for the help, guys. If someone smarter than me could check the performance of this code, that would be great.
Regards, Nerus.
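On the performance question: the RegexpLine method above constructs a new Regex on every call, which means once per line. A common alternative is to build the regex once and reuse it for every line. The sketch below does that and builds the pattern from a set of characters, escaping each one so that characters such as '.' are matched literally. It rests on assumptions: LineValidator and Check are hypothetical names, and RegexOptions.Compiled trades slower construction for faster matching; whether that pays off for a given document has not been measured here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the invalid-character regex once and reuses it for every line.
sealed class LineValidator
{
    private readonly Regex _invalid;

    public LineValidator(string invalidChars)
    {
        // An empty pattern would match at every position, so reject it.
        if (string.IsNullOrEmpty(invalidChars))
            throw new ArgumentException("At least one invalid character is required.", nameof(invalidChars));
        // Escape each character and join with '|', so metacharacters match literally.
        string pattern = string.Join("|", invalidChars.Select(c => Regex.Escape(c.ToString())));
        _invalid = new Regex(pattern, RegexOptions.Compiled);
    }

    // Returns the 0-based indices of the invalid characters in one line.
    public int[] Check(string line) =>
        _invalid.Matches(line).Cast<Match>().Select(m => m.Index).ToArray();
}

static class Demo
{
    static void Main()
    {
        // Treats both 'Î' and '.' as invalid, to show that '.' is matched literally.
        var validator = new LineValidator("Î.");
        string[] lines = "Some Înput text here,\nÎs another lÎne of thÎs text.".Split('\n');
        for (int i = 0; i < lines.Length; i++)
            Console.WriteLine($"Line {i}: [{string.Join(", ", validator.Check(lines[i]))}]");
        // Line 0: [5]
        // Line 1: [0, 12, 21, 28]
    }
}
```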
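Answer 1 (score: 0)
My approach is to split the string into an array of strings, one per line. If the array has only one element, you have a single line. From there, match each line against a regular expression to find the invalid characters you are looking for.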
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        string input_text = "Some Înput text here,\nÎs another lÎne of thÎs text.";
        string line_pattern = "\n";
        string invalid_character = "Î";
        // Split the string into an array of lines. StringSplitOptions.None keeps empty
        // lines, so each array index equals the line's index in the document.
        string[] input_texts = input_text.Split(new string[] { line_pattern }, StringSplitOptions.None);
        if (input_texts.Length == 1)
        {
            Console.WriteLine("Is a single line file");
        }
        // Build the regex once, outside the loop, instead of once per line
        Regex regex = new Regex(invalid_character);
        // Loop over every line, keeping track of its index
        for (int lineIndex = 0; lineIndex < input_texts.Length; lineIndex++)
        {
            MatchCollection mc = regex.Matches(input_texts[lineIndex]);
            Console.WriteLine("Line {0}: how many matches: {1}", lineIndex, mc.Count);
            foreach (Match match in mc)
            {
                Console.WriteLine("Index: {0}", match.Index);
            }
        }
    }
}
--- Edit ---
Things to consider: