Let's say I have data in the following format:
<sdf<xml>....</xml>...
.........<smc<xml>....
...</xml>...<ueo<xml>.
.... and goes on......
My goal is to read this data from a file line by line and remove the 4 characters that precede every <xml> tag. In the example above, <sdf, <smc and <ueo would be removed.
This is what I have written so far. The regular expression is currently wrong and does not work:
while((line = reader.ReadLine()) !=null)
{
writer.WriteLine(Regex.Replace(line, @"(?i)</(xml)(?!>)",</$1>),, string.Empty);
}
Answer 0 (score: 2)
Your overall idea and loop structure are fine. Only the regex pattern needs a little work:
while ((line = reader.ReadLine()) != null)
writer.WriteLine(Regex.Replace(line, @"....<xml>", "<xml>"));
If you want to handle any pattern of the form <...<tag>, you can use:
while ((line = reader.ReadLine()) != null)
writer.WriteLine(Regex.Replace(line, @"<[^<>]{3}<([^<>]+)>", "<$1>"));
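To compare the two patterns above outside the file loop, here is a minimal sketch that wraps each one in a helper. The class name PrefixStripper and both method names are hypothetical and only serve the illustration; the patterns themselves are the ones from this answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class: the names are illustrative, not part of the answer.
public static class PrefixStripper
{
    // Removes any 4 characters that directly precede a literal <xml> tag.
    public static string StripBeforeXml(string line) =>
        Regex.Replace(line, @"....<xml>", "<xml>");

    // Removes "<" plus 3 non-bracket characters that directly precede any <tag>.
    public static string StripBeforeAnyTag(string line) =>
        Regex.Replace(line, @"<[^<>]{3}<([^<>]+)>", "<$1>");
}
```

The first pattern drops whatever 4 characters come before <xml>, so "abcd<xml>" also becomes "<xml>". The second only fires when the prefix starts with "<", so it leaves "abcd<xml>" alone but handles tags with any name.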
Answer 1 (score: 0)
You can try this:
while ((line = reader.ReadLine()) != null)
{
    // Pass only the replaced text: a second argument would make WriteLine treat it as a format string.
    writer.WriteLine(Regex.Replace(line, @"(?is).{4}(?=<(\w+)\b[^>]*>.*?</\1>)", ""));
}
Answer 2 (score: 0)
using System;
using System.Text;

namespace ConsoleApplication3
{
    class Program
    {
        private static string testData = "<sdf<xml><something/></xml><smc<xml><something/><ueo<xml><something /></xml>";

        static void Main(string[] args)
        {
            Func<string, string> stripInvalidXml = input =>
            {
                Func<int, bool> shouldSkip = index =>
                {
                    var xI = index + 4; // index of the character right after the next 4 characters
                    if (xI > input.Length - 5) // "<xml>" (5 characters) must still fit before the end of input
                        return false;
                    // skip the current 4 characters only if they are immediately followed by <xml>
                    return input.Substring(xI, 5) == "<xml>";
                };

                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < input.Length; ++i)
                {
                    // at each character, check whether the text 4 positions ahead is <xml>
                    if (shouldSkip(i))
                        i += 3; // we are on the first of the 4 characters; the loop's ++i skips the fourth
                    else
                        sb.Append(input[i]);
                }
                return sb.ToString();
            };

            Console.WriteLine(stripInvalidXml(testData));
            Console.ReadKey(true);
        }
    }
}
Answer 3 (score: 0)
Try:
writer.WriteLine(Regex.Replace(s, @"<.{3}(<\w*>)", "$1"));
This assumes the solution should also match tags that are not explicitly named <xml></xml>.
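As a quick check of that assumption, the sketch below applies the same pattern to a line that mixes tag names. The class name TagPrefixDemo and the method name Strip are hypothetical; only the pattern comes from this answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from this answer.
public static class TagPrefixDemo
{
    // Removes "<" plus any 3 characters when they directly precede a tag
    // made up only of word characters, such as <xml> or <item>.
    public static string Strip(string s) =>
        Regex.Replace(s, @"<.{3}(<\w*>)", "$1");
}
```

Because the tag is matched with \w*, an opening tag that carries attributes, such as <a id="1">, is not matched and its prefix stays in place.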