I have a tab-delimited file whose header row contains repeated column names:
[Column1] \t [Column2] \t [test] \t [test] \t [test] \t [test] \t [Column3] \t [Column4]
What I want to do is rename the duplicated [test] columns by appending an integer, so that the header becomes:
[Column1] \t [Column2] \t [test1] \t [test2] \t [test3] \t [test4] \t [Column3] \t [Column4]
So far I can isolate the first row and then count the matches I find:
string destinationUnformmatedFileName = @"C:\New\20130816_Opportunities_unFormatted.txt";
string destinationFormattedFileName = @"C:\New\20130816_Opportunities_Formatted.txt";
var unformattedFileStream = File.Open(destinationUnformmatedFileName, FileMode.Open, FileAccess.Read); // Open (unformatted) file for reading
var formattedFileStream = File.Open(destinationFormattedFileName, FileMode.Create, FileAccess.Write); // Create (formattedFile) for writing
StreamReader sr = new StreamReader(unformattedFileStream);
StreamWriter sw = new StreamWriter(formattedFileStream);
int rowCounter = 0;
string currentRow;
// Read each row in the unformatted file
while ((currentRow = sr.ReadLine()) != null)
{
    // First row, let's check for duplicate names
    if (rowCounter == 0)
    {
        // Write column names to array
        string delimiter = "\t";
        string[] fieldNames = currentRow.Split(delimiter.ToCharArray());
        foreach (string fieldName in fieldNames)
        {
            // fieldName must be followed by a tab for it to be a duplicate
            // original code - causing the issue
            //Regex rgx = new Regex("\\t(" + fieldName + ")\\t");
            // Edit - resolved the issue
            Regex rgx = new Regex("(?<=\\t|^)(" + fieldName + ")(\\t)+");
            // Count how many occurrences of fieldName are in currentRow
            int count = rgx.Matches(currentRow).Count;
            //MessageBox.Show("Match Count = " + count.ToString());
            // If we have a duplicate field name
            if (count > 1)
            {
                string newFieldName = "\t" + fieldName + count.ToString() + "\t";
                //MessageBox.Show(newFieldName);
                // Replace only the first remaining match
                currentRow = rgx.Replace(currentRow, newFieldName, 1);
            }
        }
    }
    rowCounter++;
}
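I think I'm on the right track, but I don't think the regular expression is working correctly.
Edit: I think I've worked out which pattern to use:
Regex rgx = new Regex("(?<=\\t|^)(" + fieldName + ")(\\t)+");
It's not a deal breaker, but the only remaining problem is that the columns now get labelled as: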
[Column1] \t [Column2] \t [test4] \t [test3] \t [test2] \t [test] \t [Column3] \t [Column4]
instead of
[Column1] \t [Column2] \t [test1] \t [test2] \t [test3] \t [test4] \t [Column3] \t [Column4]
Answer 0 (score: 0)
First, test your regular expression at RegExr. I think "\t" is a special character. Try "\\t". In your C# it would be "\\\\t".
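The escaping levels mentioned here can be checked directly with a quick sketch like the one below. It assumes the file contains real tab characters (U+0009) rather than a literal backslash followed by `t`; the class name `TabEscapeCheck` is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TabEscapeCheck
{
    public static void Main()
    {
        string realTab = "a\tb";   // contains an actual tab character
        string literal = @"a\tb";  // contains a backslash followed by 't'

        // The C# string "\\t" is the regex \t, which matches a tab character.
        Console.WriteLine(Regex.IsMatch(realTab, "\\t"));    // True
        Console.WriteLine(Regex.IsMatch(literal, "\\t"));    // False

        // The C# string "\\\\t" is the regex \\t, which matches a backslash followed by 't'.
        Console.WriteLine(Regex.IsMatch(realTab, "\\\\t"));  // False
        Console.WriteLine(Regex.IsMatch(literal, "\\\\t"));  // True
    }
}
```

Under that assumption, "\\t" in C# source is the form that matches the delimiter in a genuinely tab-delimited file, and "\\\\t" only applies if the text holds the two characters `\` and `t`.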
Answer 1 (score: 0)
Use the following:
Regex rgx = new Regex("(?<=\\t|^)(" + fieldName + ")(\\t)+");
I solved this using lookaround, which I found here: http://www.regular-expressions.info/duplicatelines.html
I probably should have spent a few more minutes researching it.
Answer 2 (score: 0)
Here is a nice combination of Regex and LINQ:
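// Requires: using System.Linq; using System.Text.RegularExpressions;
var input = @"[Column1] \t [Column2] \t [test] \t [test] \t [test] \t [foo] \t [test] \t [Column3] \t [foo] \t [Column4]";
// Match a bracketed name preceded by a literal "\t " (input is a verbatim string, so these are not real tabs)
Regex reg = new Regex(@"(?<=\\t )[[](.+?)[]]");
string output = "";
int k = 0;
foreach (var m in reg.Matches(input)
                     .OfType<Match>()
                     .Select((x, i) => new { x, i })                        // remember each match's position
                     .GroupBy(g => g.x.Value)                               // group identical names
                     .Where(g => g.Count() > 1)                             // keep only duplicated names
                     .SelectMany(x => x.Select((a, i) => new { a, i = i + 1 })) // number each duplicate from 1
                     .OrderBy(x => x.a.i))                                  // restore left-to-right order
{
    output += input.Substring(k, m.a.x.Index - k) + m.a.x.Result("[${1}" + m.i + "]");
    k = m.a.x.Index + m.a.x.Length;
}
output += input.Substring(k);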
Result: [Column1] \t [Column2] \t [test1] \t [test2] \t [test3] \t [foo1] \t [test4] \t [Column3] \t [foo2] \t [Column4]
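For comparison, the same numbering can be sketched without a regular expression. The reversed labels in the question ([test4], [test3], [test2], [test]) come from using the count of remaining matches as the suffix: each pass replaces the first remaining match, so the count drops from 4 to 1 and the last occurrence is left untouched. Splitting the header and keeping a running count per name avoids that. This is only a minimal sketch under stated assumptions: the delimiter is a real tab character, names are compared exactly (case-sensitive), and `HeaderDeduplicator` and `NumberDuplicates` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderDeduplicator
{
    // Appends 1, 2, 3, ... to every header name that occurs more than once,
    // in left-to-right order. Names that occur only once are left unchanged.
    public static string NumberDuplicates(string headerRow, char delimiter = '\t')
    {
        string[] names = headerRow.Split(delimiter);

        // Total occurrences of each name, used to decide which names are duplicated.
        Dictionary<string, int> totals = names
            .GroupBy(n => n)
            .ToDictionary(g => g.Key, g => g.Count());

        // Running count of how often each duplicated name has been seen so far.
        var seen = new Dictionary<string, int>();

        for (int i = 0; i < names.Length; i++)
        {
            string name = names[i];
            if (totals[name] > 1)
            {
                int n;
                seen.TryGetValue(name, out n); // n stays 0 the first time a name is seen
                seen[name] = n + 1;
                names[i] = name + (n + 1);
            }
        }
        return string.Join(delimiter.ToString(), names);
    }

    public static void Main()
    {
        string header = "Column1\ttest\ttest\tfoo\ttest\tColumn3";
        // Writes Column1, test1, test2, foo, test3, Column3 separated by tabs.
        Console.WriteLine(NumberDuplicates(header));
    }
}
```

Because no pattern is built from the field names, characters such as `[` or `.` in a header need no escaping. The sketch does not guard against a generated name colliding with an existing column (for example, a file that already has a `test1` column).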