I am trying to parse the following table in C#:
+-------------+-----------------------------------------------------------------------------------+----------------+
| 1 | 2 | 3 |
+-------------+-----------------------------------------------------------------------------------+----------------+
| 000 | Собственные средства (капитал), итого, | |
| | в том числе: | 1024231079 |
+-------------+-----------------------------------------------------------------------------------+----------------+
| 100 |Источники базового капитала: | 1291298211 |
+-------------+-----------------------------------------------------------------------------------+----------------+
| 100.1 |Уставный капитал кредитной организации: | 651033884 |
+-------------+-----------------------------------------------------------------------------------+----------------+
| 100.1.1 |сформированный обыкновенными акциями | 129605413 |
+-------------+-----------------------------------------------------------------------------------+----------------+
| 100.1.2 |сформированный привилегированными акциями | 521428471 |
+-------------+-----------------------------------------------------------------------------------+----------------+
| 100.1.3 |сформированный долями | 0 |
+-------------+-----------------------------------------------------------------------------------+----------------+
| 100.2 |Эмиссионный доход: | 439401101 |
+-------------+-----------------------------------------------------------------------------------+----------------+
| 100.2.1 |кредитной организации в организационно-правовой форме акционерного общества, всего,| |
| | в том числе: | 439401101 |
+-------------+-----------------------------------------------------------------------------------+----------------+
My code is:
string[] dels = { "\r\n" };
string[] strArr = someStr.Split(dels, StringSplitOptions.None);
Console.WriteLine(strArr);
foreach (String sourcestring in strArr)
{
    if (sourcestring != null)
    {
        Console.WriteLine("Processing string: ");
        Console.WriteLine(sourcestring);
        //Regex regex = new Regex(@"^(\|)(.*)(\|)(.*[а-я]{3}.*)(\|)(.*\d+.*)(\|)(.*[\d+|Х].*)(\|)(.*[\d+|Х].*)(\|)(.*\d+.*)(\|)$");
        //Regex regex = new Regex(@"^(\|)(\s?|\d+[\.?])(\|)(.*[а-я]{3}.*)(\|)(.*\d+.*)(\|)(.*[\d+|Х].*)(\|)(.*[\d+|Х].*)(\|)(.*\d+.*)(\|)$");
        Regex regex = new Regex(@"^(\|)(\d+\.?\d+)");
        MatchCollection mc = regex.Matches(sourcestring);
        int mIdx = 0;
        foreach (Match m in mc)
        {
            for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
            {
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, regex.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
            }
            mIdx++;
        }
        Console.WriteLine("---------------------------------------------------------");
    }
}
I need to extract the values from each row:
4 - ' 000 ', ' Собственные средства (капитал), итого, ', ' '
5 - ' ', ' в том числе: ', ' 1024231079 '
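and likewise for lines 7, 9, ...
The main problem right now is that I don't know how to write a regular expression for the first-column value, which can be: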
' 000 '
' '
' 100 '
' 100.1 '
' 100.1.1 '
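and so on.
The second problem is the second column. I tried to parse it with (.*[а-я]{3}.*), but it fails on lines that contain symbols such as '(', ',' and '.'.
I would appreciate any possible solutions.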
Answer 0 (score: 1)
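I think RegEx would be overkill in this case; a simple manual parsing approach would be easier:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
There are two possible approaches in this case:
1. Use the separator lines (+---+--- ...) to determine the length of each column, then parse the data by cutting it apart with Substring.
2. Split each line into columns at |.
Below I outline the basics of the second approach (without sanity checks).
If your data can also contain |, you may need to parse it based on the cell sizes rather than splitting on that character.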
// Row is defined below - simple data storage for the three columns
List<Row> rows = new List<Row>();
Row currentRow = null;
// Process each line
foreach (string line in input.Split(new string[] {"\r\n"}, StringSplitOptions.RemoveEmptyEntries))
{
    // Row separator or content?
    if (line.StartsWith("+"))
    {
        if (currentRow != null)
        {
            rows.Add(currentRow);
            currentRow = null;
        }
    }
    else if (line.StartsWith("|"))
    {
        string[] parts = line.Split(new char[] {'|'});
        if (currentRow == null)
            currentRow = new Row();
        // Might need additional processing
        currentRow.Column1 += parts[1].Trim();
        currentRow.Column2 += parts[2].TrimEnd();
        currentRow.Column3 += parts[3].TrimStart();
    }
    else
    {
        // Invalid data?
    }
}
// Show result
foreach (Row row in rows)
{
    Console.WriteLine("[{0}][{1}] = {2}", row.Column1, row.Column2, row.Column3);
}
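If the codes in the first column still need to be validated after the split, a small pattern is enough, because each cell is then checked on its own rather than as part of the whole line. The following is only a sketch: the class name CodeColumn and the method IsCode are made up for illustration, and it assumes that a code is always one or more groups of digits separated by single dots, as in the sample (000, 100, 100.1, 100.1.1).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class CodeColumn
{
    // Assumption: a code is one or more digit groups separated by single dots,
    // e.g. "000", "100", "100.1", "100.1.1".
    private static readonly Regex CodePattern = new Regex(@"^\d+(\.\d+)*$");

    // Returns true for a well-formed code and false for an empty
    // or otherwise unexpected cell. Surrounding spaces are ignored.
    public static bool IsCode(string cell)
    {
        return CodePattern.IsMatch(cell.Trim());
    }
}
```

An empty first cell, as on the continuation lines of the sample, returns false, which may be useful for telling a continuation line apart from the start of a new row.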
Instead of the custom Row class below, you can use Tuple<string,string,string> or whatever suits your data types.
public class Row
{
    public string Column1 = "";
    public string Column2 = "";
    public string Column3 = "";
}
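For the first approach (column widths taken from a separator line), a rough sketch could look like the following. The names FixedWidthTable, ColumnBoundaries and SplitCells are hypothetical, and the code assumes that every content line is padded so that its | characters sit exactly under the + characters of the separator line. Because the cells are cut by position, a | inside a cell does not break the parse.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class FixedWidthTable
{
    // Returns the index of every '+' in a separator line such as "+---+-----+--+".
    public static List<int> ColumnBoundaries(string separatorLine)
    {
        var positions = new List<int>();
        for (int i = 0; i < separatorLine.Length; i++)
        {
            if (separatorLine[i] == '+')
                positions.Add(i);
        }
        return positions;
    }

    // Cuts one content line into cells using the boundaries found above.
    // Assumption: the content line is at least as long as the separator line.
    public static string[] SplitCells(string line, List<int> boundaries)
    {
        // Fewer than two boundaries means there is no column to cut out.
        if (boundaries.Count < 2)
            return new string[0];

        var cells = new string[boundaries.Count - 1];
        for (int c = 0; c < cells.Length; c++)
        {
            int start = boundaries[c] + 1;           // first character after the left border
            int length = boundaries[c + 1] - start;  // characters up to the right border
            cells[c] = line.Substring(start, length);
        }
        return cells;
    }
}
```

The returned cells keep their padding, so they can be trimmed and accumulated into rows in the same way as in the split-based outline.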