How to iterate over a malformed CSV file

Time: 2019-01-12 22:25:39

Tags: c# csv .net-core

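I have a CSV file that I want to import into a database, but it is malformed. I can loop over the field names easily, but I get stuck when I reach the rows that contain data, i.e. the rows that start with a number, because the layout grows from two columns to five (a file can contain up to 48,000 data rows).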

The Some Field Name rows are metadata. Once the first cell of a row is a number, the row is actual data.

I use the following code to fill a list with every row, regardless of how many columns the row has.

List<string> listRows = new List<string>();
// The using block disposes the reader (and the underlying file stream) when done
using (var reader = new StreamReader(File.OpenRead(fileLocation)))
{
    while (!reader.EndOfStream)
    {
        listRows.Add(reader.ReadLine());
    }
}

I can handle the Some Field Name rows because the names are fixed: I split the string and pick out the value.

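What I am struggling with is how to test that I am no longer reading a Some Field Name row but a data row, i.e. that the first cell of the row has changed from a field name to a number (an integer).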

Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
Some Field Name ,   Some Field Value    ,               ,           ,   
1               ,   04/12/2018          ,   11:46:23    ,   0:00:00 ,   9
2               ,   04/12/2018          ,   11:48:23    ,   0:02    ,   9
3               ,   04/12/2018          ,   11:50:23    ,   0:04:00 ,   9
4               ,   04/12/2018          ,   11:52:23    ,   0:06    ,   9
5               ,   04/12/2018          ,   11:54:23    ,   0:08:00 ,   9
6               ,   04/12/2018          ,   11:56:23    ,   0:10    ,   9
7               ,   04/12/2018          ,   11:58:23    ,   0:12:00 ,   9
8               ,   04/12/2018          ,   12:00:23    ,   0:14    ,   9
9               ,   04/12/2018          ,   12:02:23    ,   0:16:00 ,   9
10              ,   04/12/2018          ,   12:04:23    ,   0:18    ,   9
11              ,   04/12/2018          ,   12:06:23    ,   0:20:00 ,   9
12              ,   04/12/2018          ,   12:08:23    ,   0:22    ,   9
13              ,   04/12/2018          ,   12:10:23    ,   0:24:00 ,   9
14              ,   04/12/2018          ,   12:12:23    ,   0:26    ,   9

TIA

1 answer:

Answer 0 (score: 1)

Try the following. It reads each line as fixed-width columns and trims the commas and spaces from the fields:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Data;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.txt";
        static void Main(string[] args)
        {
            FixedColumnWidth fixColumnWidth = new FixedColumnWidth();
            DataTable dt =  fixColumnWidth.ReadFile(FILENAME);
        }

    }
    public class FixedColumnWidth
    {
        public DataTable ReadFile(string filename)
        {
            string line = "";
            // A data row is one whose first field consists only of digits
            string pattern = @"^\d+$";

            DataTable dt = new DataTable();
            dt.Columns.Add("Index", typeof(int));
            dt.Columns.Add("Date", typeof(DateTime));
            dt.Columns.Add("Amount", typeof(string));
            dt.Columns.Add("Value", typeof(int));

            // The using statement wraps the loop below and disposes the reader afterwards
            using (StreamReader reader = new StreamReader(filename))
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Trim().Length > 0)
                {
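                    List<string> row = GetData(line);
                    // Metadata rows do not match the pattern and are skipped
                    Match match = Regex.Match(row[0].Trim(), pattern);
                    if (match.Success)
                    {
                        dt.Rows.Add(new object[] {
                            int.Parse(row[0]),
                            // Date and time fields are joined; parsing uses the current culture
                            DateTime.Parse(row[1] + " " + row[2]),
                            row[3],
                            int.Parse(row[4])
                        });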

                    }
                }
            }
            return dt;
        }
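        // Cuts a line into fields at fixed start offsets, one offset per column
        private List<string> GetData(string line)
        {
            int[] START_COLUMNS = { 0, 17, 41, 57, 69 };
            List<string> array = new List<string>();

            // Pad short lines so Substring cannot run past the end of the string
            line = line.PadRight(START_COLUMNS[START_COLUMNS.Length - 1]);

            for (int startCol = 0; startCol < START_COLUMNS.Length; startCol++)
            {
                if (startCol == START_COLUMNS.Length - 1)
                {
                    // Last column: everything from its offset to the end of the line
                    array.Add(line.Substring(START_COLUMNS[startCol]).Trim());
                }
                else
                {
                    // Other columns: up to the next offset, minus the separating comma and padding
                    array.Add(line.Substring(START_COLUMNS[startCol], START_COLUMNS[startCol + 1] - START_COLUMNS[startCol]).Trim(new char[] { ',', ' ' }));
                }
            }
            return array;
        }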
    }
}
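
If the column offsets are not stable from file to file, the same test can probably be done by splitting on the comma instead. This is only a rough sketch, assuming no field ever contains a comma or quotes; MixedCsvReader, IsDataRow and Split are names made up for the illustration, not part of the code above. A row counts as data when its first cell parses as an integer, and anything else is treated as a metadata name/value pair.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: separates metadata rows from data rows by checking
// whether the first comma-separated cell is an integer.
public static class MixedCsvReader
{
    // True when the first cell of the line contains only digits that fit in an int.
    public static bool IsDataRow(string line)
    {
        if (string.IsNullOrWhiteSpace(line)) return false;
        string firstCell = line.Split(',')[0].Trim();
        // NumberStyles.None accepts digits only (no sign, no spaces, no separators)
        return int.TryParse(firstCell, NumberStyles.None,
                            CultureInfo.InvariantCulture, out _);
    }

    // Returns metadata as (name, value) pairs and data rows as arrays of trimmed cells.
    public static (List<(string Name, string Value)> Metadata, List<string[]> Data)
        Split(IEnumerable<string> lines)
    {
        var metadata = new List<(string Name, string Value)>();
        var data = new List<string[]>();

        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines

            string[] cells = line.Split(',').Select(c => c.Trim()).ToArray();
            if (IsDataRow(line))
            {
                data.Add(cells);
            }
            else
            {
                // Metadata rows carry the name in cell 0 and the value in cell 1
                metadata.Add((cells[0], cells.Length > 1 ? cells[1] : ""));
            }
        }
        return (metadata, data);
    }
}
```

It could be called as MixedCsvReader.Split(File.ReadLines(fileLocation)), which reads the file lazily rather than loading all rows into a list first. The data cells come back as strings, so the date and time values would still need parsing before they go into the database.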