How to split a CSV file into multiple parts

Time: 2019-10-29 04:43:11

Tags: c# csv split memorystream

How can I split a CSV file downloaded from a URL? I am trying to keep the header in every part.

For example,

A,B,C,D,E
1,2,3,4,5
12,11,8,7,6
23,23,34,1,0
23,23,32,1,0

becomes

A,B,C,D,E
1,2,3,4,5
12,11,8,7,6

A,B,C,D,E
23,23,34,1,0
23,23,32,1,0

My code below retrieves the file from the URL:

MemoryStream file = GetStreamFromUrl(invoiceAPI);

private static MemoryStream GetStreamFromUrl(string url)
{
    // Dispose only the WebClient. The returned stream must stay open,
    // otherwise the caller gets an already-disposed MemoryStream.
    using (WebClient wc = new WebClient())
    {
        return new MemoryStream(wc.DownloadData(url));
    }
}

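How can I split the CSV file and keep the header? The file has a dynamic length, so I would like to split it into parts of, for example, only 10 rows each, since I will upload each part separately. Could you explain how to do this?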

4 answers:

Answer 0: (score: 1)

Use String.Split: take the first line as the header, then split the remaining lines into groups. https://docs.microsoft.com/en-us/dotnet/api/system.string.split?view=netframework-4.8
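
A minimal sketch of that idea, assuming the whole CSV is already in memory as a string and that no field contains an embedded line break (quoted multi-line fields would need a real CSV parser). The names `CsvSplitter`, `SplitWithHeader`, and `rowsPerChunk` are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CsvSplitter
{
    // Splits CSV text into chunks of at most rowsPerChunk data rows,
    // repeating the header line at the top of every chunk.
    public static List<string> SplitWithHeader(string content, int rowsPerChunk)
    {
        if (rowsPerChunk <= 0) throw new ArgumentOutOfRangeException(nameof(rowsPerChunk));

        // Split on either newline character; RemoveEmptyEntries drops the empty
        // strings produced by "\r\n" pairs (and any blank lines).
        string[] lines = content.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        if (lines.Length == 0) return chunks;

        string header = lines[0];
        for (int start = 1; start < lines.Length; start += rowsPerChunk)
        {
            // The last chunk may hold fewer than rowsPerChunk rows.
            int count = Math.Min(rowsPerChunk, lines.Length - start);
            chunks.Add(header + Environment.NewLine +
                       string.Join(Environment.NewLine, lines, start, count));
        }
        return chunks;
    }
}
```

Each returned string is one complete part, header included, so it can be written to a file or uploaded as is. A file that contains only a header yields no parts.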

Answer 1: (score: 0)

I came up with two versions.

Common part

var dataLinesPerFile = 2;

var contentAsLines = content.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

var header = contentAsLines[0];
var dataLines = contentAsLines.Skip(1);

A. Version that writes each whole file at once

// I've used foreach so that the algorithm can also be used when reading line by line rather than the whole file
List<string> lines = new List<string>();
var fileId = 0;
foreach (var line in dataLines)
{
    lines.Add(line);
    if (lines.Count() % dataLinesPerFile == 0)
    {
        WriteChunk(fileId++, header, lines);
        lines = new List<string>(); // or lines.Clear();
    }
}
if (lines.Any()) WriteChunk(fileId++, header, lines);

(...)

private static void WriteChunk(int id, string header, IEnumerable<string> lines)
{
    Console.WriteLine("");
    Console.WriteLine($"File_A{id}:");
    Console.WriteLine(header);
    Console.WriteLine(string.Join(Environment.NewLine, lines)); // File.WriteAllLines
}

B. Version that writes line by line

var fileId = 0;
var lineCount = 0;
foreach (var line in dataLines)
{
    if (lineCount % dataLinesPerFile == 0)
    {
        //Close the file, create the new file and write the header
        Console.WriteLine(""); 
        Console.WriteLine($"File_B{fileId++}");
        Console.WriteLine(header);
    }
    Console.WriteLine(line);
    lineCount++;
}
// Close the current file
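
The comments in version B mark where files would be opened and closed. A hedged sketch of that step with real writers follows, reading line by line from any `TextReader` (for example a `StreamReader` over the downloaded `MemoryStream`). `StreamingCsvSplitter`, `Split`, and the `createWriter` callback are hypothetical names; the callback decides where each part goes. As above, this assumes no field contains an embedded line break.

```csharp
using System;
using System.IO;

public static class StreamingCsvSplitter
{
    // Reads CSV lines from input and writes parts of at most dataLinesPerFile
    // data rows each. createWriter(partIndex) supplies the destination of a part.
    // Returns the number of parts written.
    public static int Split(TextReader input, int dataLinesPerFile, Func<int, TextWriter> createWriter)
    {
        if (dataLinesPerFile <= 0) throw new ArgumentOutOfRangeException(nameof(dataLinesPerFile));

        string header = input.ReadLine();
        if (header == null) return 0; // empty input: nothing to write

        int fileId = 0;
        int lineCount = 0;
        TextWriter writer = null;
        try
        {
            string line;
            while ((line = input.ReadLine()) != null)
            {
                if (line.Length == 0) continue; // skip blank lines
                if (lineCount % dataLinesPerFile == 0)
                {
                    // Close the previous part, open the next one and write the header.
                    writer?.Dispose();
                    writer = createWriter(fileId++);
                    writer.WriteLine(header);
                }
                writer.WriteLine(line);
                lineCount++;
            }
        }
        finally
        {
            writer?.Dispose(); // close the last part
        }
        return fileId;
    }
}
```

With the question's stream, a call could look like `StreamingCsvSplitter.Split(new StreamReader(file), 10, i => new StreamWriter($"part{i}.csv"))`, where the `part{i}.csv` naming is only an example. Only one part is open at a time, so memory use does not grow with the size of the input.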

Test

Input

I added a fifth data row to show that the code does not lose "stray" rows.

var content = @"A,B,C,D,E
1,2,3,4,5
12,11,8,7,6
23,23,34,1,0
23,23,32,1,0
5,5,5,5,5";

Output

// .NETCoreApp,Version=v3.0

File_A0:
A,B,C,D,E
1,2,3,4,5
12,11,8,7,6

File_A1:
A,B,C,D,E
23,23,34,1,0
23,23,32,1,0

File_A2:
A,B,C,D,E
5,5,5,5,5
------------------

File_B0
A,B,C,D,E
1,2,3,4,5
12,11,8,7,6

File_B1
A,B,C,D,E
23,23,34,1,0
23,23,32,1,0

File_B2
A,B,C,D,E
5,5,5,5,5

Answer 2: (score: 0)

Here is an implementation that uses the CsvHelper NuGet package.

First create a Row class to map your CSV columns:

public class Row { 
    public int A { get; set; }
    public int B { get; set; }
    public int C { get; set; }
    public int D { get; set; }
    public int E { get; set; }
    public override string ToString()
    {
        return $"A={A},B={B},C={C},D={D},E={E}";
    }
}

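Then you can create a method that takes the source path of the CSV file to read and the destination directory where the new CSV files should be saved. You also need to specify the number of rows to put into each file; in this case, two. The method could certainly be improved and given error checking, but it shows the general idea.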

private static void SplitCsv(string source, string dest, int numRows)
{
    // Open CSV file for reading
    using (var fileReader = File.OpenText(source))
    {
        using (var csv = new CsvReader(fileReader))
        {
            // Collect all rows
            var rows = csv
                .GetRecords<Row>()
                .ToList();

            // Iterate rows in chunks; the bound is rounded up so a final, shorter chunk is not dropped
            for (var row = 0; row < (rows.Count + numRows - 1) / numRows; row++)
            {

                // Extract chunks using LINQ
                var fileRows = rows
                    .Skip(row * numRows)
                    .Take(numRows);

                // Create output path
                var outputPath = Path.Combine(dest, $"file{row}.txt");

                // Write chunk to file
                using (var writer = new StreamWriter(outputPath, 
                    false, 
                    System.Text.Encoding.UTF8))
                {
                    using (var csvFile = new CsvWriter(writer))
                    {
                        csvFile.WriteRecords(fileRows);
                    }
                }
            }
        }
    }
}

For the example input, this produces the following files:

file0.txt

A,B,C,D,E
1,2,3,4,5
12,11,8,7,6

file1.txt

A,B,C,D,E
23,23,34,1,0
23,23,32,1,0
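
When the number of rows is not a multiple of the chunk size, the last chunk is shorter than the others, and a loop bound computed with plain integer division would skip it. A small sketch of a chunking helper that keeps the remainder is shown below. It does not depend on CsvHelper, and `Chunker` and `InChunks` are hypothetical names. On .NET 6 and later, the built-in `Enumerable.Chunk` offers similar behavior.

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Yields consecutive groups of at most size items; the last group may be shorter.
    public static IEnumerable<List<T>> InChunks<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var chunk = new List<T>(size);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk;
                chunk = new List<T>(size); // start a fresh list for the next group
            }
        }

        // Emit whatever is left over as a final, shorter group.
        if (chunk.Count > 0) yield return chunk;
    }
}
```

Each yielded group could then be passed to `WriteRecords` in place of the `Skip`/`Take` pair, which also avoids re-scanning the list for every file.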

Answer 3: (score: 0)

May I suggest an edit?

static void SplitCsv(string source, string dest, int numRows, string currency, ref List<string> outputPaths)
    {
        // Open the CSV file for reading
        using (TextReader fileReader = System.IO.File.OpenText(source))
        {
            using (CsvReader csv = new CsvReader(fileReader, CultureInfo.InvariantCulture))
            {

                csv.Configuration.HasHeaderRecord = false;

                // Collect all rows
                List<Row> rows = csv.GetRecords<Row>().ToList();

                // Iterate over the rows in chunks; the bound is rounded up so a final, shorter chunk is not dropped
                for (int row = 0; row < (rows.Count + numRows - 1) / numRows; row++)
                {

                    // Extract the chunk using LINQ
                    var fileRows = rows
                        .Skip(row * numRows)
                        .Take(numRows);

                    // Build the output path

                    string outputPath = Path.Combine(dest, currency + "_" + DateTime.UtcNow.Year + "_" + DateTime.UtcNow.Month + "_" + DateTime.UtcNow.Day + $"_CashBacks{row}.csv");

                    // Write the chunk to a file
                    using (TextWriter writer = new StreamWriter(outputPath, false, Encoding.UTF8))
                    {

                        using (CsvWriter csvFile = new CsvWriter(writer, CultureInfo.InvariantCulture))
                        {
                            csvFile.Configuration.HasHeaderRecord = false;

                            csvFile.WriteRecords(fileRows);
                        }
                    }

                    outputPaths.Add(outputPath);

                }
            }
        }
    }