Question

I am reading 5000 rows of data from a stream, as shown below, and storing them in a new CSV file.

ProductCode |Name   | Type  | Price
ABC | Shoe  | Trainers  | 3.99
ABC | Shoe  | Trainers  | 4.99
ABC | Shoe  | Trainers  | 5.99 
ABC | Shoe  | Heels | 3.99
ABC | Shoe  | Heels | 4.99
ABC | Shoe  | Heels | 5.99
...

Instead of the duplicate entries, I want the CSV to contain a single row for each product, with the prices summed:

ProductCode |Name   | Type  | Price
ABC | Shoe  | Trainers  | 14.97
ABC | Shoe  | Heels | 14.97

I store each row as a Product:

public class Product
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public string Price { get; set; }
}

After reading the data from the stream, I end up with an IEnumerable<Product>.

My code is:

string fileName = Path.Combine(directory, string.Format("{0}.csv", name));
var results = Parse(stream).ToList(); // Parse returns IEnumerable<Product>
if (results.Any())
{
    using (var streamWriter = File.CreateText(fileName))
    {
        // writes the header line out
        streamWriter.WriteLine("{0},{1}", header, name);

        results.ForEach(p => { streamWriter.WriteLine(_parser.ConvertToOutputFormat(p)); });
        streamWriter.Flush();
        streamWriter.Close();
    }

    Optional<string> newFileName = Optional.Of(SharpZipWrapper.ZipFile(fileName, RepositoryDirectory));
    // cleanup
    File.Delete(fileName);
    return newFileName;
}

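I don't want to go through the 5000 rows a second time to remove duplicates. Instead, I want to check whether an entry already exists before adding it to the CSV file.

What is the most efficient way to do this?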

Answer 1

It sounds like you just need a suitable LINQ transformation:

results = results
    .GroupBy(p => p.ProductCode)
    .Select(g => new Product {
        ProductCode = g.Key,
        Name = g.First().Name,
        Type = g.First().Type,
        Price = g.Sum(p => p.Price)
    })
    .ToList();

Or, if for some strange reason ProductCode is not a unique ID (as in the sample data, where ABC covers both Trainers and Heels):

results = results
    .GroupBy(p => new { p.ProductCode, p.Name, p.Type })
    .Select(g => new Product {
        ProductCode = g.Key.ProductCode,
        Name = g.Key.Name,
        Type = g.Key.Type,
        Price = g.Sum(p => p.Price)
    })
    .ToList();

This assumes you have changed the Price property of the Product type to decimal. A price is not text, so it should not be stored as a string.
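As a rough illustration of that change, the row could be parsed into a Product whose Price is a decimal. The question does not show its Parse method, so the `ProductLineParser.ParseLine` helper below is a hypothetical stand-in, and the pipe-delimited layout is assumed from the sample data:

```csharp
using System;
using System.Globalization;

public class Product
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public decimal Price { get; set; } // decimal instead of string
}

public static class ProductLineParser
{
    // Hypothetical helper: the real Parse(stream) from the question is not shown.
    // Assumes one "ProductCode | Name | Type | Price" record per line.
    public static Product ParseLine(string line)
    {
        string[] fields = line.Split('|');
        if (fields.Length != 4)
        {
            throw new FormatException("Expected 4 pipe-separated fields: " + line);
        }

        return new Product
        {
            ProductCode = fields[0].Trim(),
            Name = fields[1].Trim(),
            Type = fields[2].Trim(),
            // Invariant culture so "3.99" parses the same on every machine.
            Price = decimal.Parse(fields[3].Trim(), NumberStyles.Number, CultureInfo.InvariantCulture)
        };
    }
}
```

With Price typed as decimal, `g.Sum(p => p.Price)` in the queries above compiles as written.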

Answer 2

using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Sample data matching the question (Price is still a string here).
List<Product> results = new List<Product>
{
    new Product { ProductCode="ABC ", Name="Shoe", Type="Trainers", Price="3.99" },
    new Product { ProductCode="ABC ", Name="Shoe", Type="Trainers", Price="4.99" },
    new Product { ProductCode="ABC ", Name="Shoe", Type="Trainers", Price="5.99" },
    new Product { ProductCode="ABC ", Name="Shoe", Type="Heels", Price="3.99" },
    new Product { ProductCode="ABC ", Name="Shoe", Type="Heels", Price="4.99" },
    new Product { ProductCode="ABC ", Name="Shoe", Type="Heels", Price="5.99" },
};

// Group on all identifying columns, then sum the parsed prices per group.
// decimal is used instead of double to avoid binary floating-point rounding in currency totals.
results = (from e in results
           group e by new { e.ProductCode, e.Name, e.Type } into g
           select new Product
           {
               ProductCode = g.Key.ProductCode,
               Name = g.Key.Name,
               Type = g.Key.Type,
               Price = g.Sum(p => decimal.Parse(p.Price, CultureInfo.InvariantCulture)).ToString("0.00", CultureInfo.InvariantCulture)
           }).ToList();

Answer 3

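You can create a class that holds a dictionary, with the product code as the key and the Product as the value.

Then read the stream line by line and try to add each new key/value pair to the dictionary. Before adding a value, check whether the dictionary already contains the key (the product code); if it does, fetch the Product object for that key and update its price.

Finally, iterate over the dictionary and write the CSV. This way you do not need to read the data twice to find duplicates before writing the CSV.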
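A minimal sketch of this dictionary approach follows. It makes two assumptions that are not in the answer: Price is a decimal, and the key is the combination of ProductCode, Name and Type rather than ProductCode alone, because the sample data reuses ABC for both Trainers and Heels. The `ProductAggregator` class name is invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public decimal Price { get; set; } // assumption: decimal, not string
}

public static class ProductAggregator
{
    // Single pass over the rows: each row either starts a new total
    // or is added to the total already stored for its key.
    public static List<Product> SumDuplicates(IEnumerable<Product> rows)
    {
        var totals = new Dictionary<(string Code, string Name, string Type), Product>();

        foreach (Product row in rows)
        {
            var key = (row.ProductCode, row.Name, row.Type);

            if (totals.TryGetValue(key, out Product existing))
            {
                // Key already seen: update the stored price.
                existing.Price += row.Price;
            }
            else
            {
                // First occurrence: store a copy so the input rows are not mutated.
                totals[key] = new Product
                {
                    ProductCode = row.ProductCode,
                    Name = row.Name,
                    Type = row.Type,
                    Price = row.Price
                };
            }
        }

        return new List<Product>(totals.Values);
    }
}
```

Because `SumDuplicates` accepts any IEnumerable<Product>, it could be fed the lazily evaluated result of `Parse(stream)` directly, so the rows are only enumerated once before the CSV is written.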

Answer 4

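"I don't want to go through the 5000 rows again to remove duplicates, but want to check whether the entry already exists before adding it to the CSV file."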

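To achieve this, you can override Equals() on the Product class. Before adding a Product to the list, check whether an equal Product is already there; if it is, add the new Price to the existing entry instead of adding the Product a second time.
You can find some guidance on overriding Equals() here:
Guidelines for Overloading Equals() and Operator == (C# Programming Guide)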
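One way this could look is sketched below, under the assumptions that Price is a decimal and that two products count as equal when ProductCode, Name and Type all match. GetHashCode() is overridden alongside Equals(), as is expected whenever Equals() is overridden. The `ProductList.AddOrSum` helper is a made-up name for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Product : IEquatable<Product>
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public decimal Price { get; set; } // assumption: decimal, not string

    // Price is deliberately left out of equality: rows that differ
    // only in price should be treated as the same product.
    public bool Equals(Product other)
    {
        return !(other is null)
            && ProductCode == other.ProductCode
            && Name == other.Name
            && Type == other.Type;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Product);
    }

    public override int GetHashCode()
    {
        return (ProductCode, Name, Type).GetHashCode();
    }
}

public static class ProductList
{
    // Adds the product, or sums its price into an equal product already in the list.
    public static void AddOrSum(List<Product> products, Product incoming)
    {
        int index = products.IndexOf(incoming); // IndexOf relies on Equals
        if (index >= 0)
        {
            products[index].Price += incoming.Price;
        }
        else
        {
            products.Add(incoming);
        }
    }
}
```

Note that `List<T>.IndexOf` scans the list linearly on every call, so for larger inputs a dictionary or hash-based lookup is likely to scale better than a plain list.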
