Question

 IEnumerable<IEnumerable<object>> data = Enumerable.Repeat(new List<object> {10, "Twenty", 30, 40, DateTime.UtcNow }, int.MaxValue);

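Let's assume this is my data. It represents rows and columns, and it contains int.MaxValue rows (that is, it is very, very large).

As an example, it looks like this: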

10, "Twenty", 30, 40, 21/08/2016 00:00:00
10, "Twenty", 30, 40, 21/08/2016 00:00:00
10, "Twenty", 30, 40, 21/08/2016 00:00:00
10, "Twenty", 30, 40, 21/08/2016 00:00:00
10, "Twenty", 30, 40, 21/08/2016 00:00:00
... repeated int.MaxValue times.

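OK, so we can see that there are 5 columns and int.MaxValue rows.

What I want to do is apply a piece of logic to each column. So, let's say:

Column 1 logic = "NoRepeat" (only one value should appear in this column)

Column 2 logic = "Unique" (only unique values should appear in this column)

Column 3 logic = "Disabled" (no value should appear in this column)

Column 4 and column 5 logic = "Default" (keep the values as they are).

In the end, this should be the expected result: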

10, "Twenty", "", 40, 21/08/2016 00:00:00
"", "",       "", 40, 21/08/2016 00:00:00
"", "",       "", 40, 21/08/2016 00:00:00
"", "",       "", 40, 21/08/2016 00:00:00
"", "",       "", 40, 21/08/2016 00:00:00

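I managed to do this using LINQ, but I can only process about 10 million rows before I get an out-of-memory exception.

Can someone help me solve this?

Here is my current implementation: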

var group = data.SelectMany(x => x.Select((InnerValue, Index) => new { Index, InnerValue })).GroupBy(x => x.Index).Select(x =>
            {
                string logicType = logics.ElementAt(x.First().Index);

                switch (logicType)
                {
                    case "NoRepeat":
                        return x.Select((OuterValue, Index) => new { Index, OuterValue }).Select(y =>
                            {
                                return y.Index == 0 ? y.OuterValue.InnerValue : string.Empty;
                            });
                    case "Unique":
                        return x;
                    case "Disabled":
                        return x;
                    default:
                        return x;
                }
            }).SelectMany(x => x.Select((Value, Index) => new { Index, Value })).GroupBy(x => x.Index).Select(x => x.Select(y => y.Value));

Answer 1

Please try this.

When I run this example on my machine, it consumes only a few megabytes of memory.

IEnumerable<IEnumerable<object>> data = Enumerable.Repeat(new List<object> { 10, "Twenty", 30, 40, DateTime.UtcNow }, int.MaxValue);

var hashSet = new HashSet<string>();

var result = data.Select((enumerable, index) =>
{
    var list = (List<object>)enumerable;

    if (index > 0) list[0] = null; // only 1 value

    // unique
    string str = (string)list[1];
    if (hashSet.Contains(str))
        list[1] = null;
    else
        hashSet.Add(str);

    list[2] = null; // disabled

    return list;
});

Console.WriteLine(result.Count());

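The "only 1 value" and "disabled" logic consume no additional memory at all.

To implement the "unique" logic I used a HashSet. Its memory consumption is proportional to the number of unique elements. If the number of unique elements is small, everything works fine. If it is large, there is nothing to be done: memory consumption will be high.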
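One caveat with the snippet above: it writes into the incoming list, and Enumerable.Repeat hands back the same List instance for every row, so rows that were already returned change as well. A minimal sketch of a variant that streams the rows and yields a fresh array per row is shown below. The names ColumnRule, ColumnRules and Apply are made up for this illustration and are not part of the original answer. It uses empty strings for suppressed values, as in the expected output of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names for the four per-column rules described in the question.
public enum ColumnRule { Default, NoRepeat, Unique, Disabled }

public static class ColumnRules
{
    // Lazily applies one rule per column and yields a new row each time,
    // so the input rows are never modified and never buffered.
    public static IEnumerable<object[]> Apply(
        IEnumerable<IEnumerable<object>> rows, IReadOnlyList<ColumnRule> rules)
    {
        // One set per column; only Unique columns ever add values to theirs.
        var seen = rules.Select(_ => new HashSet<object>()).ToArray();
        bool first = true;

        foreach (var row in rows)
        {
            // Cells beyond the end of a short row stay null.
            var output = new object[rules.Count];
            int col = 0;

            foreach (var value in row)
            {
                if (col >= rules.Count) break; // ignore columns without a rule

                switch (rules[col])
                {
                    case ColumnRule.NoRepeat:
                        // Keep the value in the first row only.
                        output[col] = first ? value : string.Empty;
                        break;
                    case ColumnRule.Unique:
                        // HashSet.Add returns false if the value was already seen.
                        output[col] = seen[col].Add(value) ? value : string.Empty;
                        break;
                    case ColumnRule.Disabled:
                        output[col] = string.Empty;
                        break;
                    default:
                        output[col] = value;
                        break;
                }
                col++;
            }

            first = false;
            yield return output;
        }
    }
}
```

It could be called as ColumnRules.Apply(data, rules), with rules being an array such as { NoRepeat, Unique, Disabled, Default, Default }, and consumed with foreach. As in the answer, memory use for a "Unique" column still grows with the number of distinct values seen; the other rules only need the current row.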

LINQ operations on large data. Out-of-memory exception
