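C#: Parsing & comparing huge lists/strings

Time: 2016-06-17 15:34:20

Tags: c# string list parsing compare

I have 2 huge lists (more than 2000 entries each).

I want to parse & compare them.

The first list looks like this: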

zone "exampledomain.com" {
zone "exampledomain2.com" {
zone "exampledomain3.com" {
zone "exampledomain4.com" {
zone "exampledomain5.com" {
zone "exampledomain6.com" {
zone "exampledomain7.com" {

The other list looks like this:

zone "exampledomain.com" {
zone "exampledomain3.com" {
zone "exampledomain5.com" {
zone "exampledomain7.com" {

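Both lists use the same format, zone "____" {. I want to parse them so that I can compare the domains and get the difference between them, so I know what each list is missing. They should both contain the same entries.

I came across this code: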

static void Main(string[] args)
{
    string s1 = "i have a car a car";
    string s2 = "i have a new car bmw";

    List<string> diff;
    IEnumerable<string> set1 = s1.Split(' ').Distinct();
    IEnumerable<string> set2 = s2.Split(' ').Distinct();

    if (set2.Count() > set1.Count())
    {
        diff = set2.Except(set1).ToList();
    }
    else
    {
        diff = set1.Except(set2).ToList();
    }
}

But I'm wondering what the best approach is, given that each list has more than 2000 lines.

2 answers:

Answer 0: (score: 0)

The example you provided only gives you list 1 with the items from list 2 removed. If you also want the items in list 2 that are not in list 1, you have to run two queries:

var difference1 = list1.Except(list2);
var difference2 = list2.Except(list1);

I'm not sure what code is involved when Except runs, but if you'd like to see an implementation that produces two lists containing the differences, here is one solution:

static void Difference(
    IEnumerable<string> source1, IEnumerable<string> source2,
    out List<string> difference1, out List<string> difference2)
{
    // Move the data from the sources into queues, sorted with the same
    // ordinal ordering that is used for the comparison below
    var sourceValues1 = new Queue<string>(source1.OrderBy(x => x, StringComparer.Ordinal));
    var sourceValues2 = new Queue<string>(source2.OrderBy(x => x, StringComparer.Ordinal));

    difference1 = new List<string>();
    difference2 = new List<string>();

    while (sourceValues1.Count > 0 && sourceValues2.Count > 0)
    {
        string value1 = sourceValues1.Peek();
        string value2 = sourceValues2.Peek();

        // A comparison is only guaranteed to return a negative, zero, or
        // positive number, so test the sign instead of matching -1 and 1
        int order = string.CompareOrdinal(value1, value2);
        if (order == 0)
        {
            // They match, so add nothing to either difference list
            sourceValues1.Dequeue();
            sourceValues2.Dequeue();
        }
        else if (order < 0)
        {
            // The left queue has the lower value: record it and move on
            difference1.Add(value1);
            sourceValues1.Dequeue();
        }
        else
        {
            // The right queue has the lower value: record it and move on
            difference2.Add(value2);
            sourceValues2.Dequeue();
        }
    }

    // At least one of the queues is empty, so everything left in the other
    // queue is a difference
    difference1.AddRange(sourceValues1);
    difference2.AddRange(sourceValues2);
}

I don't know whether LINQ does this any faster, but my routine keeps duplicate entries, such as the value "1" in the example below, whereas LINQ's Except does not. So keep that in mind when choosing which one to use, rather than looking only at the speed difference.

static void Main(string[] args)
{
    var list1 = new string[] { "1", "1", "3", "5", "7", "9" };
    var list2 = new string[] { "1", "2", "4", "6", "9", "10" };

    // LINQ: Except returns distinct values, so the second "1" is not reported
    var difference1 = list1.Except(list2);
    var difference2 = list2.Except(list1);

    List<string> differenceX1;
    List<string> differenceX2;

    // Custom routine: the unmatched second "1" ends up in differenceX1
    Difference(list1, list2, out differenceX1, out differenceX2);
}

If you need to, it is easy to combine the two results into one.
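For the zone lines in the question, the parsing step and the two Except queries could be combined roughly as follows. This is only a sketch: the ZoneDiff class, the ParseDomains and MissingFrom names, and the regular expression are illustrative assumptions, and the case-insensitive comparer is a guess that domain names should match regardless of case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class ZoneDiff
{
    // Matches lines such as: zone "exampledomain.com" {
    // and captures the text between the quotes.
    private static readonly Regex ZoneLine =
        new Regex("^\\s*zone\\s+\"([^\"]+)\"", RegexOptions.Compiled);

    // Extracts the quoted domain from every matching line; other lines are skipped.
    public static List<string> ParseDomains(IEnumerable<string> lines)
    {
        return lines
            .Select(line => ZoneLine.Match(line))
            .Where(match => match.Success)
            .Select(match => match.Groups[1].Value)
            .ToList();
    }

    // Domains present in 'reference' but absent from 'other'
    // (assumption: case is ignored when comparing domains).
    public static List<string> MissingFrom(
        IEnumerable<string> reference, IEnumerable<string> other)
    {
        return reference.Except(other, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

Calling ZoneDiff.MissingFrom(domains1, domains2) and then ZoneDiff.MissingFrom(domains2, domains1) gives both directions. Except builds a hash set internally, so a couple of thousand lines per list should not be a problem.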

Answer 1: (score: -1)

HashSets are meant for collections of unique elements:

https://msdn.microsoft.com/en-us/library/bb359438(v=vs.110).aspx
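As a rough illustration of how a HashSet might be applied to this problem (a sketch only: the HashSetDiff class and its method names are made up for this example, and the case-insensitive comparer is an assumption about how domains should be compared):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper showing HashSet<string> set operations.
static class HashSetDiff
{
    // Items of 'left' that do not occur in 'right'.
    public static HashSet<string> OnlyIn(
        IEnumerable<string> left, IEnumerable<string> right)
    {
        var result = new HashSet<string>(left, StringComparer.OrdinalIgnoreCase);
        result.ExceptWith(right); // removes every element that also appears in 'right'
        return result;
    }

    // Items that occur in exactly one of the two inputs.
    public static HashSet<string> InExactlyOne(
        IEnumerable<string> left, IEnumerable<string> right)
    {
        var result = new HashSet<string>(left, StringComparer.OrdinalIgnoreCase);
        result.SymmetricExceptWith(right); // keeps elements found in only one side
        return result;
    }
}
```

OnlyIn answers "what is the other list missing" for one direction, while InExactlyOne returns the mismatches from both directions at once but no longer tells you which list an entry came from.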
