Find the object with the longest string

Asked: 2019-03-20 11:08:53

Tags: c# linq

I am getting a list of objects from an external service in the following form:

[
    {
      'Sentence': 'C13 can travel by own car on road.',
      'Subject': 'C13',
      'Object': 'car',
      'Relation': 'CAN_TRAVEL_BY'
    },
    {
      'Sentence': 'C13 can travel by own car on road.',
      'Subject': 'C13',
      'Object': 'own car',
      'Relation': 'CAN_TRAVEL_BY'
    },
    {
      'Sentence': 'C13 can travel by own car on road.',
      'Subject': 'C13',
      'Object': 'road',
      'Relation': 'CAN_TRAVEL_ON'
    },
    {
      'Sentence': 'Kunal Mukherjee can travel by own car on road.',
      'Subject': 'Kunal',
      'Object': 'own car',
      'Relation': 'CAN_TRAVEL_BY'
    },
    {
      'Sentence': 'Kunal Mukherjee can travel by own car on road.',
      'Subject': 'Kunal Mukherjee',
      'Object': 'own car',
      'Relation': 'CAN_TRAVEL_BY'
    }
]

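My goal is to filter the response so that, when the values of two objects contain one another, only one of them is kept. For example: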

{
  'Sentence': 'Kunal Mukherjee can travel by own car on road.',
  'Subject': 'Kunal',
  'Object': 'own car',
  'Relation': 'CAN_TRAVEL_BY'
},
{
  'Sentence': 'Kunal Mukherjee can travel by own car on road.',
  'Subject': 'Kunal Mukherjee',
  'Object': 'own car',
  'Relation': 'CAN_TRAVEL_BY'
}

Of the two objects above, the longest Subject value is Kunal Mukherjee (it contains Kunal), so only that object should be kept.

Another example:

{
  'Sentence': 'C13 can travel by own car on road.',
  'Subject': 'C13',
  'Object': 'car',
  'Relation': 'CAN_TRAVEL_BY'
},
{
  'Sentence': 'C13 can travel by own car on road.',
  'Subject': 'C13',
  'Object': 'own car',
  'Relation': 'CAN_TRAVEL_BY'
}

Here the Object value own car is the longer of the two (it contains car), so that object should be kept.


So the final filtered list must look like this:

[
    {
      'Sentence': 'C13 can travel by own car on road.',
      'Subject': 'C13',
      'Object': 'own car',
      'Relation': 'CAN_TRAVEL_BY'
    },
    {
      'Sentence': 'C13 can travel by own car on road.',
      'Subject': 'C13',
      'Object': 'road',
      'Relation': 'CAN_TRAVEL_ON'
    },
    {
      'Sentence': 'Kunal Mukherjee can travel by own car on road.',
      'Subject': 'Kunal Mukherjee',
      'Object': 'own car',
      'Relation': 'CAN_TRAVEL_BY'
    }
]

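So I tried to compare each i-th and (i+1)-th element using these rules:

  • If the Subject of the i-th element contains the Subject of the (i+1)-th element, take the i-th element, and vice versa.
  • If the Object of the i-th element contains the Object of the (i+1)-th element, take the i-th element, and vice versa.

But I am not able to filter the list correctly.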

static void Main(string[] args)
{
    string data = @"[
                    {
                      'Sentence': 'C13 can travel by own car on road.',
                      'Subject': 'C13',
                      'Object': 'car',
                      'Relation': 'CAN_TRAVEL_BY'
                    },
                    {
                      'Sentence': 'C13 can travel by own car on road.',
                      'Subject': 'C13',
                      'Object': 'own car',
                      'Relation': 'CAN_TRAVEL_BY'
                    },
                    {
                      'Sentence': 'C13 can travel by own car on road.',
                      'Subject': 'C13',
                      'Object': 'road',
                      'Relation': 'CAN_TRAVEL_ON'
                    },
                    {
                      'Sentence': 'Kunal Mukherjee can travel by own car on road.',
                      'Subject': 'Kunal',
                      'Object': 'own car',
                      'Relation': 'CAN_TRAVEL_BY'
                    },
                    {
                      'Sentence': 'Kunal Mukherjee can travel by own car on road.',
                      'Subject': 'Kunal Mukherjee',
                      'Object': 'own car',
                      'Relation': 'CAN_TRAVEL_BY'
                    }
                  ]";

    List<JObject> js = JsonConvert.DeserializeObject<List<JObject>>(data);

    var pairs = js.Take(js.Count - 1).Select((x, i) =>
    {
        string aSubj = js[i]["Subject"].ToString();
        string bSubj = js[i + 1]["Subject"].ToString();


        string aObj = js[i]["Object"].ToString();
        string bObj = js[i + 1]["Object"].ToString();

        if ((aSubj.Length > bSubj.Length && aSubj.Contains(bSubj)) || (aObj.Length > bObj.Length && aObj.Contains(bObj)))
        {
            return js[i];
        }
        if ((aSubj.Length > bSubj.Length && aSubj.Contains(bSubj)) || (bObj.Length > aObj.Length && bObj.Contains(aObj)))
        {
            return js[i + 1];
        }

        return js[i];
    }).ToList();
}

Here is a .NET Fiddle for testing.

Thanks for helping me solve this.

1 answer:

Answer 0 (score: 1)

You can create an extension method (it must be declared inside a static class) to reduce (filter) your items:

public static IEnumerable<Item> Reduce(this IEnumerable<Item> items)
{
    using (var iterator = items.GetEnumerator())
    {
        if (!iterator.MoveNext())
            yield break;

        var previous = iterator.Current;

        while (iterator.MoveNext())
        {
            var next = iterator.Current;
            var containsPrevious =
                previous.Sentence == next.Sentence &&
                next.Subject.Contains(previous.Subject) &&
                next.Object.Contains(previous.Object);

            if (!containsPrevious)
                yield return previous;

            previous = next;
        }

        yield return previous;
    }
}

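The rule is simple: when two adjacent items have the same sentence, and the later item's subject and object contain those of the earlier item, the earlier item is dropped from the result.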
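Note that Reduce only checks one direction: the earlier item is dropped when the later one contains it. The question also asks for the reverse case ("and vice versa"), where the longer item comes first. Below is a minimal sketch of a symmetric variant, not part of the original answer. The names `ItemExtensions`, `Covers` and `ReduceSymmetric` are hypothetical, and the sketch assumes, as Reduce does, that related items are adjacent in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Sentence { get; set; }
    public string Subject { get; set; }
    public string Object { get; set; }
    public string Relation { get; set; }
}

public static class ItemExtensions
{
    // True when both items share a sentence and the Subject and Object of
    // 'outer' each contain the corresponding value of 'inner'.
    private static bool Covers(Item outer, Item inner) =>
        outer.Sentence == inner.Sentence &&
        outer.Subject.Contains(inner.Subject) &&
        outer.Object.Contains(inner.Object);

    // Hypothetical symmetric variant of Reduce: of two adjacent items where
    // one covers the other, only the covering item is kept.
    public static IEnumerable<Item> ReduceSymmetric(this IEnumerable<Item> items)
    {
        using (var iterator = items.GetEnumerator())
        {
            if (!iterator.MoveNext())
                yield break;

            var kept = iterator.Current;

            while (iterator.MoveNext())
            {
                var next = iterator.Current;

                if (Covers(next, kept))
                {
                    kept = next;        // next contains kept: replace it
                }
                else if (!Covers(kept, next))
                {
                    yield return kept;  // unrelated items: emit and move on
                    kept = next;
                }
                // Otherwise kept contains next, so next is simply dropped.
            }

            yield return kept;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        const string sentence = "Kunal Mukherjee can travel by own car on road.";
        var items = new List<Item>
        {
            // Longer subject first: the order the one-directional check misses.
            new Item { Sentence = sentence, Subject = "Kunal Mukherjee", Object = "own car", Relation = "CAN_TRAVEL_BY" },
            new Item { Sentence = sentence, Subject = "Kunal", Object = "own car", Relation = "CAN_TRAVEL_BY" },
        };

        foreach (var item in items.ReduceSymmetric())
            Console.WriteLine($"{item.Subject} | {item.Object} | {item.Relation}");
    }
}
```

With this input, the sketch keeps only the Kunal Mukherjee item, whereas the one-directional check would return both items.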

Usage is simple:

var result = JsonConvert.DeserializeObject<List<Item>>(data).Reduce();

Note that you need an Item class (consider giving it a better name):

public class Item
{
    public string Sentence { get; set; }
    public string Subject { get; set; }
    public string Object { get; set; }
    public string Relation { get; set; }
}

Output:

[
  {
    "Sentence": "C13 can travel by own car on road.",
    "Subject": "C13",
    "Object": "own car",
    "Relation": "CAN_TRAVEL_BY"
  },
  {
    "Sentence": "C13 can travel by own car on road.",
    "Subject": "C13",
    "Object": "road",
    "Relation": "CAN_TRAVEL_ON"
  },
  {
    "Sentence": "Kunal Mukherjee can travel by own car on road.",
    "Subject": "Kunal Mukherjee",
    "Object": "own car",
    "Relation": "CAN_TRAVEL_BY"
  }
]
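The approach above compares neighbouring items only. If the service does not guarantee that related items arrive next to each other, one possible alternative is to compare every item against all the others with LINQ. The sketch below is an assumption-laden illustration, not the answerer's method: `ItemFilter`, `StrictlyCovers` and `KeepLongest` are hypothetical names, and it additionally requires the Relation values to match, which the original answer does not check. It runs in quadratic time, which should be acceptable for short lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Sentence { get; set; }
    public string Subject { get; set; }
    public string Object { get; set; }
    public string Relation { get; set; }
}

public static class ItemFilter
{
    // 'outer' strictly covers 'inner' when they share Sentence and Relation,
    // outer's Subject and Object contain inner's, and at least one of the two
    // values is actually longer (so exact duplicates do not remove each other).
    private static bool StrictlyCovers(Item outer, Item inner) =>
        outer.Sentence == inner.Sentence &&
        outer.Relation == inner.Relation &&
        outer.Subject.Contains(inner.Subject) &&
        outer.Object.Contains(inner.Object) &&
        (outer.Subject.Length > inner.Subject.Length ||
         outer.Object.Length > inner.Object.Length);

    // Keeps every item that no other item strictly covers; order is preserved.
    public static List<Item> KeepLongest(IReadOnlyList<Item> items) =>
        items.Where(candidate => !items.Any(other => StrictlyCovers(other, candidate)))
             .ToList();
}

public static class Program
{
    public static void Main()
    {
        const string s1 = "C13 can travel by own car on road.";
        const string s2 = "Kunal Mukherjee can travel by own car on road.";
        var items = new List<Item>
        {
            new Item { Sentence = s1, Subject = "C13", Object = "car", Relation = "CAN_TRAVEL_BY" },
            new Item { Sentence = s1, Subject = "C13", Object = "own car", Relation = "CAN_TRAVEL_BY" },
            new Item { Sentence = s1, Subject = "C13", Object = "road", Relation = "CAN_TRAVEL_ON" },
            new Item { Sentence = s2, Subject = "Kunal", Object = "own car", Relation = "CAN_TRAVEL_BY" },
            new Item { Sentence = s2, Subject = "Kunal Mukherjee", Object = "own car", Relation = "CAN_TRAVEL_BY" },
        };

        foreach (var item in ItemFilter.KeepLongest(items))
            Console.WriteLine($"{item.Subject} | {item.Object} | {item.Relation}");
    }
}
```

For the sample data from the question, this keeps the same three items as the expected list (own car, road, and the Kunal Mukherjee entry). Exact duplicates are both kept, since neither is strictly longer than the other.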