Question

I have a text file that contains broken URLs, such as


 http images5fanpopcomimagephotos29000000ichigowallpaperkurosakiichigo290694271024768jpg

I want to remove these long strings that follow http or https.

Can anyone suggest a solution?

Answer 1

You can check each line for http or https. If the line is longer than X characters (for example, 40) and contains no "/" and/or no ".", remove it.

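// input: the whole file's contents, e.g. from System.IO.File.ReadAllText(path)
System.IO.StringReader strReader = new System.IO.StringReader(input);
var output = new System.Text.StringBuilder();
string line;
while ((line = strReader.ReadLine()) != null)
{
  bool add = true;
  // Only inspect lines that begin with http or https (leading spaces ignored)
  if (line.TrimStart().StartsWith("http", System.StringComparison.Ordinal))
  {
    // A long line missing '.' or '/' is treated as a broken URL and skipped
    if (line.Length > 40 && (!line.Contains(".") || !line.Contains("/")))
    {
      add = false;
    }
  }
  if (add) output.Append(line).Append("\r\n");
}
string result = output.ToString();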
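The line-based check above drops the whole line. If the broken URL can sit in the middle of other text, a regular expression can remove just the damaged token instead. This is a swapped-in technique, not part of the answer above, and the pattern is an assumption based only on the example in the question: "http" or "https", then whitespace where "://" should have been, then a run of at least 40 letters and digits with no "." or "/". The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BrokenUrlCleaner
{
    // Assumed shape of a broken URL: "http"/"https" at a word boundary,
    // whitespace instead of "://", then 40+ letters/digits (no '.' or '/').
    // Intact URLs such as "http://..." never match because ':' follows "http".
    private static readonly Regex BrokenUrl =
        new Regex(@"\bhttps?\s+[A-Za-z0-9]{40,}", RegexOptions.Compiled);

    // Returns the text with every broken-URL token removed; surrounding text is kept.
    public static string RemoveBrokenUrls(string text)
    {
        return BrokenUrl.Replace(text, string.Empty);
    }
}
```

For example, `BrokenUrlCleaner.RemoveBrokenUrls(System.IO.File.ReadAllText(path))` cleans a whole file in one call. Removing a token can leave two adjacent spaces behind. The 40-character threshold mirrors the answer's X and may need tuning for real data.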

Remove incomplete URLs
