I have a list of URLs in a DataTable. I want to remove the rows whose URLs start with the same domain. Right now I have this code:
List<int> toRemove = new List<int>();
string initialDomain;
string compareDomainName;

for (int i = 0; i < UrlList.Rows.Count - 1; i++)
{
    if (toRemove.Contains(i))
        continue;

    initialDomain = new Uri(UrlList.Rows[i][0] as String).Host;

    // Compare this row's host against every later row and mark matches for removal.
    for (int j = i + 1; j < UrlList.Rows.Count; j++)
    {
        compareDomainName = new Uri(UrlList.Rows[j][0] as String).Host;
        if (String.Compare(initialDomain, compareDomainName, true) == 0)
        {
            toRemove.Add(j);
        }
    }

    // Update the progress indicator.
    percent = i * 100 / total;
    if (percent > lastPercent)
    {
        progress.EditValue = percent;
        Application.DoEvents();
        lastPercent = percent;
    }
}

// Sort the marked indexes so that removing from the end of the list walks them
// from highest to lowest, keeping the remaining row indexes valid.
toRemove.Sort();
for (int i = toRemove.Count - 1; i >= 0; i--)
{
    UrlList.Rows.RemoveAt(toRemove[i]);
}
It works fine for a small amount of data, but it gets very slow when I load a long list of URLs. I would like to switch to LINQ, but I don't know how to implement this with LINQ. Any help?
UPDATE: I don't need to remove duplicate rows. For example, if I have a list of URLs, I already know how to remove the duplicate rows. My problem is this: I have a simple list of URLs:
http://centroid.steven.centricagency.com/forms/contact-us?page=1544
http://chirp.wildcenter.org/poll
http://itdiscover.com/links/
http://itdiscover.com/links/?page=132
http://itdiscover.com/links/?page=2
http://itdiscover.com/links/?page=3
http://itdiscover.com/links/?page=4
http://itdiscover.com/links/?page=6
http://itdiscover.com/links/?page=8
http://www.foreignpolicy.com/articles/2010/06/21/la_vie_en
http://www.foreignpolicy.com/articles/2010/06/21/the_worst_of_the_worst
http://www.foreignpolicy.com/articles/2011/04/25/think_again_dictators
http://www.foreignpolicy.com/articles/2011/08/22/the_dictators_survival_guide
http://www.gsioutdoors.com/activities/pdp/glacier_ss_nesting_wine_glass/gourmet_backpacking/
http://www.gsioutdoors.com/products/pdp/telescoping_foon_orange/
http://www.gsioutdoors.com/products/pdp/telescoping_spoon_blue/
Now I want to end up with this list:
http://centroid.steven.centricagency.com/forms/contact-us?page=1544
http://chirp.wildcenter.org/poll
http://itdiscover.com/links/
http://www.foreignpolicy.com/articles/2010/06/21/la_vie_en
http://www.gsioutdoors.com/activities/pdp/glacier_ss_nesting_wine_glass/gourmet_backpacking/
Answer 0 (score: 2)
var result = urls.Distinct(new UrlComparer());

public class UrlComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return new Uri(x).Host == new Uri(y).Host;
    }

    public int GetHashCode(string obj)
    {
        return new Uri(obj).Host.GetHashCode();
    }
}
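Since the question stores the URLs in a DataTable rather than a plain list, here is a rough sketch of how the result of Distinct could be mapped back onto the table by deleting every row whose URL did not survive the de-duplication. It assumes the URL string sits in column 0 of UrlList, as in the question's code:

// Sketch, not a drop-in solution: keep one URL per host, then delete the other rows.
var keep = new HashSet<string>(
    UrlList.Rows.Cast<DataRow>()
           .Select(r => r[0] as string)
           .Distinct(new UrlComparer()));

// Walk the table backwards so removals do not shift the indexes still to be visited.
for (int i = UrlList.Rows.Count - 1; i >= 0; i--)
{
    if (!keep.Contains(UrlList.Rows[i][0] as string))
        UrlList.Rows.RemoveAt(i);
}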
You can also implement a DistinctBy extension method:
public static partial class MyExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        HashSet<TKey> knownKeys = new HashSet<TKey>();
        return source.Where(x => knownKeys.Add(keySelector(x)));
    }
}
var result = urls.DistinctBy(url => new Uri(url).Host);
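Because the extension is generic, it can also be applied directly to the DataRow objects instead of plain strings. A sketch, again under the assumption that the URL is stored in column 0:

// Sketch: pick the first DataRow per host and copy the survivors into a new table.
DataTable filtered = UrlList.Clone();   // same columns, no rows
var firstRowPerHost = UrlList.Rows
    .Cast<DataRow>()
    .DistinctBy(row => new Uri(row[0] as string).Host.ToLowerInvariant());

foreach (DataRow row in firstRowPerHost)
{
    filtered.ImportRow(row);
}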
Answer 1 (score: 0)
Try this:
IEnumerable<string> DeleteDuplicates(IEnumerable<string> source)
{
    var hosts = new HashSet<string>();
    foreach (var s in source)
    {
        var host = new Uri(s).Host.ToLower();
        if (hosts.Contains(host))
            continue;
        hosts.Add(host);
        yield return s;
    }
}
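For example, fed with part of the sample list from the question, only the first URL of each host comes back (a quick sketch; urls here is just an in-memory list):

var urls = new List<string>
{
    "http://itdiscover.com/links/",
    "http://itdiscover.com/links/?page=132",
    "http://itdiscover.com/links/?page=2",
    "http://chirp.wildcenter.org/poll"
};

foreach (var url in DeleteDuplicates(urls))
{
    // Prints http://itdiscover.com/links/ and http://chirp.wildcenter.org/poll
    Console.WriteLine(url);
}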
Answer 2 (score: -1)
Hi, implement this function to remove the duplicate rows:
public DataTable FilterURLS(DataTable urllist)
{
    return
        (from urlrow in urllist.Rows.OfType<DataRow>()
         group urlrow by urlrow.Field<string>("Host") into g
         select g
             .OrderBy(r => r.Field<int>("ID"))
             .First()).CopyToDataTable();
}
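Note that this version assumes the table already contains Host and ID columns. If the table holds only the raw URL string in column 0, as in the question, the grouping key can be parsed on the fly. A rough sketch (FilterByHost is just an illustrative name, not part of any answer above):

public DataTable FilterByHost(DataTable urllist)
{
    // Group by the host parsed from the URL in column 0, keep the first row of
    // each group in table order, then copy the survivors into a new table.
    return
        (from urlrow in urllist.Rows.OfType<DataRow>()
         group urlrow by new Uri(urlrow.Field<string>(0)).Host.ToLowerInvariant() into g
         select g.First()).CopyToDataTable();
}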