Question

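I am writing an application that validates some cities. Part of the validation is checking whether the city is already in a list, by matching the country code and the city name (or the alternative city name).

I store the existing list of cities as: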

public struct City
{
    public int id;
    public string countrycode;
    public string name;
    public string altName;
    public int timezoneId;
}

List<City> cityCache = new List<City>();

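I then have a list of location strings, each containing a country code, a city name and so on. I split each string and then check whether the city already exists.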

string cityString = GetCity(); //get the city string
string countryCode = GetCountry(); //get the country string
city = new City();             //create a new city object
if (!string.IsNullOrEmpty(cityString)) //don't bother checking if no city was specified
{
    //check if city exists in the list in the same country 
    city = cityCache.FirstOrDefault(x => countryCode == x.countrycode && (Like(x.name, cityString ) || Like(x.altName, cityString )));
    //if no city is found, search for a single match across any country
    if (city.id == default(int) && cityCache.Count(x => Like(x.name, cityString ) || Like(x.altName, cityString )) == 1)
        city = cityCache.FirstOrDefault(x => Like(x.name, cityString ) || Like(x.altName, cityString ));
}

if (city.id == default(int))
{
    //city not matched
}

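This is very slow for a large number of records, because I also check other objects such as airports and countries in the same way. Is there any way to speed this up? Is there a faster collection than List<> for this kind of comparison, and is there a faster comparison function than FirstOrDefault()?

Edit

I forgot to post my Like() function: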

bool Like(string s1, string s2)
    {
        if (string.IsNullOrEmpty(s1) || string.IsNullOrEmpty(s2))
            return s1 == s2;
        if (s1.ToLower().Trim() == s2.ToLower().Trim())
            return true;

        return Regex.IsMatch(Regex.Escape(s1.ToLower().Trim()), Regex.Escape(s2.ToLower().Trim()) + ".");
    }

Answer 1

I would use a HashSet for the CityString and the CountryCode. Something like:

var validCountryCode = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
if (validCountryCode.Contains(city.CountryCode))
{
}

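etc...

Personally, I would do all the validation in the constructor, to ensure that only valid City objects exist.

Other things to look out for regarding performance

Use a HashSet if you are looking things up in a list of valid values.
Use an IEqualityComparer where appropriate, and reuse the object to avoid construction/GC costs.
Use a Dictionary for anything you need to look up (e.g. timeZoneId).

Edit 1

Your cityCache could be something like this: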
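
To make the three points above concrete, here is a minimal sketch. The names TrimmedIgnoreCaseComparer, LookupTables, ValidCountryCodes and TimezoneIdByCity are made up for illustration and the data is sample data, so treat it as one possible shape rather than a drop-in solution.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: strings are equal when they match after
// trimming whitespace and ignoring case.
public sealed class TrimmedIgnoreCaseComparer : IEqualityComparer<string>
{
    // One shared instance, so no comparer is allocated per lookup.
    public static readonly TrimmedIgnoreCaseComparer Instance = new TrimmedIgnoreCaseComparer();

    public bool Equals(string x, string y)
    {
        if (x == null || y == null) return x == y;
        return string.Equals(x.Trim(), y.Trim(), StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string s)
    {
        // Must agree with Equals: equal strings produce equal hash codes.
        return s == null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(s.Trim());
    }
}

public static class LookupTables
{
    // Set of valid country codes (sample data, for illustration only).
    public static readonly HashSet<string> ValidCountryCodes =
        new HashSet<string>(new[] { "GB", "FR", "US" }, TrimmedIgnoreCaseComparer.Instance);

    // Time zone id by city name (sample data, for illustration only).
    public static readonly Dictionary<string, int> TimezoneIdByCity =
        new Dictionary<string, int>(TrimmedIgnoreCaseComparer.Instance)
        {
            { "London", 1 },
            { "Paris", 2 }
        };
}
```

With this in place, LookupTables.ValidCountryCodes.Contains(" gb ") and LookupTables.TimezoneIdByCity["PARIS"] are both hash lookups rather than scans. Trim() can still allocate when the input has surrounding whitespace, so cleaning the strings before they reach the comparer is cheaper still.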

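// Outer key: country code; inner key: city code; value: city id.
// Assumes City exposes CountryCode, CityCode and Id.
static readonly Dictionary<string, Dictionary<string, int>> cityCache =
    new Dictionary<string, Dictionary<string, int>>();

public static bool IsCityValid(City c)
{
    // Two hash lookups instead of a scan over the whole list.
    return
        cityCache.TryGetValue(c.CountryCode, out var cities) &&
        cities.TryGetValue(c.CityCode, out var id) &&
        id == c.Id;
}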

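Edit 2

I didn't think I would have to explain this, but based on the comments perhaps I do.

FirstOrDefault() is an O(n) operation. Every time you try to find something in a list you can either be lucky and it is the first item, or unlucky and it is the last, for an average of list.Count / 2 comparisons. A dictionary, on the other hand, is an O(1) lookup on average. Using the IEqualityComparer, it generates a hash code and looks up which bucket the key sits in. Only when several items share a bucket does it have to use Equals to search that bucket for the item you are after. Even with a poor-quality GetHashCode() (short of one that always returns the same value), because Dictionary / HashSet use a prime number of buckets, your list gets split up, which reduces the number of equality checks you need to perform.

So a list of 10 objects means you run Like on average 5 times. A dictionary of the same 10 objects could (depending on the quality of the hash code) need as little as one GetHashCode() call followed by one Equals() call.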
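
As a rough sketch of what that could look like for the lookup in the question: the CityIndex class below is hypothetical, and it assumes an exact, case-insensitive match on the trimmed name is acceptable, which is not the same as the partial matching the regex in Like() does.

```csharp
using System;
using System.Collections.Generic;

public struct City
{
    public int id;
    public string countrycode;
    public string name;
    public string altName;
    public int timezoneId;
}

// Hypothetical index: maps a city name (or alt name) to all cities carrying it.
public sealed class CityIndex
{
    private readonly Dictionary<string, List<City>> byName =
        new Dictionary<string, List<City>>(StringComparer.OrdinalIgnoreCase);

    public CityIndex(IEnumerable<City> cities)
    {
        foreach (var c in cities)
        {
            Add(c.name, c);
            // Avoid listing a city twice when both names are the same.
            if (!string.Equals(c.name?.Trim(), c.altName?.Trim(), StringComparison.OrdinalIgnoreCase))
                Add(c.altName, c);
        }
    }

    private void Add(string key, City c)
    {
        if (string.IsNullOrWhiteSpace(key)) return;
        key = key.Trim();
        if (!byName.TryGetValue(key, out var list))
            byName[key] = list = new List<City>();
        list.Add(c);
    }

    // Same-country match first; otherwise a match only if it is unique overall.
    // Returns default(City) (id == 0) when nothing matches.
    public City Find(string countryCode, string cityString)
    {
        if (string.IsNullOrWhiteSpace(cityString) ||
            !byName.TryGetValue(cityString.Trim(), out var candidates))
            return default(City);

        foreach (var c in candidates)
            if (c.countrycode == countryCode)
                return c;

        return candidates.Count == 1 ? candidates[0] : default(City);
    }
}
```

Building the index is a one-off O(n) pass. After that each Find() is one hash lookup plus a walk over the handful of cities that share the name, instead of up to three scans of the whole list.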

Answer 2

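This sounds like a good candidate for a binary tree.

For a binary tree implementation in .NET, see: Objects that represent trees

Edit
If you want to search a collection quickly, and that collection is particularly large, your best option is to sort it and implement a search algorithm based on that sort order.

A binary tree is a good choice when you want fast searches and insert items relatively infrequently. To keep searches fast, though, you need to use a balanced binary tree.

For this to work properly, you also need a standard key to use for your cities. A numeric key would be best, but strings work fine too. If you concatenate the city with other information (such as the state and country), you get a nice unique key. You can also convert the key to all upper case or all lower case to make it case-insensitive.

If you don't have a key, you can't sort the data. If you can't sort the data, there won't be many "fast" options.

Edit 2:
I notice that your Like function edits strings a lot. Editing strings is a very expensive operation. It would be better to perform the ToLower() and Trim() calls only once, preferably when you first load the data. That will probably speed up your function considerably.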
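
A minimal sketch of the key idea, using the built-in SortedDictionary (documented as a binary search tree with O(log n) retrieval) in place of a hand-written tree. The "COUNTRY|CITY" key format and the names SortedCityLookup, MakeKey, Build and FindId are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class SortedCityLookup
{
    // Hypothetical key format: "COUNTRY|CITY", upper-cased so lookups ignore case.
    public static string MakeKey(string countryCode, string cityName)
    {
        return (countryCode ?? "").Trim().ToUpperInvariant() + "|" +
               (cityName ?? "").Trim().ToUpperInvariant();
    }

    // SortedDictionary keeps its keys in sorted order as entries are added.
    public static SortedDictionary<string, int> Build(
        IEnumerable<(string Country, string City, int Id)> rows)
    {
        var tree = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (var row in rows)
            tree[MakeKey(row.Country, row.City)] = row.Id;
        return tree;
    }

    // Returns the city id, or 0 when the key is not present.
    public static int FindId(SortedDictionary<string, int> tree, string countryCode, string cityName)
    {
        return tree.TryGetValue(MakeKey(countryCode, cityName), out var id) ? id : 0;
    }
}
```

The key is built the same way when loading and when searching, so the case and whitespace handling live in one place.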
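
For example, the normalization could be done once per city at load time, roughly like this. NormalizedCity and its members are hypothetical names, and the sketch only covers the exact-match branch of Like(), not the regex branch.

```csharp
using System;

// Hypothetical wrapper that stores the normalized names next to the id.
public sealed class NormalizedCity
{
    public int Id { get; }
    public string CountryCode { get; }
    public string NameKey { get; }     // lower-cased and trimmed once
    public string AltNameKey { get; }  // lower-cased and trimmed once

    public NormalizedCity(int id, string countryCode, string name, string altName)
    {
        Id = id;
        CountryCode = countryCode;
        NameKey = Normalize(name);
        AltNameKey = Normalize(altName);
    }

    // The only place where Trim() and ToLowerInvariant() run.
    public static string Normalize(string s)
    {
        return string.IsNullOrEmpty(s) ? "" : s.Trim().ToLowerInvariant();
    }

    // Plain ordinal comparison; the caller normalizes the search term once.
    // An empty search term never matches.
    public bool Matches(string normalizedSearch)
    {
        return normalizedSearch.Length > 0 &&
               (NameKey == normalizedSearch || AltNameKey == normalizedSearch);
    }
}
```

The search string is then normalized once per lookup with NormalizedCity.Normalize(cityString), and every comparison after that is a plain string equality check with no new strings created.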

Fastest way to compare objects in C#
