Question

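I've created an algorithm that weighs the relevance of a list of articles against two lists of keywords that correlate with attributes of the article.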

It works great and is very efficient... but it's a mess. It's not very readable, so it's hard to tell what's going on.

In pseudocode, the operation goes like this:

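Loop through the articles (List<Article>)
For each article, loop through the list of roles (List<string>)
Check whether the current article has any roles (Article.Roles = List<string>)
If it does, loop through each role in the article and try to match it against the current role in the loop
If a match is found, add weight to the article. If the role's index in the article and the role's index in the roles list are both 0 (the primary position), add extra weight for the two matching primaries
Repeat for topics, but with no bonus for a primary match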

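What is a better way to write the code below? I can't use foreach except in one or two places, because I need the matching indexes to know what value to add for a match.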

private static List<Article> WeighArticles(List<Article> articles, List<string> roles, List<string> topics, List<string> industries)
{
    var returnList = new List<Article>();
    for (int currentArticle = 0; currentArticle < articles.Count; currentArticle++)
    {
        for (int currentRole = 0; currentRole < roles.Count; currentRole++)
        {
            if (articles[currentArticle].Roles != null && articles[currentArticle].Roles.Count > 0)
            {
                for (int currentArticleRole = 0; currentArticleRole < articles[currentArticle].Roles.Count; currentArticleRole++)
                {
                    if (articles[currentArticle].Roles[currentArticleRole].ToLower() == roles[currentRole].ToLower())
                    {
                        if (currentArticleRole == 0 && currentRole == 0)
                            articles[currentArticle].Weight += 3;
                        else
                            articles[currentArticle].Weight += 1;
                    }
                }
            }
        }
        for (int currentTopic = 0; currentTopic < topics.Count; currentTopic++)
        {
            if (articles[currentArticle].Topics != null && articles[currentArticle].Topics.Count > 0)
            {
                for (int currentArticleTopic = 0; currentArticleTopic < articles[currentArticle].Topics.Count; currentArticleTopic++)
                {
                    if (articles[currentArticle].Topics[currentArticleTopic].ToLower() == topics[currentTopic].ToLower())
                    {
                        articles[currentArticle].Weight += 0.8;
                    }
                }
            }
        }
        returnList.Add(articles[currentArticle]);
    }

    return returnList;
}

//Article Class stub (unused properties left out)
public class Article
{
    public List<string> Roles { get; set; }
    public List<string> Topics { get; set; }
    public double Weight { get; set; }
}

Answer 1

OK, there are several design flaws in your code:

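1 - It's too procedural. You need to learn to write code that tells the machine "what you want" rather than "how to do it". It's like going to a bar and walking the bartender through the exact proportions of every ingredient instead of just asking for a drink.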

2 - Collections Should NEVER be null. That means checking articles[x].Roles != null is pointless.

3 - Iterating over a List<string> and comparing each item to someOtherString makes no sense either. Use List<T>.Contains() instead.

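4 - You're taking every item in the input list and putting it into a new list. Also nonsense. Return the input list directly, or use inputList.ToList() to create a new list.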
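A minimal sketch of points 2 and 3, assuming the question's Article stub: initialising the collections to empty lists means callers never need a null check, and the LINQ overload Enumerable.Contains(source, value, comparer) replaces the inner loop and the ToLower() comparisons. The HasRole helper is a hypothetical name, and StringComparer.OrdinalIgnoreCase is an assumed choice of comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Article
{
    // Initialised to empty lists so callers never have to check for null.
    public List<string> Roles { get; set; } = new List<string>();
    public List<string> Topics { get; set; } = new List<string>();
    public double Weight { get; set; }

    // Hypothetical helper: a case-insensitive membership test that replaces
    // the hand-written loop. List<T>.Contains has no comparer overload, so
    // this resolves to the LINQ extension Enumerable.Contains.
    public bool HasRole(string role) =>
        Roles.Contains(role, StringComparer.OrdinalIgnoreCase);
}
```

Note that Contains only answers "is it there?", so it cannot by itself express the question's rule that depends on both matching indexes being 0.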

All in all, here is a more idiomatic C# way to write the code:

private static List<Article> WeighArticles(List<Article> articles, List<string> roles, List<string> topics, List<string> industries)
{
    var firstRole = roles.FirstOrDefault();

    // Primary bonus: applies to every article whose first role matches the first requested role.
    foreach (var article in articles)
    {
        var firstArticleRole = article.Roles.FirstOrDefault();

        if (firstArticleRole != null && firstRole != null && 
            firstRole.ToLower() == firstArticleRole.ToLower())
            article.Weight += 3;
    }

    // Every other role match (any pair of indexes except 0/0) is worth 1.
    var remaining = from a in articles
                    from r in roles.Select((role, index) => new { role, index })
                    from ar in a.Roles.Select((role, index) => new { role, index })
                    where !(r.index == 0 && ar.index == 0)
                    where ar.role.ToLower() == r.role.ToLower()
                    select a;

    foreach (var article in remaining)
        article.Weight += 1;

    // Each topic match is worth 0.8, with no primary bonus.
    var hastopics = from a in articles
                    from t in topics
                    from at in a.Topics
                    where at.ToLower() == t.ToLower()
                    select a;

    foreach (var article in hastopics)
        article.Weight += .8;

    return articles;
}

There are even better ways to write this, for example using .Take(1) instead of .FirstOrDefault().
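Taking the .Take(1) suggestion literally, the primary-role bonus becomes a query over at most one element from each list, so an empty list simply yields no match and no null checks are needed. This is a sketch under assumptions: PrimaryRoleBonus and Apply are hypothetical names, and StringComparison.OrdinalIgnoreCase stands in for the ToLower() comparisons.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Article
{
    public List<string> Roles { get; set; } = new List<string>();
    public double Weight { get; set; }
}

public static class PrimaryRoleBonus
{
    // Adds 3 to every article whose first role matches the first requested role.
    // Take(1) yields zero or one element, so empty lists produce no matches
    // instead of a null that has to be checked.
    public static void Apply(List<Article> articles, List<string> roles)
    {
        var primaryMatches = from a in articles
                             from r in roles.Take(1)
                             from ar in a.Roles.Take(1)
                             where string.Equals(ar, r, StringComparison.OrdinalIgnoreCase)
                             select a;

        foreach (var article in primaryMatches)
            article.Weight += 3;
    }
}
```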

Answer 2

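If you review your code, you'll see that you ask the Article class for its data over and over. Apply the Tell, Don't Ask principle and move the weight-adding logic into the Article class, where it belongs. That increases Article's cohesion and makes your original code more readable. Here's how the original code would look: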

 foreach(var article in articles)
 {
     article.AddWeights(roles);
     article.AddWeights(topics);
 }

And Article would look like this:

 public double Weight { get; private set; } // you probably don't need a public setter

 public void AddWeights(IEnumerable<Role> roles)
 {
     const double RoleWeight = 1;
     const double PrimaryRoleWeight = 3;

     if (!roles.Any())
        return;

     if (Roles == null || !Roles.Any())
         return;

     var primaryRole = roles.First();
     var comparison = StringComparison.CurrentCultureIgnoreCase;

     if (String.Equals(Roles[0], primaryRole, comparison))
     {
         Weight += PrimaryRoleWeight;
         return;
     }

     foreach(var role in roles)         
        if (Roles.Contains(role, StringComparer.CurrentCultureIgnoreCase))
            Weight += RoleWeight;
 }

Adding topic weights:

 public void AddWeights(IEnumerable<Topic> topics)
 {
     const double TopicWeight = 0.8;

     if (Topics == null || !Topics.Any() || !topics.Any())
        return;

     foreach(var topic in topics)         
        if (Topics.Contains(topic, StringComparer.CurrentCultureIgnoreCase))
            Weight += TopicWeight;
 }
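The methods above assume dedicated Role and Topic types, which the question's stub does not define (its Roles and Topics are List<string>). A minimal sketch that compiles against the question's stub follows. It renames the overloads to AddRoleWeights and AddTopicWeights (hypothetical names) so they no longer collide on IEnumerable<string>, and it keeps the question's rule that a primary match earns 3 while every other match still earns 1. StringComparison.OrdinalIgnoreCase is an assumed comparison.

```csharp
using System;
using System.Collections.Generic;

public class Article
{
    public List<string> Roles { get; set; } = new List<string>();
    public List<string> Topics { get; set; } = new List<string>();
    public double Weight { get; private set; }

    private const double PrimaryRoleWeight = 3;
    private const double RoleWeight = 1;
    private const double TopicWeight = 0.8;

    // Tell, Don't Ask: the article weighs itself against the requested roles.
    // A match between the first requested role and the article's first role
    // earns PrimaryRoleWeight; every other match earns RoleWeight.
    public void AddRoleWeights(IList<string> roles)
    {
        for (int i = 0; i < roles.Count; i++)
            for (int j = 0; j < Roles.Count; j++)
                if (string.Equals(roles[i], Roles[j], StringComparison.OrdinalIgnoreCase))
                    Weight += (i == 0 && j == 0) ? PrimaryRoleWeight : RoleWeight;
    }

    // Each match between a requested topic and one of the article's topics earns TopicWeight.
    public void AddTopicWeights(IEnumerable<string> topics)
    {
        foreach (var topic in topics)
            foreach (var articleTopic in Topics)
                if (string.Equals(topic, articleTopic, StringComparison.OrdinalIgnoreCase))
                    Weight += TopicWeight;
    }
}
```

The roles still need indexes because of the primary rule, so that method keeps for loops, while the topics method can use foreach.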

Answer 3

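Apply the Extract Method refactoring to each for loop and give each one a semantic name: WeightArticlesForRole, WeightArticlesForTopic, and so on. This removes the visible nesting (the loops still exist, but they are reached through function calls).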

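It also makes your code self-documenting and more readable, because each loop is now reduced to a named method that says what it accomplishes. People reading code are most interested in first understanding what it accomplishes, and only then how. Semantic, conceptual method names make that easier, and readers can use Go To Definition to find out the how afterwards. Give each method a summary doc comment with a detailed description (similar to your pseudocode), and others can find their way around your code without wading through implementation details they don't care about.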
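A sketch of the Extract Method refactoring applied to the question's code, reusing the method names suggested above and adding summary doc comments. The ArticleWeighting class, the public visibility of WeighArticles, and StringComparison.OrdinalIgnoreCase in place of ToLower() are assumptions; the weights are the question's (3 for a primary match, 1 for other role matches, 0.8 per topic match).

```csharp
using System;
using System.Collections.Generic;

public class Article
{
    public List<string> Roles { get; set; } = new List<string>();
    public List<string> Topics { get; set; } = new List<string>();
    public double Weight { get; set; }
}

public static class ArticleWeighting
{
    // industries is unused, as in the question's code.
    public static List<Article> WeighArticles(List<Article> articles, List<string> roles, List<string> topics, List<string> industries)
    {
        WeightArticlesForRole(articles, roles);
        WeightArticlesForTopic(articles, topics);
        return articles;
    }

    /// <summary>
    /// Adds 1 to an article for every match between a requested role and one of its roles,
    /// or 3 when the first requested role matches the article's first role.
    /// </summary>
    private static void WeightArticlesForRole(List<Article> articles, List<string> roles)
    {
        foreach (var article in articles)
            for (int i = 0; i < roles.Count; i++)
                for (int j = 0; j < article.Roles.Count; j++)
                    if (string.Equals(roles[i], article.Roles[j], StringComparison.OrdinalIgnoreCase))
                        article.Weight += (i == 0 && j == 0) ? 3 : 1;
    }

    /// <summary>
    /// Adds 0.8 to an article for every match between a requested topic and one of its topics.
    /// </summary>
    private static void WeightArticlesForTopic(List<Article> articles, List<string> topics)
    {
        foreach (var article in articles)
            foreach (var topic in topics)
                foreach (var articleTopic in article.Topics)
                    if (string.Equals(topic, articleTopic, StringComparison.OrdinalIgnoreCase))
                        article.Weight += 0.8;
    }
}
```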

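The refactored methods may end up with some ugly-looking parameters, but they'll be private methods, so I usually don't worry about that. Sometimes, though, it helps me spot dependencies that should be removed, and I restructure the calling code so the method can be reused from several places. I suspect that with a few parameters for the weighting and a delegate, you could combine WeightArticlesForRole and WeightArticlesForTopic into a single function and reuse it in both places.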
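One way the two methods could be merged, assuming a selector delegate that picks which of the article's lists to compare and a Func<int, int, double> that turns the pair of matching indexes into a weight. WeightArticlesFor and its delegate parameters are hypothetical names; this is a sketch, not the answer author's code.

```csharp
using System;
using System.Collections.Generic;

public class Article
{
    public List<string> Roles { get; set; } = new List<string>();
    public List<string> Topics { get; set; } = new List<string>();
    public double Weight { get; set; }
}

public static class ArticleWeighting
{
    public static List<Article> WeighArticles(List<Article> articles, List<string> roles, List<string> topics)
    {
        // Roles: 3 for a match at index 0 in both lists, otherwise 1.
        WeightArticlesFor(articles, a => a.Roles, roles,
            (wantedIndex, articleIndex) => (wantedIndex == 0 && articleIndex == 0) ? 3 : 1);

        // Topics: a flat 0.8 per match, with no primary bonus.
        WeightArticlesFor(articles, a => a.Topics, topics,
            (wantedIndex, articleIndex) => 0.8);

        return articles;
    }

    /// <summary>
    /// For every article, compares each wanted value with each value selected from the
    /// article (case-insensitively) and adds weightFor(wantedIndex, articleIndex) per match.
    /// </summary>
    private static void WeightArticlesFor(
        List<Article> articles,
        Func<Article, List<string>> selectValues,
        List<string> wanted,
        Func<int, int, double> weightFor)
    {
        foreach (var article in articles)
        {
            var values = selectValues(article);
            for (int i = 0; i < wanted.Count; i++)
                for (int j = 0; j < values.Count; j++)
                    if (string.Equals(wanted[i], values[j], StringComparison.OrdinalIgnoreCase))
                        article.Weight += weightFor(i, j);
        }
    }
}
```

With this shape, the loops are written once, and each weighting rule is a one-line lambda at the call site.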

More efficient and readable nested loops
