Question

我有一个班级，其概要基本列在下面。

import org.apache.commons.math.stat.Frequency;
public class WebUsageLog {
    private Collection<LogLine> logLines;
    private Collection<Date> dates;

    WebUsageLog() {
        this.logLines = new ArrayList<LogLine>();
        this.dates = new ArrayList<Date>();
    }

    SortedMap<Double, String> getFrequencyOfVisitedSites() {
        SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
        Collection<String> domains = new HashSet<String>();
        Frequency freq = new Frequency();
        for (LogLine line : this.logLines) {
            freq.addValue(line.getVisitedDomain());
            domains.add(line.getVisitedDomain());
        }

        for (String domain : domains) {
            frequencyMap.put(freq.getPct(domain), domain);
        }

        return frequencyMap;
    }
}

此应用程序的目的是允许我们的人力资源人员查看我们发送给他们的Web使用日志。但是，我确信随着时间的推移，我希望能够提供的选项不仅可以查看访问过的站点的频率，还可以查看LogLine的其他成员（例如指定类别的频率，访问类型[文本] / html，img / jpeg等...]过滤判决，等等）。理想情况下，我想避免为每种类型编写单独的数据编译方法，并且它们最终看起来几乎与getFrequencyOfVisitedSites()方法完全相同。

所以，我的问题是双重的：首先，从机械的角度来看，你能看到应该改进这种方法的地方吗？其次，你如何使这个方法更通用，以便它可以处理任意数据集？

Answer 1

这与Eugene的解决方案基本相同，我只是将所有频率计算内容保留在原始方法中，并仅使用该策略来使字段工作。

如果您不喜欢枚举，那么您当然可以使用界面来执行此操作。

public class WebUsageLog {
    private Collection<LogLine> logLines;
    private Collection<Date> dates;

    WebUsageLog() {
        this.logLines = new ArrayList<LogLine>();
        this.dates = new ArrayList<Date>();
    }

    SortedMap<Double, String> getFrequency(LineProperty property) {
        SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
        Collection<String> values = new HashSet<String>();
        Frequency freq = new Frequency();
        for (LogLine line : this.logLines) {
            freq.addValue(property.getValue(line));
            values.add(property.getValue(line));
        }

        for (String value : values) {
            frequencyMap.put(freq.getPct(value), value);
        }

        return frequencyMap;
    }

    public enum LineProperty {
        VISITED_DOMAIN {
            @Override
            public String getValue(LogLine line) {
                return line.getVisitedDomain();
            }
        },
        CATEGORY {
            @Override
            public String getValue(LogLine line) {
                return line.getCategory();
            }
        },
        VERDICT {
            @Override
            public String getValue(LogLine line) {
                return line.getVerdict();
            }
        };

        public abstract String getValue(LogLine line);
    }
}

然后给出一个WebUsageLog实例，您可以这样调用它：

WebUsageLog usageLog = ...
SortedMap<Double, String> visitedSiteFrequency = usageLog.getFrequency(VISITED_DOMAIN);
SortedMap<Double, String> categoryFrequency = usageLog.getFrequency(CATEGORY);

Answer 2

我为每种计算类型引入了类似“数据处理器”的抽象，因此您可以为每一行调用单独的处理器：

...
       void process(Collection<Processor> processors) {
            for (LogLine line : this.logLines) {
                for (Processor processor : processors) {
                    processor.process();
                }
            }

            for (Processor processor : processors) {
                processor.complete();
            }
        }
...

public interface Processor {
   public void process(LogLine line);
   public void complete();
}

public class FrequencyProcessor implements Processor {
    SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
    Collection<String> domains = new HashSet<String>();
    Frequency freq = new Frequency();

    public void process(LogLine line)
        String property = getProperty(line);
        freq.addValue(property);
        domains.add(property);
    }

    protected String getProperty(LogLine line) {
        return line.getVisitedDomain();
    }

    public void complete()
       for (String domain : domains) {
          frequencyMap.put(freq.getPct(domain), domain);
       }
    }
}

您还可以将LogLine API更改为更像Map，即代替强类型line.getVisitedDomain（）可以使用line.get（“VisitedDomain”），然后您可以为所有属性编写通用的FrequencyProcessor在其构造函数中传递属性名称。

我的方法太具体了。我怎样才能使它更通用？

2 个答案: