Question

In the following code (copied from Java Concurrency in Practice, Chapter 2, Section 2.5, Listing 2.8):

@ThreadSafe
public class CachedFactorizer implements Servlet {
    @GuardedBy("this") private BigInteger lastNumber;
    @GuardedBy("this") private BigInteger[] lastFactors;
    @GuardedBy("this") private long hits;
    @GuardedBy("this") private long cacheHits;

    public synchronized long getHits() { return hits; }

    public synchronized double getCacheHitRatio() {
        return (double) cacheHits / (double) hits;
    }

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        BigInteger[] factors = null;
        synchronized (this) {
            ++hits;
            if (i.equals(lastNumber)) {
                ++cacheHits;
                factors = lastFactors.clone(); // questionable line here
            }
        }
        if (factors == null) {
            factors = factor(i);
            synchronized (this) {
                lastNumber = i;
                lastFactors = factors.clone(); // and here
            }
        }
        encodeIntoResponse(resp, factors);
    }
}

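What is the reason for cloning the factors and lastFactors arrays? Couldn't it simply be written as factors = lastFactors; and lastFactors = factors;? Is it only because factors is a local variable that is then passed to encodeIntoResponse, which can modify it?

I hope the question is clear. Thanks.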

Answer 1

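This is called defensive copying. Arrays are objects like any other, so

 factors = lastFactors

would make factors refer to the same array as lastFactors, and vice versa. Anyone could then overwrite your state outside of your control. As an example: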

private void filterAndRemove(BigInteger[] arr);
private void encodeIntoResponse(..., BigInteger[] factors) {
   filterAndRemove(factors);
}

With our hypothetical plain assignment, filterAndRemove would also affect the original lastFactors.
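To make the aliasing concrete, here is a minimal single-threaded sketch. The body of filterAndRemove is an assumption (the snippet above only declares it); here it simply overwrites every element. The class name DefensiveCopyDemo and the helper methods are made up for illustration and are not part of the book's listing.

```java
import java.math.BigInteger;
import java.util.Arrays;

class DefensiveCopyDemo {
    // Hypothetical callee that mutates the array it is given.
    static void filterAndRemove(BigInteger[] arr) {
        Arrays.fill(arr, BigInteger.ZERO);
    }

    // Plain assignment: the callee receives the cached array itself.
    static BigInteger[] serveWithoutClone(BigInteger[] lastFactors) {
        BigInteger[] factors = lastFactors;          // alias, not a copy
        filterAndRemove(factors);
        return lastFactors;                          // what the cache holds afterwards
    }

    // Defensive copy: the callee receives its own array.
    static BigInteger[] serveWithClone(BigInteger[] lastFactors) {
        BigInteger[] factors = lastFactors.clone();  // new array, same elements
        filterAndRemove(factors);
        return lastFactors;                          // untouched by the callee
    }

    static BigInteger[] sample() {
        return new BigInteger[] { BigInteger.valueOf(2), BigInteger.valueOf(5) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(serveWithoutClone(sample()))); // [0, 0]
        System.out.println(Arrays.toString(serveWithClone(sample())));    // [2, 5]
    }
}
```

Note that clone() on an array is a shallow copy. That is enough here because BigInteger instances are immutable, so only the array slots themselves can be changed.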

Answer 2

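Answering from the basics: you need to clone when you plan to modify an object but do not want to modify the original. In your case, factors = lastFactors.clone(); is done because you do not want lastFactors to be modified; instead you clone it and pass the copy to encodeIntoResponse, which may contain code that modifies it.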

Answer 3

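The only reason to clone the arrays is to prevent (in this case concurrent) modification of the array elements. However, that does not look possible here, assuming no other method modifies the array referenced by lastFactors, which is a reasonable assumption for this example. The arrays stored in factors and lastFactors are both created by factor and returned in a complete state, and their references are assigned inside synchronized blocks, which makes them safely published.

Unless encodeIntoResponse modifies the elements of its factors argument, it looks to me like the calls to clone are unnecessary.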

Answer 4

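I agree that the author could have explained this section of the book better.

It is true that, to implement thread safety correctly, you have to synchronize both reads and writes using the same lock. In the code above, in order to minimize the amount of synchronization, the author decided to run encodeIntoResponse(...) without synchronization: because encodeIntoResponse(...) reads the contents of the array referenced by factors, the author cloned that array into a new one.

Note: although factors is indeed a local variable, in general it would still need to be cloned, because the same array would be read by both synchronized and unsynchronized code if we passed the reference (without cloning) to both lastFactors and encodeIntoResponse(...).

However, as @khachik correctly pointed out in the question and @david-harkness in his reply, in this specific case the clone calls are unnecessary, because lastFactors is safely published and is never modified after publication.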
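The "never modified after publication" point can be illustrated with a small single-threaded sketch. It only shows the reference semantics (replacing the field does not change an array that a reader already holds); it does not by itself demonstrate thread safety. The class ReassignmentDemo and its methods read, publish, and of are hypothetical names, not taken from the book.

```java
import java.math.BigInteger;
import java.util.Arrays;

class ReassignmentDemo {
    private BigInteger[] lastFactors;

    // Returns the cached array without cloning it.
    synchronized BigInteger[] read() {
        return lastFactors;
    }

    // Stores the reference without cloning; assumes the caller never writes to 'fresh' again.
    synchronized void publish(BigInteger[] fresh) {
        lastFactors = fresh;
    }

    // Small helper to build a BigInteger array from long values.
    static BigInteger[] of(long... values) {
        BigInteger[] out = new BigInteger[values.length];
        for (int k = 0; k < values.length; k++) {
            out[k] = BigInteger.valueOf(values[k]);
        }
        return out;
    }

    public static void main(String[] args) {
        ReassignmentDemo cache = new ReassignmentDemo();
        cache.publish(of(2, 5));                     // result cached for the number 10
        BigInteger[] heldByReader = cache.read();    // a reader picks up the reference
        cache.publish(of(2, 2, 5));                  // a later request replaces the field

        System.out.println(Arrays.toString(heldByReader)); // [2, 5]
        System.out.println(Arrays.toString(cache.read())); // [2, 2, 5]
    }
}
```

The second publish call makes the field point at a different array; the array the reader already holds is left as it was. Skipping clone is only safe as long as no code writes to an array's elements after it has been published.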

Answer 5

If you change factors = lastFactors.clone(); to factors = lastFactors;, then factors and lastFactors both point to the same object; factors is no longer just a local variable, it becomes shared mutable state.

Imagine there are three requests: A, B, and C. Requests A and B send the number 10, but request C sends 20. Things can go wrong if the following execution order occurs and you have changed factors = lastFactors.clone(); to factors = lastFactors;.

The servlet server receives request A and executes the whole service method; now lastNumber is 10 and lastFactors is [1, 2, 5, 10].
The servlet server receives requests B and C at the same time. Request B is handled first, but only up to the point where it exits the first synchronized block (at this point, for request B, factors is [1, 2, 5, 10], which is correct).
For request C, the whole service method is executed, and it changes lastFactors from [1, 2, 5, 10] to [1, 2, 4, 5, 10, 20]. Because factors and lastFactors point to the same object, factors is now [1, 2, 4, 5, 10, 20] as well. The response for request B should be [1, 2, 5, 10], but it is now [1, 2, 4, 5, 10, 20].

"Java Concurrency in Practice" - cached thread-safe number factorizer (Listing 2.8)
