Question

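This is for a custom UDTF in a Hive query. CreateLogTable is the UDTF class, which I am using as a temporary stand-in for testing. I create one thread per file to be downloaded from Amazon S3, and wait until a thread becomes available before assigning the next file to it.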

Main test logic:

CreateLogTable CLT = new CreateLogTable();

int numThreads = 2;
int index = 0;
DownloadFileThread[] dlThreads = new DownloadFileThread[numThreads];
for (S3ObjectSummary oSummary : bucketKeys.getObjectSummaries()) {
    while (dlThreads[index] != null && dlThreads[index].isAlive()) {
        index += 1;
        index = index % numThreads;
    }
    dlThreads[index] = new DownloadFileThread(CLT , getBucket(oSummary.getBucketName() + "/"
                    + oSummary.getKey()), getFile(oSummary.getKey()), index);
    dlThreads[index].start();
    index += 1;
    index = index % numThreads;
}

Thread class (run() method):

try {
    System.out.println("Creating thread " + this.threadnum);
    this.fileObj = this.S3CLIENT.getObject(new GetObjectRequest(this.filePath, this.fileName));
    this.fileIn = new Scanner(new GZIPInputStream(this.fileObj.getObjectContent()));
    while (this.fileIn.hasNext()) {         
        this.parent.forwardToTable(fileIn.nextLine());
    }
    System.out.println("Finished " + this.threadnum);
} catch (Throwable e) {
    System.out.println("Downloading of " + this.fileName + " failed.");
}

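The while loop before thread creation should keep looping until it finds a slot holding either null or a dead thread. It then exits, and a new thread is created and started in that slot. Because I log to the console, I was able to watch this process, but the output was unexpected: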

Creating thread 0
Creating thread 1
Creating thread 0
Creating thread 1
Creating thread 0
Creating thread 1
Creating thread 0
...
Creating thread 1
Creating thread 0
Creating thread 1
Finished 0
Finished 1
Finished 1
Finished 0
Finished 1
Finished 1
...
Finished 0
Finished 1
Finished 0
Finished 1

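The above is just the first few lines of the output. The problem is that more than two threads are created before any thread has finished its task.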

Why is this happening, and how can I fix it?

Answer 1

Try taking a look at this example:

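public static void main(String[] args) {
    // At most 5 tasks run at once; the rest wait in the pool's queue.
    ExecutorService executor = Executors.newFixedThreadPool(5);
    for (int i = 0; i < 10; i++) {
        // WorkerThread is any Runnable; its definition is not shown here.
        Runnable worker = new WorkerThread("" + i);
        executor.execute(worker);
    }
    // Stop accepting new tasks; already submitted tasks still run.
    executor.shutdown();
    // Busy-wait until every task has completed.
    while (!executor.isTerminated()) {
    }
    System.out.println("Finished all threads");
}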

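This is a thread pool, written here against Java 8 (the ExecutorService API itself has been part of java.util.concurrent since Java 5). Building it with Executors is very simple, and it is an intuitive way forward.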
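The example above leaves WorkerThread undefined and waits by spinning on isTerminated(). A minimal self-contained sketch of the same idea might look like the following. It assumes a trivial lambda as the task, and the names FixedPoolSketch and runAll are made up for illustration. It blocks on awaitTermination instead of spinning:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolSketch {
    // Runs `tasks` trivial jobs on a pool of `poolSize` threads and
    // returns how many of them completed.
    static int runAll(int tasks, int poolSize) {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            // Hypothetical stand-in for the WorkerThread in the answer.
            executor.execute(() -> {
                System.out.println("Running task " + id);
                completed.incrementAndGet();
            });
        }
        executor.shutdown(); // no new tasks; queued ones still run
        try {
            // Block until the pool drains instead of spinning.
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Finished " + runAll(10, 5) + " tasks");
    }
}
```

The one-minute timeout is an arbitrary choice for the sketch; a real download job would pick a bound that matches its workload.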

Answer 2

I reduced the code to this test case:

public class ThreadTest {
    private static class SleepThread extends Thread {
        private final int index;
        SleepThread(int ii) { index = ii; }

        @Override
        public void run() {
            System.out.println("Creating thread " + this.index);
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("Finished " + this.index);
        }
    }

    public static void main(String[] args) {
        int numThreads = 2;
        int index = 0;
        SleepThread[] dlThreads = new SleepThread[numThreads];
        for (int ii = 0; ii < 10; ++ii) {
            while (dlThreads[index] != null && dlThreads[index].isAlive()) {
                index += 1;
                index = index % numThreads;
            }
            dlThreads[index] = new SleepThread(index);
            dlThreads[index].start();
            index += 1;
            index = index % numThreads;
        }
    }
}

With Sun JDK 1.7.0_75, running it produces the result you would expect: two threads start, they exit after five seconds, two more threads start, and so on.

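The next thing I would suspect is that your JVM's implementation of Thread.isAlive() does not return true immediately after the thread is started, although that would seem to contradict the documentation for the Thread class.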
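One way to check that suspicion on a particular JVM is a small probe such as the sketch below. The names IsAliveSketch and probe are hypothetical. The thread is parked on a latch so that it cannot finish before isAlive() is sampled. According to the Thread documentation, a started thread that has not yet terminated is alive, so the first value should be true and the second false:

```java
import java.util.concurrent.CountDownLatch;

class IsAliveSketch {
    // Returns { isAlive() while run() is blocked, isAlive() after join() }.
    static boolean[] probe() {
        CountDownLatch release = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            try {
                release.await(); // keep run() from returning until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        boolean whileRunning = t.isAlive(); // sampled right after start()
        release.countDown();                // let run() return
        try {
            t.join();                       // wait for the thread to die
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        boolean afterJoin = t.isAlive();
        return new boolean[] { whileRunning, afterJoin };
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println("alive while running: " + r[0]);
        System.out.println("alive after join:    " + r[1]);
    }
}
```

If the first line ever printed false, the JVM (or something wrapping the thread) would be behaving differently from the documented contract.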

Answer 3

The code above did not work because something odd was happening whenever isAlive() was called.

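For some reason, isAlive() always returned false for me no matter what state the thread was in. As a result, more and more threads were created, each one replacing an older thread in the dlThreads array.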

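I solved this by adding a custom isWorking() method, which simply returns a boolean indicating whether the thread's run() method is still in progress (it becomes false once run() has completed). Here is what the Thread class looks like now: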

//this.isWorking initialized to true during instantiation

@Override
public void run() {
    try {
        System.out.println("Creating thread " + this.threadnum + " for " + filePath + "/" + fileName);
        this.fileObj = this.S3CLIENT.getObject(new GetObjectRequest(this.filePath, this.fileName));
        this.fileIn = new Scanner(new GZIPInputStream(this.fileObj.getObjectContent()));
        while (this.fileIn.hasNext()) {
            this.parent.forwardToTable(fileIn.nextLine());
        }
        System.out.println("Finished " + this.threadnum);
        this.isWorking = false;
    } catch (Throwable e) {
        System.out.println("Downloading of " + this.fileName + " failed.");
        e.printStackTrace();
        this.isWorking = false;
    }
}

public boolean isWorking(){
    return this.isWorking;
}
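A variation on this flag approach, offered as a sketch rather than as the author's code: since another thread polls the flag, declaring it volatile makes the update visible to the poller, and clearing it in a finally block covers both the success and the failure path in one place. The class name WorkFlagSketch, the Runnable job parameter, and the demo helper are assumptions for illustration.

```java
class WorkFlagSketch extends Thread {
    // volatile so the polling thread sees the write made inside run().
    private volatile boolean working = true;
    private final Runnable job; // hypothetical stand-in for the download

    WorkFlagSketch(Runnable job) {
        this.job = job;
    }

    @Override
    public void run() {
        try {
            job.run();
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            working = false; // cleared on success and on failure
        }
    }

    boolean isWorking() {
        return working;
    }

    // Returns { flag before start, flag after a normal run, flag after a failed run }.
    static boolean[] demo() {
        WorkFlagSketch ok = new WorkFlagSketch(() -> { });
        WorkFlagSketch failing = new WorkFlagSketch(() -> {
            throw new IllegalStateException("simulated failure");
        });
        boolean before = ok.isWorking();
        ok.start();
        failing.start();
        try {
            ok.join();
            failing.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return new boolean[] { before, ok.isWorking(), failing.isWorking() };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Because the flag starts out true, a thread that is constructed but never started would report that it is working indefinitely, so the caller has to start every thread it places in the array.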

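However, after implementing this and being satisfied that my multithreaded script worked, I switched to using an Executor, as another user had suggested. This slightly improved performance and made the code cleaner.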
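The Executor-based version is not shown in the post. As a rough sketch of why it avoids the original problem, the following submits one task per key to a pool of fixed size and records the highest number of tasks that ever ran at the same time. The names BoundedDownloadSketch and peakConcurrency, the sample keys, and the sleep that stands in for the S3 download are all assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedDownloadSketch {
    // Runs one task per key on `poolSize` threads and returns the peak
    // number of tasks that were active at the same moment.
    static int peakConcurrency(List<String> keys, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (String key : keys) {
            pool.execute(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    System.out.println("Processing " + key);
                    Thread.sleep(50); // placeholder for download and parsing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a.gz", "b.gz", "c.gz", "d.gz", "e.gz", "f.gz");
        System.out.println("Peak concurrent tasks: " + peakConcurrency(keys, 2));
    }
}
```

The pool owns the worker threads and queues the remaining tasks, so the caller no longer needs to poll isAlive() or a custom flag to find a free slot.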

Why does this program create more threads than should be possible?
