I have a Java ETL that queries Elasticsearch, packs the results into comma-separated rows, appends the rows to a StringBuilder, and repeatedly gzip-compresses the StringBuilder as rows accumulate. Once the compressed size reaches roughly 9.2 MB, it submits that compressed byte stream to the Salesforce Wave External Data API as a data part. This repeats until all parts have been submitted and the transaction is completed, at which point the data becomes visualizable in SFDC Wave.
The challenge I've run into: since refactoring this code into a multithreaded approach to make it faster, a problem has appeared (before multithreading it worked fine, just slowly). Specifically, the Wave Data Manager occasionally (inconsistently) hits an unexpected EOF in one of the data parts this code submits.
I'm showing the key elements of my code in the hope that someone with more concurrency experience can spot where my approach lets a stray EOF slip into the stream...
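For context, the parent InsightsExternalData record whose Id I pass to the threads as parentID is created once, up front, before any threads start. That code isn't part of the problem area, but it's essentially the standard header-row create (the alias and field values below are placeholders, not my real ones):

SObject header = new SObject();
header.setType("InsightsExternalData");
header.setField("Format", "Csv");
header.setField("EdgemartAlias", "elk_test_results"); // placeholder alias
header.setField("Operation", "Overwrite");
header.setField("Action", "None"); // parts get uploaded before processing is triggered
SaveResult[] hresults = partnerConnection.create(new SObject[] { header });
String parentID = hresults[0].getId(); // handed to each Elk2WaveEtl thread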
The relevant part of the thread executor is:
long minMaxDelta = maxResultId - minResultId;
long theIncrementSize = minMaxDelta / ((long) iterations);
long threadIncrement = (minMaxDelta / threadSize) + 1;
long theCurrentMax = 0;
long theCurrentMin = minResultId;
for (int ti = 0; ti < threadSize; ti++) {
    theCurrentMax = theCurrentMin + threadIncrement;
    Elk2WaveEtl etlThread = new Elk2WaveEtl(theCurrentMin, theCurrentMax,
            incrementSize, targetDate, elkPassword, parentID,
            partnerConnection);
    new Thread(theGroup, etlThread, "elk2wave" + ti).start();
    logger.info("started thread for : " + theCurrentMin + " <= " + theCurrentMax);
    theCurrentMin = theCurrentMax;
}
So the number of threads launched is an input parameter (threadSize), and each thread gets a range of Elasticsearch ids to query for the given date.
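To make the slicing concrete (numbers here are just illustrative, not my real ids): with minResultId = 0, maxResultId = 1000000 and threadSize = 4, threadIncrement works out to 250001, so the four threads are started with the ranges 0–250001, 250001–500002, 500002–750003 and 750003–1000004.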
As data is retrieved from Elasticsearch inside each thread, it is packed into pseudo-CSV rows with this call:
StringBuilder theCsvFile = new StringBuilder();
theCsvFile.append(CSVUtils.makeStringLine(Arrays.asList(testId,
        autobuildName, changelistOwner, scrumteam, testCategory,
        bugNumber, depotPath, typeName, lastRunStatus, testOwner, devOwner,
        className, runningTime, isBenchmark, isFailure, changelist,
        autobuildId, runId, changelistEmail, startDate, status, testName,
        failDate, testIdentifier, isFailure)));
The makeStringLine function is defined as follows:
public static String makeStringLine(List<String> values, char separators, char customQuote) throws IOException {
    boolean first = true;
    // default customQuote is empty
    if (separators == ' ') {
        separators = DEFAULT_SEPARATOR;
    }
    StringBuilder sb = new StringBuilder();
    for (String value : values) {
        if (!first) {
            sb.append(separators);
        }
        if (customQuote == ' ') {
            sb.append(followCVSformat(value));
        } else {
            sb.append(customQuote).append(followCVSformat(value)).append(customQuote);
        }
        first = false;
    }
    sb.append("\n");
    logger.debug(sb.toString());
    return sb.toString();
}
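The call shown earlier passes only the list of values; there is a single-argument convenience overload (not shown here) that supplies the defaults, roughly along these lines:

public static String makeStringLine(List<String> values) throws IOException {
    // ' ' tells the three-argument version to use DEFAULT_SEPARATOR and no quoting
    return makeStringLine(values, ' ', ' ');
}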
Now here is where it becomes concurrency-relevant. As the compressed byte stream of data gets submitted to the Wave External Data API, I have a synchronized block so the various threads don't step on each other at that point.
private static volatile AtomicInteger p = new AtomicInteger(0);

public int increment() {
    return p.incrementAndGet();
}
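pvalue(), used in the logging further down, is just the read side of that same counter, roughly:

public int pvalue() {
    // report the current part number without modifying it
    return p.get();
}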
......
// write the test info to Wave when we have about 9MB of data
if (compressedLength >= 1024 * 1000 * 9) { // 9 MB
    byte[] theData = compress(theCsvFile);
    theCsvFile = new StringBuilder();
    compressedLength = 0;
    if (theData != null && theData.length > 0) {
        synchronized (p) {
            SObject isobj = new SObject();
            isobj.setType("InsightsExternalDataPart");
            isobj.setField("DataFile", theData);
            isobj.setField("InsightsExternalDataId", parentID);
            isobj.setField("PartNumber", increment()); // Part numbers should start at 1
            logger.debug(" theRowSize " + theData.length);
            SaveResult[] iresults = partnerConnection.create(new SObject[] { isobj });
            for (SaveResult sv : iresults) {
                if (sv.isSuccess()) {
                    String rowId = sv.getId();
                    logger.info("saved rowId " + rowId + " for part " + pvalue());
                } else {
                    com.sforce.soap.partner.Error[] es = sv.getErrors();
                    for (int w = 0; w < es.length; w++) {
                        logger.error(es[w].getMessage());
                    }
                }
            }
        }
    }
}
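Once every thread has flushed its last part, the main thread closes out the transaction by updating the parent record with Action = "Process" so Wave starts digesting the parts. That step isn't part of the code in question, but for completeness it is essentially:

SObject done = new SObject();
done.setType("InsightsExternalData");
done.setId(parentID);
done.setField("Action", "Process"); // tells Wave all parts are uploaded
partnerConnection.update(new SObject[] { done });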
The compress function is defined as:
public static byte[] compress(StringBuilder data) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream(data.toString().length());
    GZIPOutputStream gzip = new GZIPOutputStream(bos);
    gzip.write(data.toString().getBytes(StandardCharsets.UTF_8));
    gzip.close();
    byte[] compressed = bos.toByteArray();
    bos.close();
    return compressed;
}
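For what it's worth, compressedLength in the upload block is not the StringBuilder's length; as described at the top, I keep it current by periodically re-compressing the accumulating buffer and recording the size of the result, along these lines (simplified sketch of the idea, not the exact code):

// after appending a batch of rows to theCsvFile
compressedLength = compress(theCsvFile).length;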