Question

When fetching the head of a dataset, I get the following error message:

java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 60, Column 32: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 60, Column 32: A method named "toString" is not declared in any enclosing class nor any supertype, nor through a static import

I am doing a simple join in my code and then trying to get the head:

Dataset<Transaction> ds = getSparkSession().read().text(file).map(Row::mkString, Encoders.STRING())                            
                        .map(row -> {
                            return Transaction.builder()
                                    .account(row.substring(7, 19).trim())                                  
                                    .referenceNumber(row.substring(58, 69).trim())
                                    .dateAndTime(row.substring(71, 79).trim())
                                    .amount(row.substring(87, 100).trim())                                    
                                    .merchantCity(row.substring(160, 173).trim())
                                    .merchantCountry(row.substring(173, 175).trim())                                    
                                    .build();
                        }, Encoders.bean(CreditCardTransaction.class));

Dataset<User> userDs = getUserDs();

Dataset<FilteredTransaction> fds = ds.filter(functions.length(ds.col("account")).geq("16"))
                        .join(userDs, ds.col("referenceNumber").startsWith(userDs.col("referenceNumber")))                      
                        .select(userDs.col("userId"),
                                ds.col("amount"),
                                ds.col("dateAndTime").cast(DataTypes.TimestampType),
                                ds.col("account"),
                                ds.col("merchantCity"),
                                ds.col("merchantCountry"))
                        .as(Encoders.bean(FilteredTransaction.class));

fds.head(1);

When I look at the generated code, I see that it calls toString on a primitive long at line 60 below. Is this a bug?

/* 050 */     boolean isNull21 = i.isNullAt(2);
/* 051 */     long value21 = isNull21 ? -1L : (i.getLong(2));
/* 052 */     boolean isNull20 = true;
/* 053 */     java.lang.String value20 = null;
/* 054 */     if (!isNull21) {
/* 055 */
/* 056 */       isNull20 = false;
/* 057 */       if (!isNull20) {
/* 058 */
/* 059 */         Object funcResult9 = null;
/* 060 */         funcResult9 = value21.toString();

Answer 1

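The cause may be that one of the field types in the bean class passed to the Encoder does not match the data type of the corresponding Dataset column.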

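For example, FilteredTransaction has a field account of type String. In the source text file it is a number (which will be treated as a long). In that case the long cannot be converted to a String, because a primitive long has no toString method. So give the field in the Encoder bean the same data type as the column has in the inferred Dataset schema.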

class FilteredTransaction {
    ...
    private long account;
    ....
}
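
To illustrate why the generated line `value21.toString()` cannot compile, here is a minimal plain-Java sketch that does not depend on Spark. The class name PrimitiveToStringDemo, both helper methods, and the sample account value are hypothetical and exist only for this illustration. It shows two standard ways to turn a primitive long into a String, either of which works where the method call on the primitive does not.

```java
public class PrimitiveToStringDemo {

    // A primitive long has no methods, so writing `value.toString()` on it
    // is a compile error. The static helper on java.lang.Long works instead.
    static String viaStaticHelper(long value) {
        return Long.toString(value);
    }

    // Boxing the primitive into a java.lang.Long first makes the
    // instance method toString() available.
    static String viaBoxing(long value) {
        Long boxed = value; // autoboxing: long -> Long
        return boxed.toString();
    }

    public static void main(String[] args) {
        long account = 1234567890123456L; // hypothetical sample value
        System.out.println(viaStaticHelper(account));
        System.out.println(viaBoxing(account));
    }
}
```

In the Spark case you do not control the generated code, so the practical fix is still to make the bean field type agree with the column type, as described above.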

