Question

In a Spark DataFrame (Java API, version 2.2), I am trying to take a substring of a column as follows:

// aggregationsDS is a Spark Dataset
aggregationsDS = aggregationsDS.withColumn("NODE_ID", aggregationsDS.col("NODE_ID").substr(2, [*Lengthofcolumn*]));

I need to supply the length of the string in that particular column, but I'm not sure what the correct command is.

Answer 1

You can use expr:

// requires: import static org.apache.spark.sql.functions.expr;
aggregationsDS = aggregationsDS.withColumn("NODE_ID", expr("substr(NODE_ID, 2)"));

or

aggregationsDS = aggregationsDS.withColumn("NODE_ID", expr("substr(NODE_ID, 2, length(NODE_ID))"));
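
Both expressions give the same result because SQL-style substr counts positions from 1 and stops at the end of the string when the requested length runs past it. The plain-Java sketch below models that behaviour without needing a Spark session. The class SubstrSketch and the helper sqlSubstr are hypothetical names made up for illustration, not part of Spark, and the model is simplified to positive start positions only.

```java
public class SubstrSketch {

    // Hypothetical helper: a simplified model of SQL-style substr(str, pos, len).
    // Assumption: only 1-based positions >= 1 are handled here.
    static String sqlSubstr(String s, int pos, int len) {
        if (s == null) {
            return null; // null input stays null
        }
        if (pos < 1) {
            throw new IllegalArgumentException("this sketch only handles pos >= 1");
        }
        // Convert the 1-based position to a 0-based index, capped at the string end.
        int start = Math.min(pos - 1, s.length());
        // Compute the end in long arithmetic so a huge len cannot overflow,
        // then clamp it to the end of the string.
        long requestedEnd = (long) start + Math.max(len, 0);
        int end = (int) Math.min(requestedEnd, (long) s.length());
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String nodeId = "X12345";
        // Passing the full length asks for more characters than remain; the result is clamped.
        System.out.println(sqlSubstr(nodeId, 2, nodeId.length()));
        // An arbitrarily large length behaves the same way.
        System.out.println(sqlSubstr(nodeId, 2, Integer.MAX_VALUE));
    }
}
```

If you would rather stay in the Column API, the same idea can probably be written as aggregationsDS.col("NODE_ID").substr(lit(2), length(aggregationsDS.col("NODE_ID"))), assuming static imports of lit and length from org.apache.spark.sql.functions; I have not run that against 2.2 specifically.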
