How do I convert an entire column to lowercase?

Asked: 2017-04-19 16:06:17

Tags: apache-spark apache-spark-sql spark-dataframe apache-spark-dataset

I want to convert every value in a column of a Spark Dataset to lowercase.

        Desired Input
        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|BRUSH & BROOM HAN...|
        |   XYZ|WHEEL BRUSH PARTS...|
        +------+--------------------+

        Desired Output
        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|brush & broom han...|
        |   XYZ|wheel brush parts...|
        +------+--------------------+

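I tried using collectAsList() and toString(), but that is slow and complicated for a very large dataset.

I also found a function called lower, but I could not work out how to use it on a Dataset. Please suggest a simple or efficient way to do this. Thanks in advance.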

3 answers:

Answer 0: (score: 15)

I figured it out (using functions#lower, see the Javadoc):

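        import static org.apache.spark.sql.functions.col;
        import static org.apache.spark.sql.functions.lower;

        String columnName = "Category name";
        // Reusing the existing column name overwrites that column with its lowercased values
        src = src.withColumn(columnName, lower(col(columnName)));
        src.show();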

This replaces the old column with the new lowercased one and leaves the rest of the Dataset unchanged.

        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|brush & broom han...|
        |   XYZ|wheel brush parts...|
        +------+--------------------+
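
For anyone who wants to run this end to end, here is a minimal sketch of the same approach as a complete program. It assumes Spark SQL is on the classpath and uses a local session. The class name LowercaseColumn, the helper lowercase, and the two sample category values are made up for illustration (the question's real values are truncated, so they are not reproduced here).

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

public class LowercaseColumn {

    // Hypothetical helper: returns a Dataset in which `columnName` holds lowercased values.
    // The work is done by Spark on the executors; nothing is collected to the driver.
    static Dataset<Row> lowercase(Dataset<Row> ds, String columnName) {
        return ds.withColumn(columnName, lower(col(columnName)));
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough for this demonstration.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("lowercase-column")
                .getOrCreate();

        StructType schema = new StructType()
                .add("ItemID", DataTypes.StringType)
                .add("Category name", DataTypes.StringType);

        // Made-up sample rows, not the data from the question.
        List<Row> rows = Arrays.asList(
                RowFactory.create("ABC", "SAMPLE CATEGORY ONE"),
                RowFactory.create("XYZ", "SAMPLE CATEGORY TWO"));

        Dataset<Row> src = spark.createDataFrame(rows, schema);
        lowercase(src, "Category name").show();

        spark.stop();
    }
}
```

Because lower is a column expression, this avoids the collectAsList() round trip mentioned in the question.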

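Answer 1: (score: 13)

Use the lower function from org.apache.spark.sql.functions.

For example, in Scala (the $"..." column syntax comes from spark.implicits._, which spark-shell imports automatically):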

df.select($"q1Content", lower($"q1Content")).show

Output:

+--------------------+--------------------+
|           q1Content|    lower(q1Content)|
+--------------------+--------------------+
|What is the step ...|what is the step ...|
|What is the story...|what is the story...|
|How can I increas...|how can i increas...|
|Why am I mentally...|why am i mentally...|
|Which one dissolv...|which one dissolv...|
|Astrology: I am a...|astrology: i am a...|
| Should I buy tiago?| should i buy tiago?|
|How can I be a go...|how can i be a go...|
|When do you use  ...|when do you use  ...|
|Motorola (company...|motorola (company...|
|Method to find se...|method to find se...|
|How do I read and...|how do i read and...|
|What can make Phy...|what can make phy...|
|What was your fir...|what was your fir...|
|What are the laws...|what are the laws...|
|What would a Trum...|what would a trum...|
|What does manipul...|what does manipul...|
|Why do girls want...|why do girls want...|
|Why are so many Q...|why are so many q...|
|Which is the best...|which is the best...|
+--------------------+--------------------+

Answer 2: (score: 1)

First, add the static import:

import static org.apache.spark.sql.functions.lower;

Then call lower in the right place. Here is an example:

.and(lower(df1.col("field_name")).equalTo("offeringname"))
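
That line is only a fragment of a larger condition. As a rough sketch of how it might fit into a complete case-insensitive filter, assuming Spark SQL is on the classpath: the class CaseInsensitiveFilter, the helpers equalsIgnoreCase and keepMatching, the isNotNull() check, and the sample rows are all hypothetical. Only lower, equalTo, and, and the column name field_name come from the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.lower;

public class CaseInsensitiveFilter {

    // Condition that holds when the lowercased column equals the lowercased target.
    static Column equalsIgnoreCase(Column column, String wanted) {
        return lower(column).equalTo(wanted.toLowerCase(Locale.ROOT));
    }

    // Keeps rows whose field_name is non-null and matches `wanted` regardless of case.
    static Dataset<Row> keepMatching(Dataset<Row> df1, String wanted) {
        Column condition = df1.col("field_name").isNotNull()
                .and(equalsIgnoreCase(df1.col("field_name"), wanted));
        return df1.filter(condition);
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough for this demonstration.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("case-insensitive-filter")
                .getOrCreate();

        StructType schema = new StructType().add("field_name", DataTypes.StringType);

        // Made-up sample rows.
        List<Row> rows = Arrays.asList(
                RowFactory.create("OfferingName"),
                RowFactory.create("somethingElse"));

        Dataset<Row> df1 = spark.createDataFrame(rows, schema);
        keepMatching(df1, "offeringname").show();

        spark.stop();
    }
}
```

The comparison value must itself be lowercase (here it is lowercased explicitly), otherwise the condition can never match.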

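I read all the answers here and then tried it myself. For some reason I was stuck in IntelliJ IDEA for a few minutes until I understood the problem (the missing import). If you run into the same issue, just add the import that IntelliJ suggests; it shows a hint when it encounters an unknown symbol.

Good luck.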