Scala - How to pass the delimiter as a variable when writing a DataFrame to CSV

Posted: 2018-08-24 02:16:44

Tags: scala csv dataframe export delimiter

Using a variable as the delimiter for dataframe.write.csv does not work, and the alternatives I have tried are too complicated.

 val df = Seq(("a", "b", "c"), ("a1", "b1", "c1")).toDF("A", "B", "C")
 val delim_char = "\u001F"

 df.coalesce(1).write.option("delimiter", delim_char).csv("file:///var/tmp/test")  // Does not work -- error related to too many chars
 df.coalesce(1).write.option("delimiter", "\u001F").csv("file:///var/tmp/test")  //works fine...

I have also tried .toHexString and many other alternatives, without success.

2 answers:

Answer 0 (score: 0)

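Your statement works fine, whether you pass the string value directly or through a reference variable. The character-length error appears only when the delimiter value is enclosed in single quotes, as in '\u001F'. This has nothing to do with Scala 2.11.8.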
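A likely explanation for the single-quote case, offered as an assumption rather than something the answer states: '\u001F' in single quotes is a Char, not a String, and Spark's DataFrameWriter.option has overloads taking String, Boolean, Long and Double values. A Char matches none of them directly, so Scala widens it to a numeric type and the stored option becomes the code point written as a number, which is longer than one character. The sketch below imitates that overload set with hypothetical local option methods so it runs without Spark:

```scala
object CharVsStringDelimiter {
  // Hypothetical stand-ins shaped like DataFrameWriter.option's String/Long/Double
  // overloads; each returns the text the option value would end up as.
  def option(value: String): String = value
  def option(value: Long): String = value.toString
  def option(value: Double): String = value.toString

  // U+001F (unit separator), built from its code point to avoid escapes in source.
  val unitSeparator: Char = 0x1F.toChar

  def main(args: Array[String]): Unit = {
    // Double-quoted form from the question: a one-character String.
    val fromString = option(unitSeparator.toString)
    // Single-quoted form: a Char, which is widened to a numeric overload.
    val fromChar = option(unitSeparator)
    println(s"String form length: ${fromString.length}") // 1
    println(s"Char form becomes: $fromChar")              // the code point as numeric text
  }
}
```

If a delimiter is already held in a Char, calling .toString on it before passing it to option keeps it a one-character String.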

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://xx.x.xxx.xx:xxxx
Spark context available as 'sc' (master = local[*], app id = local-1535083313716).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.2.6.3.0-235
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import java.io.File
import java.io.File

scala> import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.{Row, SaveMode, SparkSession}

scala> val warehouseLocation = new File("spark-warehouse").getAbsolutePath
warehouseLocation: String = /usr/hdp/2.6.3.0-235/spark2/spark-warehouse

scala> val spark = SparkSession.builder().appName("app").config("spark.sql.warehouse.dir", warehouseLocation).enableHiveSupport().getOrCreate()
18/08/24 00:02:25 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@37d3e740

scala> import spark.implicits._
import spark.implicits._

scala> import spark.sql
import spark.sql

scala> val df = Seq(("a", "b", "c"), ("a1", "b1", "c1")).toDF("A", "B", "C")
df: org.apache.spark.sql.DataFrame = [A: string, B: string ... 1 more field]

scala> val delim_char = "\u001F"
delim_char: String = ""

scala> df.coalesce(1).write.option("delimiter", delim_char).csv("file:///var/tmp/test")

scala>

Answer 1 (score: 0)

Thanks for the help.

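After testing, the code above works, and I could not find a way to reproduce the problem with it. The actual issue was that the delimiter was collected from a CSV file and assigned to a String variable. It was meant to be the Unicode character "\u001F", but println showed the value as the literal text \u001F.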
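To illustrate the difference described above: a value read from a text file that prints as \u001F consists of six ordinary characters (a backslash, u, and four hex digits), whereas the Scala literal "\u001F" is a single control character. Escape sequences are interpreted by the compiler in source literals, not in data read at runtime. A small check, assuming the file held exactly that text:

```scala
object LiteralVsEscapedDelimiter {
  // What the file actually contained: a backslash followed by "u001F" (six chars).
  // Built by concatenation so the compiler does not treat it as an escape.
  val readFromFile: String = "\\" + "u001F"

  // What the question's double-quoted literal produces: one U+001F character.
  val unitSeparator: String = 0x1F.toChar.toString

  def main(args: Array[String]): Unit = {
    println(readFromFile)                  // prints the escape text itself
    println(readFromFile.length)           // 6
    println(unitSeparator.length)          // 1
    println(readFromFile == unitSeparator) // false
  }
}
```

Passing the six-character value as the delimiter is what triggers the "too many characters" error, since Spark expects a single character.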

I tried several approaches and finally found the solution in another Stack Overflow question related to string unicode:

1)无效-delim_char.format(“ unicode-escape”)

2) Worked:

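For illustration, a minimal sketch of converting the escape text into the real character, assuming the value read from the file is exactly a backslash, u, and four hex digits. It is a hand-written decoder with a hypothetical name (decodeUnicodeEscapes), not necessarily the fix referred to in approach 2; a library routine such as StringEscapeUtils.unescapeJava in Apache Commons Text also decodes these escapes.

```scala
object UnicodeEscapeDecoder {
  private val Backslash = '\\'

  // Hypothetical helper: replaces every literal escape of the form
  // backslash, 'u', four hex digits with the UTF-16 code unit it names.
  // Any other text, including malformed escapes, is copied through unchanged.
  def decodeUnicodeEscapes(s: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      val isEscape =
        s.charAt(i) == Backslash &&
          i + 6 <= s.length &&
          s.charAt(i + 1) == 'u' &&
          s.substring(i + 2, i + 6).forall(c => Character.digit(c, 16) >= 0)
      if (isEscape) {
        // Parse the four hex digits and append the single character they encode.
        sb.append(Integer.parseInt(s.substring(i + 2, i + 6), 16).toChar)
        i += 6
      } else {
        sb.append(s.charAt(i))
        i += 1
      }
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val fromFile = "\\" + "u001F"          // six characters, as read from the file
    val delim = decodeUnicodeEscapes(fromFile)
    println(delim.length)                  // 1
    println(delim == 0x1F.toChar.toString) // true
  }
}
```

The decoded value is a one-character String, so it can be passed to option("delimiter", ...) just like the literal in the question. Approach 1 cannot achieve this because format treats the string as a java.lang.String.format pattern; with no % specifiers in it, the extra "unicode-escape" argument is ignored and the text comes back unchanged.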