How to use Spark-Scala to download a CSV file from the web?

Asked: 2016-09-25 08:24:12

Tags: scala csv apache-spark

Hello World,

How can I use Spark-Scala to download a CSV file from the web and load the file into a spark-csv DataFrame?

Currently I depend on curl in a shell command to get my CSV file.

Here is the syntax I want to enhance:

/* fb_csv.scala
This script should load FB prices from Yahoo.

Demo:
spark-shell -i fb_csv.scala
*/

// I should get prices:
import sys.process._
"/usr/bin/curl -o /tmp/fb.csv http://ichart.finance.yahoo.com/table.csv?s=FB"!

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

val fb_df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("/tmp/fb.csv")

fb_df.head(9)

I want to enhance the script above so that it is pure Scala with no shell syntax inside it.

2 Answers:

Answer 0 (score: 3)

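One shell-free route is to fetch the file with scala.io.Source.fromURL, write it to disk with java.nio, and then load it with spark-csv exactly as in the question. The sketch below is illustrative, not the original answer: the FbCsv, fetch, and save names are assumptions, and the ichart.finance.yahoo.com endpoint from the question has since been retired, so a stand-in CSV string is used in place of a live fetch.

```scala
// Pure-Scala replacement for the curl shell call.
import java.nio.file.{Files, Paths}
import java.nio.charset.StandardCharsets

object FbCsv {
  // Fetch the CSV body over HTTP; no shell involved.
  def fetch(url: String): String =
    scala.io.Source.fromURL(url).mkString

  // Write the body to a local file so spark-csv can load it.
  def save(path: String, body: String): Unit = {
    Files.write(Paths.get(path), body.getBytes(StandardCharsets.UTF_8))
    ()
  }

  def main(args: Array[String]): Unit = {
    // The real script would do:
    //   val body = fetch("http://ichart.finance.yahoo.com/table.csv?s=FB")
    // Stand-in sample, since that endpoint no longer exists:
    val body = "Date,Close\n2016-09-23,127.96\n"
    save("/tmp/fb.csv", body)
  }
}

// In spark-shell, the DataFrame load is then unchanged from the question:
// val fb_df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("/tmp/fb.csv")
```

This keeps the rest of fb_csv.scala intact; only the curl line is replaced.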

Answer 1 (score: 1)

Found a better answer: Process CSV from REST API into Spark

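The idea named in that linked answer can be sketched as follows: skip the temp file entirely, pull the CSV body into the driver as a String, split it into lines, and hand the lines to Spark with sc.parallelize. The splitCsv helper and CsvFromRest object below are illustrative assumptions, not code from the linked answer, and the Spark lines are shown only as comments since they need a running spark-shell.

```scala
object CsvFromRest {
  // Illustrative helper (not from any library): split a CSV body into
  // a header row and data rows. Naive comma split; no quoting support.
  def splitCsv(body: String): (Array[String], Array[Array[String]]) = {
    val lines = body.trim.split("\n")
    (lines.head.split(","), lines.tail.map(_.split(",")))
  }
}

// In spark-shell (not runnable outside Spark):
// val body       = scala.io.Source.fromURL("http://ichart.finance.yahoo.com/table.csv?s=FB").mkString
// val rdd        = sc.parallelize(body.trim.split("\n"))
// val (hdr, _)   = CsvFromRest.splitCsv(body)
// From here, build a DataFrame from rdd plus hdr (e.g. via toDF after
// parsing each line); the exact API depends on your Spark version.
```

Compared with Answer 0 this never touches /tmp, which matters when the driver has no writable local filesystem.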