Hello,
How can I download a CSV file from the web with Spark-Scala and load it into a spark-csv DataFrame?
Currently I rely on curl in a shell command to fetch my CSV file.
Here is the syntax I want to enhance:
/* fb_csv.scala
This script should load FB prices from Yahoo.
Demo:
spark-shell -i fb_csv.scala
*/
// I should get prices:
import sys.process._
"/usr/bin/curl -o /tmp/fb.csv http://ichart.finance.yahoo.com/table.csv?s=FB"!
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val fb_df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/fb.csv")
fb_df.head(9)
I want to enhance the script above so that it is pure Scala, with no shell syntax in it.
Answer 0 (score: 3)
Answer 1 (score: 1)
Found a better answer here: Process CSV from REST API into Spark
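A minimal pure-Scala sketch of the idea: fetch the CSV over HTTP with `scala.io.Source` instead of shelling out to curl, then write it to the same `/tmp/fb.csv` path so the existing spark-csv `load` line works unchanged. The in-memory sample CSV below is an illustration only (the Yahoo endpoint is assumed reachable in the commented line):

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

// Pure-Scala replacement for the curl call (assumption: endpoint reachable):
//   val csvText = Source.fromURL("http://ichart.finance.yahoo.com/table.csv?s=FB").mkString

// For illustration, a small in-memory CSV stands in for the live download.
val csvText = "Date,Close\n2016-01-04,102.22\n2016-01-05,102.73"

// Persist to the path the original script used, so
// sqlContext.read...load("/tmp/fb.csv") works unchanged.
val out = new PrintWriter(new File("/tmp/fb.csv"))
try out.write(csvText) finally out.close()

// Sanity check: re-read the file and count data rows (header excluded).
val rows = Source.fromFile("/tmp/fb.csv").getLines().drop(1).toList
println(rows.length)  // 2
```

No `sys.process` import or shell quoting is needed, and the download step stays inside the JVM.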