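In the notebook, there is an option to insert code from an Object Storage file. However, when I click the link, it fills the notebook cell with a set of variables, e.g.: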
auth_url : https://identity.open.softlayer.com
project : object_storage_***
project_id : ****
region : dallas
user_id : *****
domain_id : *****
domain_name : *****
username : user_*****
password : *****
filename : block_1.csv
container : notebooks
tenantId : ****
How do I use this information in a Spark command to load the data? Presumably something like this:
scala> val data = sc.textFile( ... )
Question: what is the exact command?
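Answer 0: (score: 2)
The Object Storage "Insert code" option appears to just dump a list of properties. I came up with this small Scala helper to extract the properties from the string it dumps: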
import scala.collection.breakOut
val YOUR_DATASOURCE = """<<paste_your_datasource_attributes_here>>"""
def setConfig(name:String, dsConfiguration:String) : Unit = {
  // Trailing dot so the keys come out as fs.swift.service.<name>.auth.url, etc.
  val pfx = "fs.swift.service." + name + "."
  // Parse each "key : value" line into a Map entry (splitting on the first colon only)
  val settings:Map[String,String] = dsConfiguration.split("\\n").
    map(l=>(l.split(":",2)(0).trim(), l.split(":",2)(1).trim()))(breakOut)
  // sc.getConf returns a copy of the SparkConf, so set the Swift properties on the Hadoop configuration
  val conf = sc.hadoopConfiguration
  conf.set(pfx + "auth.url", settings.getOrElse("auth_url",""))
  conf.set(pfx + "tenant", settings.getOrElse("tenantId", ""))
  conf.set(pfx + "username", settings.getOrElse("username", ""))
  conf.set(pfx + "password", settings.getOrElse("password", ""))
  conf.set(pfx + "apikey", settings.getOrElse("password", ""))
  conf.set(pfx + "auth.endpoint.prefix", "endpoints")
}
setConfig("spark", YOUR_DATASOURCE)
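Copy this into your notebook, place the cursor on the empty line between the multi-line quotes ("""), and then click the "Insert code" link for your file.
If it works correctly, you should then be able to build a swift URL for your file: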
val file = sc.textFile("swift://notebooks.spark/TheFileYouClickedOn.txt")
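In this case, notebooks is the container name, spark is the data source name (the first argument to the setConfig function), and TheFileYouClickedOn.txt stands in for the file name (followers.txt in my case).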
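The URL layout described above can be captured in a small helper. This is only a sketch: SwiftUrls and swiftUrl are hypothetical names introduced here for illustration, not part of Spark or the notebook, and the helper simply joins the three pieces in the swift://<container>.<data source name>/<file name> form shown above.

```scala
object SwiftUrls {
  // Hypothetical helper (not a Spark API): joins the container name,
  // the data source name passed to setConfig, and the file name.
  def swiftUrl(container: String, dataSourceName: String, fileName: String): String =
    s"swift://${container}.${dataSourceName}/${fileName}"
}

// In a notebook cell, after setConfig has run, you could then write:
// val file = sc.textFile(SwiftUrls.swiftUrl("notebooks", "spark", "TheFileYouClickedOn.txt"))
```

Keeping the URL construction in one place makes it harder to mix up the container and the data source name when you load several files.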
Put together, it all looks like this:
import scala.collection.breakOut
def setConfig(name:String, dsConfiguration:String) : Unit = {
  // Trailing dot so the keys come out as fs.swift.service.<name>.auth.url, etc.
  val pfx = "fs.swift.service." + name + "."
  // Parse each "key : value" line into a Map entry (splitting on the first colon only)
  val settings:Map[String,String] = dsConfiguration.split("\\n").
    map(l=>(l.split(":",2)(0).trim(), l.split(":",2)(1).trim()))(breakOut)
  // sc.getConf returns a copy of the SparkConf, so set the Swift properties on the Hadoop configuration
  val conf = sc.hadoopConfiguration
  conf.set(pfx + "auth.url", settings.getOrElse("auth_url",""))
  conf.set(pfx + "tenant", settings.getOrElse("tenantId", ""))
  conf.set(pfx + "username", settings.getOrElse("username", ""))
  conf.set(pfx + "password", settings.getOrElse("password", ""))
  conf.set(pfx + "apikey", settings.getOrElse("password", ""))
  conf.set(pfx + "auth.endpoint.prefix", "endpoints")
}
val YOUR_DATASOURCE = """auth_url : https://identity.open.softlayer.com
project : object_storage_abc123
project_id : abc123abc123abc123abc123abc123
region : dallas
user_id : 123abc123abc123abc123abc123abc
domain_id : a1b2c3a1b2c3a1b2c3a1b2c3a1b2c3
domain_name : 123456
username : user_a1b2c3a1b2c3a1b2c3a1b2c3a1b2c3
password : WhateverPasswordValueGoesHere
filename : TheFileYouClickedOn.txt
container : notebooks
tenantId : a1b2c3-a1b2c3a1b2c3-a1b2c3a1b2c3
"""
setConfig("spark", YOUR_DATASOURCE)
val file = sc.textFile("swift://notebooks.spark/TheFileYouClickedOn.txt")
// Do stuff with your file.
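You could also parse out the file name and create the textFile reference automatically, but I prefer to keep the two separate, because you only need to connect to an Object Store once to use any of the files in it. It could probably also use some blank-line detection and the like, but for now I just handle that myself.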
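For anyone who does want the blank-line handling and the file name parsing mentioned above, here is a minimal sketch. DatasourceParser, parseSettings, and fileUrl are hypothetical names introduced for illustration. It assumes the pasted text uses the same "key : value" layout as in the question and that the data source name is the one you passed to setConfig.

```scala
object DatasourceParser {
  // Hypothetical helper: parses "key : value" lines into a Map,
  // skipping blank lines and any line that contains no colon.
  def parseSettings(dsConfiguration: String): Map[String, String] =
    dsConfiguration
      .split("\\n")
      .map(_.trim)
      .filter(_.contains(":"))
      .map { line =>
        // Split on the first colon only, so URLs in the value stay intact
        val parts = line.split(":", 2)
        parts(0).trim -> parts(1).trim
      }
      .toMap

  // Builds the swift URL from the parsed "container" and "filename" entries;
  // returns None if either entry is missing.
  def fileUrl(settings: Map[String, String], dataSourceName: String): Option[String] =
    for {
      container <- settings.get("container")
      filename  <- settings.get("filename")
    } yield s"swift://${container}.${dataSourceName}/${filename}"
}

// In a notebook cell, assuming YOUR_DATASOURCE holds the pasted properties:
// DatasourceParser.fileUrl(DatasourceParser.parseSettings(YOUR_DATASOURCE), "spark").foreach { url =>
//   val file = sc.textFile(url)
// }
```

Returning an Option keeps a missing container or filename entry from turning into a malformed URL.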
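Answer 1: (score: 2)
I am using a Jupyter notebook with a Scala 2.10 kernel in the Spark as a Service offering on Bluemix.
I can access a file stored in the Swift Object Store with the code below. I think this approach is slightly easier, because I can select the file in Object Storage from within the Jupyter notebook and add the code to my notebook with the "Insert code" feature, without modifying it. Here is the snippet: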
def setConfig(credentials : scala.collection.mutable.HashMap[String, String]) = {
  val prefix = "fs.swift.service." + credentials("name")
  // sc.getConf returns a copy of the SparkConf, so set the Swift properties on the Hadoop configuration
  val hconf = sc.hadoopConfiguration
  hconf.set(prefix + ".auth.url", credentials("auth_url")+"/v3/auth/tokens")
  hconf.set(prefix + ".auth.endpoint.prefix", "endpoints")
  hconf.set(prefix + ".tenant", credentials("project_id"))
  hconf.set(prefix + ".username", credentials("user_id"))
  hconf.set(prefix + ".password", credentials("password"))
  hconf.set(prefix + ".http.port", "8080")
  hconf.set(prefix + ".region", credentials("region"))
  hconf.set(prefix + ".public", "True")
}
var credentials_1 = scala.collection.mutable.HashMap[String, String](
"auth_url"->"https://identity.open.softlayer.com",
"project"->"objexxxxxxxxxxxxx858",
"project_id"->"f4xxxxxxxxxxxxxxa7",
"region"->"dallas",
"user_id"->"e4fc7294xxxxxxx5",
"domain_id"->"7527xxxxxxxxxxx44f",
"domain_name"->"9xxxxx9",
"username"->"Admin_",
"password"->"""xxxxxxxxxxxxx""",
"filename"->"scores.dat",
"container"->"notebooks",
"tenantId"->"s69xxxxxxxxxxxxx4f0"
)
credentials_1("name") = "spark"
setConfig(credentials_1)
val file = sc.textFile("swift://notebooks." + credentials_1("name") + "/" + credentials_1("filename"))
file.take(5)
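Answer 2: (score: 1)
See https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html, in particular the section on "Reusing an existing Object Storage...". Which version of Object Storage are you interested in consuming from? (v1, v2, v3, or SL OS?)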