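We have four columns, and we need to calculate the distance and average according to our requirements. We need to write Spark Scala code and store the result in a DataFrame.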
```
country,state,speed,time
c1,s1,25kph,8h
c1,s2,5kph,12h
c2,s3,35kph,9h
c2,s5,53kph,7.5h
c3,s5,82kph,8h
c4,s6,35kph,7h
c5,s7,95kph,6h
c2,s3,65kph,11h
c1,s2,8kph,32h
```
Like this, we have 1000 different rows in the CSV file.
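We need Scala code that uses speed and time to find the distance between given country/state pairs (from c2,s3 to c4,s6) one way, or from any starting point to any other end point, and then calculate the distance from c4,s6 back to c2,s3 (completing a round trip).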
Answer 0 (score: 0)
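First, copy the data in the given format into a text file, e.g. `timedist.txt`, and assume the file is in Scala's current directory. The regex below is used to read the file into `df`, a `List` of tuples of type `(String,String,Double,Double,Double)` holding country, state, speed, time and the distance calculated for each record (speed × time).

```scala
// country,state,<speed>kph,<time>h ; (.{2}) assumes two-character country and state codes
val p = """(.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h""".r
val src = scala.io.Source.fromFile("timedist.txt")
val td = try src.getLines().map(_.trim).toList finally src.close()
// Lines that don't match the pattern (e.g. a header row) are skipped; distance = speed * time
val df = td.collect { case p(c, s, speed, time) =>
  (c, s, speed.toDouble, time.toDouble, speed.toDouble * time.toDouble) }
```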
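If the regex feels brittle, here is a minimal sketch that parses each row by splitting on commas and stripping the `kph` and `h` suffixes instead, and that also computes an average. The names `Leg`, `Legs`, `parseLeg` and `averageSpeed` are hypothetical, and reading "average" as the overall average speed (total distance divided by total time) is an assumption, since the question does not define it.

```scala
// Hypothetical helpers: one parsed row, with distance = speed * time (km/h * h = km).
final case class Leg(country: String, state: String, speedKph: Double, timeH: Double) {
  def distanceKm: Double = speedKph * timeH
}

object Legs {
  // Parses "country,state,<n>kph,<n>h"; returns None for rows of another shape (e.g. the header).
  def parseLeg(line: String): Option[Leg] =
    line.trim.split(",").map(_.trim) match {
      case Array(c, s, speed, time) if speed.endsWith("kph") && time.endsWith("h") =>
        scala.util.Try(
          Leg(c, s, speed.stripSuffix("kph").toDouble, time.stripSuffix("h").toDouble)
        ).toOption
      case _ => None
    }

  // ASSUMPTION: "average" = overall average speed = total distance / total time
  // (NaN for an empty list).
  def averageSpeed(legs: Seq[Leg]): Double =
    legs.map(_.distanceKm).sum / legs.map(_.timeH).sum
}
```

Unlike the `(.{2})` regex, splitting on commas does not assume two-character country and state codes; for the rows in the file, `lines.flatMap(Legs.parseLeg)` yields one `Leg` per data row.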
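Now, this function takes `df` as a parameter, along with two `Int` parameters, `fromRecord` and `toRecord`, which are 1-based record numbers with `fromRecord < toRecord`. It calculates the round-trip (`roundtrip`) distance by summing the distances of records `fromRecord` through `toRecord` and doubling the sum:

```scala
// Round-trip distance over records fromRecord..toRecord (1-based, inclusive):
// sum the per-record distances (5th tuple field) and double for the return leg.
def calcDist(fromRecord: Int, toRecord: Int,
             df: List[(String, String, Double, Double, Double)]): Double =
  df.map(_._5).zipWithIndex
    .filter { case (_, i) => i >= fromRecord - 1 && i <= toRecord - 1 }
    .map(_._1).sum * 2
```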
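The question names its endpoints by country and state (c2,s3 to c4,s6) rather than by record number. Below is a minimal sketch of a hypothetical `roundTripBetween` helper that looks those endpoints up, assuming, as `calcDist` does, that the route is the records in file order. Because c2,s3 appears twice in the sample, it uses the first occurrence of each endpoint; that choice is also an assumption.

```scala
// Hypothetical variant of calcDist that takes (country, state) endpoints.
// ASSUMPTIONS: the route is the records in file order, and the FIRST
// occurrence of each endpoint is used.
object RoundTrip {
  type Rec = (String, String, Double, Double, Double) // country, state, speed, time, distance

  def roundTripBetween(from: (String, String), to: (String, String), df: List[Rec]): Option[Double] = {
    val i = df.indexWhere(r => (r._1, r._2) == from)
    val j = df.indexWhere(r => (r._1, r._2) == to)
    if (i < 0 || j < 0) None // an endpoint is not in the data
    else {
      val (lo, hi) = (math.min(i, j), math.max(i, j))
      // Sum the legs from lo to hi (inclusive), then double for the return trip.
      Some(df.slice(lo, hi + 1).map(_._5).sum * 2)
    }
  }
}
```

With the nine sample rows, `RoundTrip.roundTripBetween(("c2","s3"), ("c4","s6"), df)` covers records 3 to 6 and gives `Some(3227.0)`, the same value as `calcDist(3,6,df)`; swapping the endpoints gives the same round trip.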
In the Scala REPL:

```
scala> val p = """(.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h""".r
p: scala.util.matching.Regex = (.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h
scala> var td = scala.io.Source.fromFile("timedist.txt").getLines.map(_.trim()).toList
td: List[String] = List(c1,s1,25kph,8h, c1,s2,5kph,12h, c2,s3,35kph,9h, c2,s5,53kph,7.5h, c3,s5,82kph,8h, c4,s6,35kph,7h, c5,s7,95kph,6h, c2,s3,65kph,11h, c1,s2,8kph,32h)
scala> val df = td.map(x=>{var p(c,s,speed,distance)=x;
(c,s,speed.toDouble,distance.toDouble,speed.toDouble*distance.toDouble)})
df: List[(String, String, Double, Double, Double)] = List((c1,s1,25.0,8.0,200.0), (c1,s2,5.0,12.0,60.0), (c2,s3,35.0,9.0,315.0), (c2,s5,53.0,7.5,397.5), (c3,s5,82.0,8.0,656.0), (c4,s6,35.0,7.0,245.0), (c5,s7,95.0,6.0,570.0), (c2,s3,65.0,11.0,715.0), (c1,s2,8.0,32.0,256.0))
```

Now, this will compute the `roundtrip` distance from the 3rd record to the 6th record:

```
scala> calcDist(3,6,df)
res139: Double = 3227.0
```
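As a check on that value: records 3 to 6 have distances 315, 397.5, 656 and 245 (speed × time), and `calcDist` doubles their sum for the return leg:

$$
2 \times (315 + 397.5 + 656 + 245) = 2 \times 1613.5 = 3227
$$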
If the list `df` is to be converted to a DataFrame, use `toDF` (on a local Scala collection this relies on Spark's implicit conversions, `import spark.implicits._`, which spark-shell imports automatically):

```scala
df.toDF("Country", "State", "speed", "time", "distance")
```

In the Scala REPL:

```
scala> df.toDF("Country", "State", "speed","time","distance").show
+-------+-----+-----+----+--------+
|Country|State|speed|time|distance|
+-------+-----+-----+----+--------+
|     c1|   s1| 25.0| 8.0|   200.0|
|     c1|   s2|  5.0|12.0|    60.0|
|     c2|   s3| 35.0| 9.0|   315.0|
|     c2|   s5| 53.0| 7.5|   397.5|
|     c3|   s5| 82.0| 8.0|   656.0|
|     c4|   s6| 35.0| 7.0|   245.0|
|     c5|   s7| 95.0| 6.0|   570.0|
|     c2|   s3| 65.0|11.0|   715.0|
|     c1|   s2|  8.0|32.0|   256.0|
+-------+-----+-----+----+--------+
```
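In this case, to find the distance from one record to another record, it is best to have a record column in the converted dataframe. First, the list `df` is transformed so that the record number becomes the first item of each tuple, and the result is then converted to a dataframe in the Scala REPL.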
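A minimal sketch of that idea in plain Scala, assuming the record number is simply the 1-based position of the row in the file; `RecordColumn`, `withRecordNumbers` and `roundTrip` are hypothetical names. In spark-shell, the numbered list could then be turned into a DataFrame with `toDF("record", "Country", "State", "speed", "time", "distance")`, and a range of records selected with a filter such as `$"record".between(3, 6)`.

```scala
// Hypothetical sketch: prepend a 1-based record number as the first item of each tuple,
// so records can be selected by that column instead of by list position.
object RecordColumn {
  type Rec = (String, String, Double, Double, Double) // country, state, speed, time, distance
  type Numbered = (Int, String, String, Double, Double, Double)

  // ASSUMPTION: record number = 1-based position in the file.
  def withRecordNumbers(df: List[Rec]): List[Numbered] =
    df.zipWithIndex.map { case ((c, s, speed, time, dist), i) => (i + 1, c, s, speed, time, dist) }

  // Round-trip distance over records fromRecord..toRecord (inclusive, either order).
  def roundTrip(numbered: List[Numbered], fromRecord: Int, toRecord: Int): Double = {
    val (lo, hi) = (math.min(fromRecord, toRecord), math.max(fromRecord, toRecord))
    numbered.filter(r => r._1 >= lo && r._1 <= hi).map(_._6).sum * 2
  }
}
```

Selecting by an explicit record column keeps the range well defined even after the rows are reordered or filtered, which is why it is preferable once the data is in a dataframe.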