Write Scala code to calculate the distance to a given point

Time: 2018-08-02 05:39:43

Tags: scala apache-spark

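We have four columns. We need to calculate the average distance according to our requirements, writing Spark Scala code and storing the result in a DataFrame.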

    country,state,speed,time
    c1,s1,25kph,8h
    c1,s2,5kph,12h
    c2,s3,35kph,9h
    c2,s5,53kph,7.5h
    c3,s5,82kph,8h
    c4,s6,35kph,7h
    c5,s7,95kph,6h
    c2,s3,65kph,11h
    c1,s2,8kph,32h

The CSV file has 1000 different rows like these.

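We need Scala code that uses speed and time to find the distance between given country/state points, for example from c2,s3 to c4,s6, one way or from any starting point to any other end point. We also need the distance from c4,s6 back to c2,s3, which completes a round trip.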

1 answer:

Answer 0 (score: 0)

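First, copy the data rows in the given format (without the header line) into a text file, for example timedist.txt, which is assumed to be in Scala's current working directory. The regex below reads the file into a List of tuples, df, of type (String,String,Double,Double,Double): country, state, speed, time, and the distance calculated for each record as speed * time. Note that (.{2}) matches exactly two characters, so it only fits two-character codes such as c1 and s1.

    // Regex for one data row, e.g. "c2,s5,53kph,7.5h"
    val p = """(.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h""".r
    val td = scala.io.Source.fromFile("timedist.txt").getLines().map(_.trim()).toList
    // Each tuple: (country, state, speed, time, distance = speed * time)
    val df = td.map(x => { val p(c, s, speed, time) = x; (c, s, speed.toDouble, time.toDouble, speed.toDouble * time.toDouble) })

Now, the function below takes df as a parameter, along with two Int parameters, fromRecord and toRecord (1-based record positions), and calculates the distance in both directions (roundtrip), assuming fromRecord < toRecord:

    def calcDist(fromRecord: Int, toRecord: Int, df: List[(String, String, Double, Double, Double)]) = {
      // Sum the distances of records fromRecord..toRecord (1-based, inclusive), doubled for the way back
      df.map(_._5).zipWithIndex.filter(x => x._2 >= (fromRecord - 1) && x._2 <= (toRecord - 1)).map(_._1).sum * 2
    }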
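Below is a hedged sketch of the same calculation in plain Scala collections. It is not the answer's own code, and the names RoundTrip, Leg, parse, roundTrip and position are hypothetical. It makes four changes:

- The pattern [^,]+ replaces (.{2}), so codes longer than two characters also match.
- Lines that do not match, such as the header line, are skipped instead of throwing a MatchError.
- roundTrip uses slice in place of the zipWithIndex filter in calcDist.
- position looks up the first record with a given country/state. This lets the question's endpoints be used directly, assuming an endpoint is identified by its first occurrence.

```scala
// Sketch under assumptions: rows look like "c2,s3,35kph,9h", distance = speed * time.
object RoundTrip {
  // One parsed row: speed in kph, time in hours, distance in km
  final case class Leg(country: String, state: String, speed: Double, time: Double) {
    def distance: Double = speed * time
  }

  // Any non-empty country/state code, numeric speed with "kph", numeric time with "h"
  private val Row = """([^,]+),([^,]+),(\d+\.?\d*)kph,(\d+\.?\d*)h""".r

  // Lines that do not match (e.g. the header "country,state,speed,time") are dropped
  def parse(lines: Seq[String]): List[Leg] =
    lines.map(_.trim).collect {
      case Row(c, s, speed, time) => Leg(c, s, speed.toDouble, time.toDouble)
    }.toList

  // Round trip over records fromRecord..toRecord (1-based, inclusive), like calcDist
  def roundTrip(fromRecord: Int, toRecord: Int, legs: List[Leg]): Double =
    legs.slice(fromRecord - 1, toRecord).map(_.distance).sum * 2

  // 1-based position of the first record with this country/state, if any
  def position(legs: List[Leg], country: String, state: String): Option[Int] =
    legs.indexWhere(l => l.country == country && l.state == state) match {
      case -1 => None
      case i  => Some(i + 1)
    }
}
```

For the sample rows, position(legs, "c2", "s3") is Some(3) and position(legs, "c4", "s6") is Some(6), so roundTrip(3, 6, legs) gives the same 3227.0 as calcDist(3,6,df). Because c2,s3 appears twice in the sample data, matching its first occurrence is itself an assumption about what the question means.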

In the Scala REPL:

    scala> val p = """(.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h""".r
    p: scala.util.matching.Regex = (.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h

    scala> var td = scala.io.Source.fromFile("timedist.txt").getLines.map(_.trim()).toList
    td: List[String] = List(c1,s1,25kph,8h, c1,s2,5kph,12h, c2,s3,35kph,9h, c2,s5,53kph,7.5h, c3,s5,82kph,8h, c4,s6,35kph,7h, c5,s7,95kph,6h, c2,s3,65kph,11h, c1,s2,8kph,32h)

    scala> val df = td.map(x=>{var p(c,s,speed,distance)=x; (c,s,speed.toDouble,distance.toDouble,speed.toDouble*distance.toDouble)})
    df: List[(String, String, Double, Double, Double)] = List((c1,s1,25.0,8.0,200.0), (c1,s2,5.0,12.0,60.0), (c2,s3,35.0,9.0,315.0), (c2,s5,53.0,7.5,397.5), (c3,s5,82.0,8.0,656.0), (c4,s6,35.0,7.0,245.0), (c5,s7,95.0,6.0,570.0), (c2,s3,65.0,11.0,715.0), (c1,s2,8.0,32.0,256.0))

Now, this gives the roundtrip distance from the 3rd record to the 6th record, i.e. (315.0 + 397.5 + 656.0 + 245.0) * 2:

    scala> calcDist(3,6,df)
    res139: Double = 3227.0

The list df can be converted to a dataframe as shown below. toDF needs import spark.implicits._, which spark-shell imports automatically.

    df.toDF("Country", "State", "speed","time","distance")

In the Scala REPL:

    scala> df.toDF("Country", "State", "speed","time","distance").show
    +-------+-----+-----+----+--------+
    |Country|State|speed|time|distance|
    +-------+-----+-----+----+--------+
    |     c1|   s1| 25.0| 8.0|   200.0|
    |     c1|   s2|  5.0|12.0|    60.0|
    |     c2|   s3| 35.0| 9.0|   315.0|
    |     c2|   s5| 53.0| 7.5|   397.5|
    |     c3|   s5| 82.0| 8.0|   656.0|
    |     c4|   s6| 35.0| 7.0|   245.0|
    |     c5|   s7| 95.0| 6.0|   570.0|
    |     c2|   s3| 65.0|11.0|   715.0|
    |     c1|   s2|  8.0|32.0|   256.0|
    +-------+-----+-----+----+--------+

To find the distance from one record to another record, it is better to add a record column to the converted dataframe. First, the list df is converted as follows:


Then, in the Scala REPL:


Now, the result will look as follows:


And in the Scala REPL:

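Below is one possible way to build the record column in Spark and compute a roundtrip between two records. It is a sketch under assumptions, not the original answer's code:

- It numbers the tuples with zipWithIndex before calling toDF, so record comes out as the first column.
- It then filters on a record range and sums the distance column.
- The names RecordColumnSketch, withRecordColumn and roundTripBetween are hypothetical.
- The local SparkSession is an assumption. In spark-shell, the existing spark session would be used instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

// Sketch only: object and method names are hypothetical.
object RecordColumnSketch {
  // Assumption: a local session; in spark-shell the existing `spark` can be used instead
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("timedist").getOrCreate()

  // Number the tuples 1, 2, 3, ... and put that number first as the "record" column
  def withRecordColumn(rows: List[(String, String, Double, Double, Double)]): DataFrame = {
    import spark.implicits._
    rows.zipWithIndex
      .map { case ((c, s, speed, time, dist), i) => (i + 1, c, s, speed, time, dist) }
      .toDF("record", "Country", "State", "speed", "time", "distance")
  }

  // Roundtrip distance over records fromRecord..toRecord (1-based, inclusive).
  // Assumes the range matches at least one row; otherwise the sum is null and getDouble throws.
  def roundTripBetween(df: DataFrame, fromRecord: Int, toRecord: Int): Double =
    df.filter(col("record").between(fromRecord, toRecord))
      .agg(sum("distance"))
      .first()
      .getDouble(0) * 2
}
```

Here the record numbers come from the order of the Scala list, so they follow the file order. Numbering rows of a DataFrame read directly with spark.read would need more care, because Spark does not guarantee row order after shuffles.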