Question

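We have four columns. We need to calculate the distance and its average according to our requirements, write the Spark Scala code for it, and store the result in a dataframe.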

 country,state,speed,time
  c1,s1,25kph,8h
  c1,s2,5kph,12h
  c2,s3,35kph,9h
  c2,s5,53kph,7.5h
  c3,s5,82kph,8h
  c4,s6,35kph,7h
  c5,s7,95kph,6h
  c2,s3,65kph,11h
  c1,s2,8kph,32h

Like this, we have 1000 different rows in a CSV file.

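We need Scala code that uses speed and time to find the distance for a given country and state (for example from c2,s3 to c4,s6) for a one-way trip, or from any starting point to any other end point. It also needs to calculate the distance back from c4,s6 to c2,s3, which completes one round trip.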

Answer 1

First, copy the data rows in the given format (without the header line) into a text file, say timedist.txt. Assuming the file is in Scala's current directory, the regular expression below is used to read it into df, a List of tuples of type (String,String,Double,Double,Double) that also holds the distance calculated for each record.

  val p = """(.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h""".r
  val td = scala.io.Source.fromFile("timedist.txt").getLines.map(_.trim()).toList
  // the fourth capture group is the time in hours, although it is bound to the name `distance`
  val df = td.map(x=>{var p(c,s,speed,distance)=x; (c,s,speed.toDouble,distance.toDouble,speed.toDouble*distance.toDouble)})

Now, the following function takes df and two Int parameters, fromRecord and toRecord (1-based record numbers, with fromRecord < toRecord), and calculates the two-way (roundtrip) distance: it sums the distances of all records from fromRecord through toRecord and doubles the result.

  def calcDist(fromRecord:Int,toRecord:Int,df:List[(String,String,Double,Double,Double)]) = {
    df.map(_._5).zipWithIndex.filter(x=>x._2>=(fromRecord-1) && x._2<=(toRecord-1)).map(_._1).sum*2
  }

In the Scala REPL:

  scala> val p = """(.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h""".r
  p: scala.util.matching.Regex = (.{2}),(.{2}),(\d+\.?\d*)kph,(\d+\.?\d*)h

  scala> var td = scala.io.Source.fromFile("timedist.txt").getLines.map(_.trim()).toList
  td: List[String] = List(c1,s1,25kph,8h, c1,s2,5kph,12h, c2,s3,35kph,9h, c2,s5,53kph,7.5h, c3,s5,82kph,8h, c4,s6,35kph,7h, c5,s7,95kph,6h, c2,s3,65kph,11h, c1,s2,8kph,32h)

  scala> val df = td.map(x=>{var p(c,s,speed,distance)=x; (c,s,speed.toDouble,distance.toDouble,speed.toDouble*distance.toDouble)})
  df: List[(String, String, Double, Double, Double)] = List((c1,s1,25.0,8.0,200.0), (c1,s2,5.0,12.0,60.0), (c2,s3,35.0,9.0,315.0), (c2,s5,53.0,7.5,397.5), (c3,s5,82.0,8.0,656.0), (c4,s6,35.0,7.0,245.0), (c5,s7,95.0,6.0,570.0), (c2,s3,65.0,11.0,715.0), (c1,s2,8.0,32.0,256.0))

The roundtrip distance from the 3rd record to the 6th record is then:

scala> calcDist(3,6,df)
res139: Double = 3227.0

scala>
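calcDist works on record numbers, while the question names its endpoints by country and state. The sketch below shows one way to bridge the two. It is not part of the original answer: the object Endpoints, the helper recordNo, and the function roundTrip are hypothetical names, and it assumes that the first matching record is the start and that the first matching record at or after it is the end.

```scala
object Endpoints {
  // Same shape as df above: (country, state, speed, time, distance).
  type Rec = (String, String, Double, Double, Double)

  // The nine sample rows from the question, with distance = speed * time.
  val sample: List[Rec] = List(
    ("c1", "s1", 25.0, 8.0), ("c1", "s2", 5.0, 12.0), ("c2", "s3", 35.0, 9.0),
    ("c2", "s5", 53.0, 7.5), ("c3", "s5", 82.0, 8.0), ("c4", "s6", 35.0, 7.0),
    ("c5", "s7", 95.0, 6.0), ("c2", "s3", 65.0, 11.0), ("c1", "s2", 8.0, 32.0)
  ).map { case (c, s, v, t) => (c, s, v, t, v * t) }

  // Hypothetical helper: 1-based number of the first record at or after the
  // 0-based position `start` that matches the given country and state.
  def recordNo(country: String, state: String, recs: List[Rec], start: Int = 0): Option[Int] = {
    val i = recs.indexWhere(r => r._1 == country && r._2 == state, start)
    if (i >= 0) Some(i + 1) else None
  }

  // Round trip between two named endpoints: sum the distances of all records
  // from the start record through the end record, then double the sum.
  def roundTrip(from: (String, String), to: (String, String), recs: List[Rec]): Option[Double] =
    for {
      f <- recordNo(from._1, from._2, recs)
      t <- recordNo(to._1, to._2, recs, f - 1)
    } yield recs.slice(f - 1, t).map(_._5).sum * 2
}
```

With the sample rows, Endpoints.roundTrip(("c2", "s3"), ("c4", "s6"), Endpoints.sample) resolves to records 3 and 6 and returns Some(3227.0), the same value as calcDist(3,6,df). An endpoint that is not in the data yields None instead of an exception.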

If the list df is to be converted to a dataframe:

  df.toDF("Country", "State", "speed","time","distance")

In the Scala REPL:

  scala> df.toDF("Country", "State", "speed","time","distance").show
  +-------+-----+-----+----+--------+
  |Country|State|speed|time|distance|
  +-------+-----+-----+----+--------+
  |     c1|   s1| 25.0| 8.0|   200.0|
  |     c1|   s2|  5.0|12.0|    60.0|
  |     c2|   s3| 35.0| 9.0|   315.0|
  |     c2|   s5| 53.0| 7.5|   397.5|
  |     c3|   s5| 82.0| 8.0|   656.0|
  |     c4|   s6| 35.0| 7.0|   245.0|
  |     c5|   s7| 95.0| 6.0|   570.0|
  |     c2|   s3| 65.0|11.0|   715.0|
  |     c1|   s2|  8.0|32.0|   256.0|
  +-------+-----+-----+----+--------+

If the distance has to be found from one record to another record, this is best done on the converted dataframe.
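The question also asks for the distance average to be stored in a dataframe. The sketch below is one possible way to do that in Spark and is not taken from the original answer: the object name DistanceAverages, the function averageDistances, and the output column avg_distance are assumptions, and the rows are built in memory so the example is self-contained. For a real file with a header line, spark.read.option("header", "true").csv(...) could supply the same four string columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, regexp_replace}

object DistanceAverages {
  // Expects string columns country, state, speed ("25kph") and time ("8h").
  // Strips the unit suffixes, derives distance = speed * time, and returns
  // the average distance per (country, state) pair.
  def averageDistances(raw: DataFrame): DataFrame =
    raw
      .withColumn("speed", regexp_replace(col("speed"), "kph$", "").cast("double"))
      .withColumn("time", regexp_replace(col("time"), "h$", "").cast("double"))
      .withColumn("distance", col("speed") * col("time"))
      .groupBy("country", "state")
      .agg(avg("distance").as("avg_distance"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .appName("distance-averages")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The nine sample rows from the question, kept as raw strings.
    val raw = Seq(
      ("c1", "s1", "25kph", "8h"), ("c1", "s2", "5kph", "12h"),
      ("c2", "s3", "35kph", "9h"), ("c2", "s5", "53kph", "7.5h"),
      ("c3", "s5", "82kph", "8h"), ("c4", "s6", "35kph", "7h"),
      ("c5", "s7", "95kph", "6h"), ("c2", "s3", "65kph", "11h"),
      ("c1", "s2", "8kph", "32h")
    ).toDF("country", "state", "speed", "time")

    averageDistances(raw).show()
    spark.stop()
  }
}
```

Grouping by both country and state is a design choice made here. In the sample, c2,s3 appears twice with distances 315.0 and 715.0, so that group averages their two values, while pairs that appear once keep their single distance.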

Write Scala code to calculate the distance to a given point
