How do I expand data from vertical to horizontal in Spark?

Time: 2019-06-17 09:54:37

Tags: scala apache-spark dataframe apache-spark-sql

I have a text file. Now I want to expand its vertical data into horizontal data. How can I do that?

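Exp1: I want the output to use the first four words of each line as the key, with the remaining values placed on a single line.

Exp2: Same as above, but the values are split across two lines (each using the same key).

Here is my input: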

0000000 aa______ 50 F 91
0000000 aa______ 50 F 59
0000000 aa______ 50 F 20
0000000 aa______ 50 F 76
0000001 bb______ 50 F 46
0000001 bb______ 50 F 39
0000001 bb______ 50 F 8
0000001 bb______ 50 F 5
0000003 cc______ 26 F 30
0000003 cc______ 26 F 50
0000003 cc______ 26 F 71
0000003 cc______ 26 F 36
0000004 dd______ 40 M 58
0000004 dd______ 40 M 71
0000004 dd______ 40 M 20
0000004 dd______ 40 M 10

1 Answer:

Answer 0: (score: 1)

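This would probably be easier to solve if the data in the dataframe were already split into several columns. In this case, however, what you can do is split each string into two parts: a key, and the remaining value. This is easy to do in a UDF (it is also possible with built-in Spark functions, but less clear):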

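import org.apache.spark.sql.functions._
// also requires `import spark.implicits._` for the $"..." column syntax

// put the case class outside main method
case class SplitReturn(key: String, vals: String)

// Split a line into a key (every word except the last) and a value (the last word).
val splitKeyVal = udf((str: String) => {
  val words = str.split(" ")
  SplitReturn(words.init.mkString(" "), words.last)
})

// Chunk the collected values into groups of n and join each group with spaces.
def groupVals(n: Int) = udf((vals: Seq[String]) => {
  vals.grouped(n).map(_.mkString(" ")).toSeq
})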
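
To see what the two helpers do independently of Spark, here is a minimal plain-Scala sketch of the same logic applied to ordinary strings and sequences. The object name `UdfLogicSketch` and the function names `splitKeyValue` and `groupValues` are hypothetical, chosen only for this illustration; they are not part of the answer's code.

```scala
// Plain-Scala illustration of the logic inside the two UDFs (no Spark needed).
// UdfLogicSketch, splitKeyValue and groupValues are hypothetical names.
object UdfLogicSketch {
  // Everything except the last word is the key; the last word is the value.
  def splitKeyValue(line: String): (String, String) = {
    val words = line.split(" ")
    (words.init.mkString(" "), words.last)
  }

  // Chunk the values into groups of at most n and join each group with spaces.
  def groupValues(values: Seq[String], n: Int): List[String] =
    values.grouped(n).map(_.mkString(" ")).toList

  def main(args: Array[String]): Unit = {
    println(splitKeyValue("0000000 aa______ 50 F 91"))    // (0000000 aa______ 50 F,91)
    println(groupValues(Seq("91", "59", "20", "76"), 2))  // List(91 59, 20 76)
  }
}
```

With a group size of 4 the same four values collapse into the single string `91 59 20 76`, which is why the group size controls how many rows each key produces.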

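Here the parameter n of groupVals (set through nGroup below) determines how many values are placed on each output row; the key is always every word except the last one. Usage example with nGroup=4: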

val nGroup = 4
// df is assumed to hold each input line in a single string column named "value"
// (the default column name when reading with spark.read.text)
val df3 = df.withColumn("ret", splitKeyVal($"value"))
  .withColumn("key", $"ret.key")
  .withColumn("vals", $"ret.vals")
  .groupBy("key").agg(collect_list($"vals").as("val"))
  .withColumn("val", explode(groupVals(nGroup)($"val")))
  .select(concat($"key", lit(" "), concat_ws(" ", $"val")).as("col"))

The last line joins the grouped values and then prepends the key, producing a single column (called col here). Result:

+---------------------------------+
|col                              |
+---------------------------------+
|0000004 dd______ 40 M 58 71 20 10|
|0000000 aa______ 50 F 91 59 20 76|
|0000003 cc______ 26 F 30 50 71 36|
|0000001 bb______ 50 F 46 39 8 5  |
+---------------------------------+

Setting nGroup=2 gives:

+---------------------------+
|col                        |
+---------------------------+
|0000001 bb______ 50 F 46 39|
|0000001 bb______ 50 F 8 5  |
|0000003 cc______ 26 F 30 50|
|0000003 cc______ 26 F 71 36|
|0000000 aa______ 50 F 91 59|
|0000000 aa______ 50 F 20 76|
|0000004 dd______ 40 M 58 71|
|0000004 dd______ 40 M 20 10|
+---------------------------+
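
The answer mentions that built-in Spark functions can do the split as well. As a rough sketch of that idea (not the answerer's code), the following uses `regexp_extract` to separate the key from the last word and `collect_list` with `concat_ws` to put all values for a key on one row. It only covers the one-row-per-key case (the nGroup=4 output above). The object name `BuiltinFunctionsSketch`, the regex `KeyValuePattern`, the shortened sample data, and the local SparkSession setup are all assumptions made for this illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical sketch: the same key/value split using only built-in functions.
object BuiltinFunctionsSketch {
  // Group 1 = everything before the last space (the key), group 2 = the last word.
  val KeyValuePattern = "^(.*) (\\S+)$"

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; adjust master/appName as needed.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("vertical-to-horizontal")
      .getOrCreate()
    import spark.implicits._

    // A shortened copy of the question's input, one line per row in column "value".
    val df = Seq(
      "0000000 aa______ 50 F 91",
      "0000000 aa______ 50 F 59",
      "0000001 bb______ 50 F 46",
      "0000001 bb______ 50 F 39"
    ).toDF("value")

    val result = df
      .select(
        regexp_extract($"value", KeyValuePattern, 1).as("key"),
        regexp_extract($"value", KeyValuePattern, 2).as("v"))
      .groupBy("key")
      .agg(concat_ws(" ", collect_list($"v")).as("vals")) // all values of a key on one row
      .select(concat_ws(" ", $"key", $"vals").as("col"))

    result.show(false)
    spark.stop()
  }
}
```

Keep in mind that `collect_list` does not guarantee the order of the collected values after a shuffle, so if the original line order matters on a real cluster, an explicit ordering column may be needed. The same caveat applies to the UDF-based version.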