Question

I have a DataFrame like this:

df.show()

How can I convert it into a DataFrame like the one shown below?

Answer 1

It depends on the type of your "list":

If it is of type ArrayType():

df = spark.createDataFrame(spark.sparkContext.parallelize([['a', ["a","b","c"]], ['b', ["d","e","f"]]]), ["key", "col"])
df.printSchema()
df.show()
root
 |-- key: string (nullable = true)
 |-- col: array (nullable = true)
 |    |-- element: string (containsNull = true)
+---+---------+
|key|      col|
+---+---------+
|  a|[a, b, c]|
|  b|[d, e, f]|
+---+---------+

You can access the values with [], just as you would in Python:

df.select("key", df.col[0], df.col[1], df.col[2]).show()
+---+------+------+------+
|key|col[0]|col[1]|col[2]|
+---+------+------+------+
|  a|     a|     b|     c|
|  b|     d|     e|     f|
+---+------+------+------+
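When the array has more than a few elements, typing each index by hand gets tedious. As a minimal sketch, the same selection can be generated as SQL expression strings. The helper name `array_split_exprs` and its `prefix` parameter are hypothetical, introduced only for illustration, and the sketch assumes every array has at least `n` elements of interest (in Spark's default, non-ANSI mode an out-of-range index yields null).

```python
def array_split_exprs(array_col, n, prefix=None):
    """Build Spark SQL expression strings for the first n elements of an array column.

    array_split_exprs is a hypothetical helper, not part of PySpark.
    Each string has the form "<array_col>[i] AS <prefix><i+1>".
    """
    prefix = prefix or array_col
    return [f"{array_col}[{i}] AS {prefix}{i + 1}" for i in range(n)]


# Expressions for the three-element "col" column from the example above
exprs = array_split_exprs("col", 3)
print(exprs)
```

With the `df` defined above, these strings can be passed to `df.selectExpr("key", *exprs)`, which should give columns named col1, col2 and col3 instead of col[0], col[1] and col[2]. If the array length is not known in advance, one option is to compute it first, for example with `pyspark.sql.functions.size` and `max`, and pass the result as `n`.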

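If it is of type StructType() (perhaps you built your DataFrame by reading JSON):

from pyspark.sql import functions as F

# Pack the three array elements into a single struct column named "col"
df2 = df.select("key", F.struct(
        df.col[0].alias("col1"),
        df.col[1].alias("col2"),
        df.col[2].alias("col3")
    ).alias("col"))
df2.printSchema()
df2.show()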

root
 |-- key: string (nullable = true)
 |-- col: struct (nullable = false)
 |    |-- col1: string (nullable = true)
 |    |-- col2: string (nullable = true)
 |    |-- col3: string (nullable = true)
+---+---------+
|key|      col|
+---+---------+
|  a|[a, b, c]|
|  b|[d, e, f]|
+---+---------+

You can "split" the column directly using *:

df2.select('key', 'col.*').show()

+---+----+----+----+
|key|col1|col2|col3|
+---+----+----+----+
|  a|   a|   b|   c|
|  b|   d|   e|   f|
+---+----+----+----+
