我有一个这样的spark DataFrame:
+---+---+---+---+---+---+---+
| f1| f2| f3| f4| f5| f6| f7|
+---+---+---+---+---+---+---+
| 5| 4| 5| 2| 5| 5| 5|
+---+---+---+---+---+---+---+
您怎么可能去
+---+---+
| f1| 5|
+---+---+
| f2| 4|
+---+---+
| f3| 5|
+---+---+
| f4| 2|
+---+---+
| f5| 5|
+---+---+
| f6| 5|
+---+---+
| f7| 5|
+---+---+
spark scala中是否有一个简单的代码可用于换位?
答案 0 :(得分:0)
bins = [0,5,10]
labels = ['{}-{}'.format(i, j) for i, j in zip(bins[:-1], bins[1:])]
b = pd.cut(df['kids_age'], bins=bins, labels=labels, include_lowest=True)
mux = pd.MultiIndex.from_product([df['city'].unique(), labels], names=['city','kids_age'])
df = (df.groupby(['city', b])
.size()
.reindex(mux, fill_value=0)
.reset_index(name='count'))
print (df)
city kids_age count
0 A 0-5 2
1 A 5-10 1
2 B 0-5 1
3 B 5-10 0
4 C 0-5 0
5 C 5-10 1
答案 1 :(得分:0)
spark 2.4+ use map_from_arrays
scala> var df =Seq(( 5, 4, 5, 2, 5, 5, 5)).toDF("f1", "f2", "f3", "f4", "f5", "f6", "f7")
scala> df.select(array('*).as("v"), lit(df.columns).as("k")).select('v.getItem(0).as("cust_id"), map_from_arrays('k,'v).as("map")).select(explode('map)).show(false)
+---+-----+
|key|value|
+---+-----+
|f1 |5 |
|f2 |4 |
|f3 |5 |
|f4 |2 |
|f5 |5 |
|f6 |5 |
|f7 |5 |
+---+-----+
希望它可以帮助您。
答案 2 :(得分:0)
我写了一个函数
object DT {
val KEY_COL_NAME = "dt_key"
val VALUE_COL_NAME = "dt_value"
def pivot(df: DataFrame, valueDataType: DataType, cols: Array[String], keyColName: String, valueColName: String): DataFrame = {
val tempData: RDD[Row] = df.rdd.flatMap(row => row.getValuesMap(cols).map(Row.fromTuple))
val keyStructField = DataTypes.createStructField(keyColName, DataTypes.StringType, false)
val valueStructField = DataTypes.createStructField(valueColName, DataTypes.StringType, true)
val structType = DataTypes.createStructType(Array(keyStructField, valueStructField))
df.sparkSession.createDataFrame(tempData, structType).select(col(keyColName), col(valueColName).cast(valueDataType))
}
def pivot(df: DataFrame, valueDataType: DataType): DataFrame = {
pivot(df, valueDataType, df.columns, KEY_COL_NAME, VALUE_COL_NAME)
}
}
有效
df.show()
DT.pivot(df,DoubleType).show()
喜欢
+---+---+-----------+---+---+ +------+-----------+
| f1| f2| f3| f4| f5| |dt_key| dt_value|
+---+---+-----------+---+---+ to +------+-----------+
|100| 1|0.355072464| 0| 31| | f1| 100.0|
+---+---+-----------+---+---+ | f5| 31.0|
| f3|0.355072464|
| f4| 0.0|
| f2| 1.0|
+------+-----------+
和
+---+---+-----------+-----------+---+ +------+-----------+
| f1| f2| f3| f4| f5| |dt_key| dt_value|
+---+---+-----------+-----------+---+ to +------+-----------+
|100| 1|0.355072464| 0| 31| | f1| 100.0|
| 63| 2|0.622775801|0.685809375| 16| | f5| 31.0|
+---+---+-----------+-----------+---+ | f3|0.355072464|
| f4| 0.0|
| f2| 1.0|
| f1| 63.0|
| f5| 16.0|
| f3|0.622775801|
| f4|0.685809375|
| f2| 2.0|
+------+-----------+
非常好!