I have a JSON dataset in the format below, one entry per line. I want to answer questions such as who sold the most mangoes. To do that, I want to load the file into a DataFrame and emit a (key, value) pair of (product, name) for every product value in each transaction's array.
{ "sales_person_name" : "John", "products" : ["apple", "mango", "guava"]}
{ "sales_person_name" : "Tom", "products" : ["mango", "orange"]}
{ "sales_person_name" : "John", "products" : ["apple", "banana"]}
{ "sales_person_name" : "Steve", "products" : ["apple", "mango"]}
{ "sales_person_name" : "Tom", "products" : ["mango", "guava"]}
I can't figure out the right way to explode() the products column (r(0)) and emit each of its values once together with the r(1) value. Can anyone suggest an approach? Thanks!
Here is what I have tried:
var df = spark.read.json("s3n://sales-data.json")
df.printSchema()
root
|-- sales_person_name: string (nullable = true)
|-- products: array (nullable = true)
var nameProductsMap = df.select("sales_person_name", "products").show()
+-----------------+--------------------+
|sales_person_name|            products|
+-----------------+--------------------+
|             John|   [mango, apple,...|
|              Tom|  [mango, orange,...|
|             John|    [apple, banana...|
var resultMap = df.select("products", "sales_person_name")
.map(r => (r(1), r(0)))
.show() //This is where I am stuck.
Answer 0 (score: 5)
import scala.collection.mutable

val exploded = df.explode("products", "product") { a: mutable.WrappedArray[String] => a }
val result = exploded.drop("products")
result.show()
Prints:
+-----------------+-------+
|sales_person_name|product|
+-----------------+-------+
| John| apple|
| John| mango|
| John| guava|
| Tom| mango|
| Tom| orange|
| John| apple|
| John| banana|
| Steve| apple|
| Steve| mango|
| Tom| mango|
| Tom| guava|
+-----------------+-------+
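For readers without a Spark shell handy, the explode step is just a flat-map that turns each (name, products) row into one (name, product) row per array element. Below is a minimal plain-Python sketch of the same semantics, with no Spark involved; the function name `explode_products` is made up for illustration, and the sample rows are copied from the question.

```python
# Sample rows mirroring the JSON dataset from the question.
rows = [
    {"sales_person_name": "John", "products": ["apple", "mango", "guava"]},
    {"sales_person_name": "Tom", "products": ["mango", "orange"]},
    {"sales_person_name": "John", "products": ["apple", "banana"]},
    {"sales_person_name": "Steve", "products": ["apple", "mango"]},
    {"sales_person_name": "Tom", "products": ["mango", "guava"]},
]

def explode_products(rows):
    """Emit one (sales_person_name, product) pair per array element,
    mimicking what explode() does to the products column."""
    return [(r["sales_person_name"], p) for r in rows for p in r["products"]]

pairs = explode_products(rows)
```

Each input row with an n-element array contributes n output pairs, which matches the 11-row table Answer 0 produces.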
Answer 1 (score: 1)
Update
The following code should work:
import org.apache.spark.sql.functions.explode
import scala.collection.mutable
import spark.implicits._ // enables the $"column" syntax

val resultMap = df.select(explode($"products"), $"sales_person_name")
def counter(l: TraversableOnce[Any]) = {
val temp = mutable.Map[Any, Int]()
for (i <- l) {
if(temp.contains(i)) temp(i) += 1
else temp(i) = 1
}
temp
}
resultMap.rdd. // drop to an RDD[Row] so reduceByKey is available
  map(x => (x(0), Array(x(1)))).
  reduceByKey(_ ++ _).
  map { case (x, y) => (x, counter(y).toArray) }
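The pipeline above groups the exploded rows by their first column (the product) and then counts how many times each seller appears for that product, which directly answers "who sold the most mangoes". The same group-and-count logic can be sketched in plain Python without Spark; the variable names below are made up for illustration and the data is copied from the question.

```python
from collections import Counter, defaultdict

rows = [
    ("John", ["apple", "mango", "guava"]),
    ("Tom", ["mango", "orange"]),
    ("John", ["apple", "banana"]),
    ("Steve", ["apple", "mango"]),
    ("Tom", ["mango", "guava"]),
]

# Group sellers by product (the reduceByKey step), then count how
# often each seller sold that product (the counter() step).
sellers_by_product = defaultdict(list)
for name, products in rows:
    for product in products:
        sellers_by_product[product].append(name)

counts = {product: Counter(names) for product, names in sellers_by_product.items()}

# Tom appears twice under "mango", John and Steve once each.
top_mango_seller = counts["mango"].most_common(1)[0]
```

`Counter` plays the role of the hand-rolled `counter` helper in the answer; `most_common` then picks the top seller for any given product.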