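I am trying to find the difference between the values of related (but not identical) keys. For example, suppose I have the following map: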
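I want to compare the contents of Name_# with those of Name_(# - 1) and get the difference. So, for the example above, I would like to get (for example:
("John_1", ["a", "b", "c"])
("John_2", ["a", "b"])
("John_3", ["b", "c"])
("Mary_5", ["a", "d"])
("John_5", ["c", "d", "e"])
I was thinking of doing some kind of aggregateByKey and only then finding the difference between the lists, but I don't know how to match up the keys I care about, i.e. Name_# with Name_(# - 1).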
Answer 0: (score: 0)
Split out the "id":
import org.apache.spark.sql.functions._
// Outside spark-shell, also `import spark.implicits._` for toDF and the $"..." syntax.
val df = Seq(
("John_1", Seq("a","b","c")), ("John_2", Seq("a","b")),
("John_3", Seq("b","c")), ("Mary_5", Seq("a","d")),
("John_5", Seq("c","d","e"))
).toDF("key", "values").withColumn(
"user", split($"key", "_")(0)
).withColumn("id", split($"key", "_")(1).cast("long"))
Add a window:
val w = org.apache.spark.sql.expressions.Window
.partitionBy($"user").orderBy($"id")
and a UDF:
val diff = udf((x: Seq[String], y: Seq[String]) => y.diff(x))
and compute:
df
.withColumn("is_previous", coalesce($"id" - lag($"id", 1).over(w) === 1, lit(false)))
.withColumn("diff", when($"is_previous", diff( lag($"values", 1).over(w), $"values")).otherwise($"values"))
.show
// +------+---------+----+---+-----------+---------+
// |   key|   values|user| id|is_previous|     diff|
// +------+---------+----+---+-----------+---------+
// |Mary_5|   [a, d]|Mary|  5|      false|   [a, d]|
// |John_1|[a, b, c]|John|  1|      false|[a, b, c]|
// |John_2|   [a, b]|John|  2|       true|       []|
// |John_3|   [b, c]|John|  3|       true|      [c]|
// |John_5|[c, d, e]|John|  5|      false|[c, d, e]|
// +------+---------+----+---+-----------+---------+
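To see what the window, `lag`, and `diff` steps compute without a Spark session, the same logic can be sketched on local Scala collections. This is only an illustration under assumptions: `LagDiffSketch`, `parseKey`, and `diffWithPrevious` are hypothetical names, and keys are assumed to contain exactly one underscore followed by an integer.

```scala
// Local-collection sketch of the window logic above (no Spark required).
object LagDiffSketch {
  // Hypothetical helper: split "John_2" into ("John", 2L).
  def parseKey(key: String): (String, Long) = {
    val Array(user, id) = key.split("_")
    (user, id.toLong)
  }

  def diffWithPrevious(rows: Seq[(String, Seq[String])]): Seq[(String, Seq[String])] =
    rows
      .map { case (key, values) =>
        val (user, id) = parseKey(key)
        (user, id, key, values)
      }
      .groupBy(_._1) // plays the role of partitionBy($"user")
      .values
      .flatMap { group =>
        val sorted = group.sortBy(_._2) // plays the role of orderBy($"id")
        // Pair each row with its predecessor (None for the first row), like lag(..., 1).
        val previous = None +: sorted.init.map(Some(_))
        sorted.zip(previous).map {
          // Only subtract when the predecessor's id is exactly one less.
          case ((_, id, key, values), Some((_, prevId, _, prevValues))) if id - prevId == 1 =>
            key -> values.diff(prevValues)
          case ((_, _, key, values), _) =>
            key -> values
        }
      }
      .toSeq

  def main(args: Array[String]): Unit = {
    val input = Seq(
      "John_1" -> Seq("a", "b", "c"), "John_2" -> Seq("a", "b"),
      "John_3" -> Seq("b", "c"), "Mary_5" -> Seq("a", "d"),
      "John_5" -> Seq("c", "d", "e")
    )
    diffWithPrevious(input).sortBy(_._1).foreach(println)
  }
}
```

Note that `Seq.diff` is a multiset difference: each element of the argument removes at most one matching occurrence, so duplicates in `values` can survive.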
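Answer 1: (score: 0)
I managed to solve my problem as follows. First, create a function that computes the previous key from the current key:
def getPrevKey(k: String): String = {
  // Split "John_2" into the name ("John") and the number ("2")
  val Array(n, h) = k.split("_")
  val i = h.toInt
  // Rebuild the key with the number decremented, e.g. "John_1"
  s"${n}_${i - 1}"
}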
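The helper above assumes every key has exactly one underscore followed by an integer. If names may themselves contain underscores, or some keys may be malformed, a more defensive variant could look like the sketch below. `PrevKeySketch` and `prevKeyOption` are hypothetical names introduced here for illustration and are not part of the original answer.

```scala
import scala.util.Try

object PrevKeySketch {
  // Hypothetical variant of getPrevKey: splits on the LAST underscore so a
  // name such as "Mary_Ann_5" still works, and returns None for keys that
  // have no underscore, an empty name, or a non-numeric suffix.
  def prevKeyOption(key: String): Option[String] = {
    val idx = key.lastIndexOf('_')
    if (idx <= 0) None
    else
      Try(key.substring(idx + 1).toInt).toOption
        .map(i => s"${key.substring(0, idx)}_${i - 1}")
  }

  def main(args: Array[String]): Unit = {
    println(prevKeyOption("John_2"))     // Some(John_1)
    println(prevKeyOption("Mary_Ann_5")) // Some(Mary_Ann_4)
    println(prevKeyOption("John"))       // None
  }
}
```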
Then, create a copy of my RDD with the keys shifted:
val copyRdd = myRdd.map(row => {
val k1 = row._1
val v1 = row._2
val k2 = getPrevKey(k1)
(k2,v1)
})
Finally, I union the two RDDs and reduce by key, taking the difference between the lists:
val result = myRdd.union(copyRdd)
.reduceByKey(_.diff(_))
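This gives me exactly the result I need, but it requires a lot of memory because of the union. The final result is not that large, but the partial results really bog down the whole process.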
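One thing to keep in mind with this approach is that `diff` is not commutative, so the result of reducing with `_.diff(_)` depends on which of the two lists ends up on the left. The local sketch below shows the asymmetry and a lookup-based formulation in which the direction of the subtraction is explicit. `DiffOrderSketch`, `prevKey`, and `diffAgainstPrevious` are hypothetical names, and the sketch uses an in-memory `Map` rather than an RDD.

```scala
object DiffOrderSketch {
  // Same idea as getPrevKey: "John_2" -> "John_1".
  def prevKey(k: String): String = {
    val Array(name, n) = k.split("_")
    s"${name}_${n.toInt - 1}"
  }

  // For each key, subtract the previous key's values when that key exists;
  // otherwise keep the values unchanged.
  def diffAgainstPrevious(data: Map[String, Seq[String]]): Map[String, Seq[String]] =
    data.map { case (k, v) =>
      k -> data.get(prevKey(k)).map(prev => v.diff(prev)).getOrElse(v)
    }

  def main(args: Array[String]): Unit = {
    val a = Seq("a", "b")
    val b = Seq("b", "c")
    println(a.diff(b)) // List(a)
    println(b.diff(a)) // List(c)

    val data = Map(
      "John_1" -> Seq("a", "b", "c"),
      "John_2" -> Seq("a", "b"),
      "John_3" -> Seq("b", "c")
    )
    diffAgainstPrevious(data).toSeq.sortBy(_._1).foreach(println)
  }
}
```

On an RDD, a comparable shape might be reached with a join (for example `leftOuterJoin`) instead of a union, though whether that lowers memory use in this case is untested.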