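I have data like this.
1:23:0.20
2:34:0.50
3:67:0.90
4:87:0.10
5:23:0.12
I am trying to sum the last-column values of every 2 rows, like this.
0.20+0.50 = 0.70
0.90+0.10 = 1.0
and print the result like this.
1:23:0.20:0.70
2:34:0.50:0.70
3:67:0.90:1.0
4:87:0.10:1.0
5:23:0.12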
data = LOAD '/home/user/Documents/test/test.txt' USING PigStorage(':') AS (tag:int, rssi:chararray, weightage:chararray, seqnum:int);
B = FOREACH (GROUP data ALL) {
    A_ordered = ORDER data BY rssi;
    GENERATE FLATTEN(CUSTOM_UDF(A_ordered));
}
The above is my Pig script.
This is what I tried.
    public List<String> sumValues() {
        List<String> processedList = new ArrayList<>();
        if (entries == null) {
            return processedList;
        } else {
            double columnSum = 0;
            List<String> tempList = new ArrayList<>();
            int length = entries.size();
            for (int index = 1; index <= length; index++) {
                tempList.add(entries.get(index - 1));
                String[] splitValues = entries.get(index - 1).split(DELIMITER);
                if (splitValues.length >= MIN_SPLIT_STRING_LENGTH) {
                    try {
                        double lastValue = Double.parseDouble(splitValues[WEIGHTAGE_INDEX]);
                        columnSum = columnSum + lastValue;
                        if ((index % ROWS_TO_BE_SUMMED == 0) || (index == length)) {
                            for (String tempString : tempList) {
                                processedList.add(tempString + ":" + columnSum);
                            }
                            tempList.clear(); // Clear the temporary list
                            columnSum = 0;
                        }
                    } catch (NumberFormatException e) {
                        System.out.println("Invalid weightage");
                    }
                } else {
                    System.out.println("Invalid input");
                }
            }
        }
        return processedList;
    }

    @Override
    public String exec(Tuple input) throws IOException {
        System.out.println("------INSIDE EXEC FUNCTION ----" + input);
        if (input != null && input.size() != 0) {
            try {
                String str = (String) input.get(0);
                if (str != null) {
                    String splitStrings[] = str.split(":");
                    if (splitStrings != null && splitStrings.length >= 3 && splitStrings[2].equals(EXIT)) {
                        List<String> processedList = sumValues();
                        String sum = processedList.toString();
                        System.out.println("SUM VALUE----:" + sum);
                        return sum;
                    } else {
                        System.out.println("INPUT VALUE----:" + str);
                        entries.add(str);
                        return null;
                    }
                }
            } catch (Exception e) {
                return null;
            }
        }
        return null;
    }
}
I tried using a Java UDF, but it does not work correctly.
The code above prints an empty result. Any help would be appreciated.
Answer 0 (score: 2)
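This can be done in Pig. Generate another column, say f11, that equals f1 - 1 for even rows and f1 for odd rows, so that every 2 consecutive rows share the same id. That lets you group the two records by the new column and sum the last column. Then join the summed relation back to B and select the required columns.
Note: to sum every n rows, use f1%n_value.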
A = LOAD 'input.txt' USING PigStorage(':') AS (f1:int, f2:int, f3:double);
B = FOREACH A GENERATE f1, (f1 % 2 == 0 ? (f1 - 1) : f1) AS f11, f2, f3;
C = GROUP B BY f11;
D = FOREACH C GENERATE group AS f11, SUM(B.f3) AS Total; -- the grouped bag is named B, so the column is B.f3
E = JOIN B BY f11, D BY f11;
F = FOREACH E GENERATE B::f1, B::f2, B::f3, D::Total; -- after a JOIN, fields are disambiguated with '::' rather than '.'
Output
A
1,23,0.20
2,34,0.50
3,67,0.90
4,87,0.10
5,23,0.12
B - add a new second column: f1 - 1 for even rows, f1 otherwise.
1,1,23,0.20
2,1,34,0.50
3,3,67,0.90
4,3,87,0.10
5,5,23,0.12
C - group by the new second column
1,{(1,23,0.20),(2,34,0.50)}
3,{(3,67,0.90),(4,87,0.10)}
5,{(5,23,0.12)}
D - generate the sum for each group
1,0.70
3,1.0
5,0.12
E - join B with the dataset from the previous step on the new column
1,1,23,0.20,1,0.70
2,1,34,0.50,1,0.70
3,3,67,0.90,3,1.0
4,3,87,0.10,3,1.0
5,5,23,0.12,5,0.12
F - select the required columns.
1,23,0.20,0.70
2,34,0.50,0.70
3,67,0.90,1.0
4,87,0.10,1.0
5,23,0.12,0.12
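For readers who want to check the grouping logic outside Pig, here is a minimal plain-Java sketch of the same group, sum, and join steps. It is not Pig code and not the answer's implementation: the class `RowGroupSum` and the methods `groupKey` and `sumEveryN` are hypothetical names, the ids are assumed to start at 1 and be consecutive, and the key `(id - 1) / n` is my own assumption for generalizing from 2 rows to n rows. Totals are formatted to two decimals, so 1.0 prints as 1.00.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Pig: mirrors the group, sum, and join steps in plain Java.
class RowGroupSum {

    // Key shared by each run of n consecutive ids (assumes ids start at 1).
    static int groupKey(int id, int n) {
        return (id - 1) / n; // integer division: ids 1..n -> 0, n+1..2n -> 1, ...
    }

    // Appends the per-group total of the third column to every row of that group.
    static List<String> sumEveryN(List<String> rows, int n) {
        // Pass 1 (GROUP + SUM): accumulate the third column per group key.
        Map<Integer, Double> totals = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.split(":");
            int key = groupKey(Integer.parseInt(fields[0]), n);
            totals.merge(key, Double.parseDouble(fields[2]), Double::sum);
        }
        // Pass 2 (JOIN + projection): attach each group's total to its rows.
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            int key = groupKey(Integer.parseInt(row.split(":")[0]), n);
            out.add(row + ":" + String.format(Locale.ROOT, "%.2f", totals.get(key)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "1:23:0.20", "2:34:0.50", "3:67:0.90", "4:87:0.10", "5:23:0.12");
        for (String line : sumEveryN(rows, 2)) {
            System.out.println(line);
        }
    }
}
```

As in the Pig output, the unpaired last row receives its own value as the total. If that row should be printed without a total, as in the question's expected output, groups smaller than n would need to be filtered separately.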
Answer 1 (score: 0)
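In your UDF you receive a tuple(int, chararray, chararray, int) and try to read its first element as a String. Because the code is wrapped in try...catch, you never see the ClassCastException that is clearly being thrown. And since the data is already loaded into typed fields, you do not need to split the value on ':'.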
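The effect of the broad catch can be reproduced without Pig. The sketch below is only an illustration under stated assumptions: a `List<Object>` stands in for Pig's `Tuple`, and the class `SwallowedCastDemo` with its methods `execSwallowing` and `execTyped` are hypothetical names, not part of the original UDF.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UDF's exec(): a List<Object> plays the role of Pig's Tuple.
class SwallowedCastDemo {

    // Mirrors the question's pattern: cast the first field to String inside a broad try/catch.
    static String execSwallowing(List<Object> input) {
        try {
            String str = (String) input.get(0); // ClassCastException when the field is an Integer
            return str;
        } catch (Exception e) {
            return null; // the exception is discarded here, so the caller only sees null
        }
    }

    // Reads each field with the type it was loaded as; no string splitting is needed.
    static String execTyped(List<Object> input) {
        int tag = (Integer) input.get(0);
        String rssi = (String) input.get(1);
        String weightage = (String) input.get(2);
        return tag + ":" + rssi + ":" + weightage;
    }

    public static void main(String[] args) {
        List<Object> tuple = Arrays.<Object>asList(1, "23", "0.20", 0);
        System.out.println(execSwallowing(tuple)); // prints "null"
        System.out.println(execTyped(tuple));      // prints "1:23:0.20"
    }
}
```

Logging or rethrowing the exception in the catch block, instead of returning null, would make the failure visible in the task logs.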