For simplicity, suppose I have the following DataFrame:
col X  col Y  col Z
A      1      5
A      2      10
A      3      10
B      5      15
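I want to group by column X and aggregate by taking the minimum of Z, but I want the Y value to be the one from the same row as that minimum Z. Something like this, where take_y_according_to_min_z is a made-up function standing in for what I need:
df.groupBy("X").agg(min("Z"), take_y_according_to_min_z("Y"))
Expected output: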
col X  col Y  col Z
A      1      5
B      5      15
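Note: if more than one row shares the min("Z") value, I don't care which of those rows is taken.
I tried to find a clean, Spark-idiomatic way to do this online. It is clear to me how to do it in MapReduce, but I could not find a way to do it in Spark.
I am working on Spark 1.6.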
Answer 0 (score: 2)
You can simply do
selectednode = (DefaultMutableTreeNode) TreePro.getLastSelectedPathComponent();
DefaultMutableTreeNode node = (DefaultMutableTreeNode) selectednode.getParent();
if(selectednode != null){
if (selectednode.isLeaf()) {
Iterator<Map.Entry<Integer, String>> irt = col.entrySet().iterator();
while(irt.hasNext())
{
Map.Entry<Integer, String> entry = irt.next();
if(selectednode.isLeaf() && entry.getValue().equals(TextField2.getText()))
{
System.out.println(" Removed. "+entry.getKey());
irt.remove(); // Call Iterator's remove method.
node.remove(selectednode);
System.out.println("LinkedHashMap Size : "+col.size());
model.reload(node);
}
}
}
// The problem begin from here
else{
int p = JOptionPane.showConfirmDialog(null, "Warning "+selectednode+ " is a Parent node, It will DEL all his child nodes" , "Delete",JOptionPane.YES_NO_OPTION);
if(p == 0){
for (int i = 0; i < selectednode.getChildCount(); i++) {
TreeNode nodee = selectednode.getChildAt(i);
String batie = nodee.toString();
System.out.println("batreeee "+batie);
System.out.println("break time "+selectednode.getChildAt(i));
Iterator<Map.Entry<Integer, String>> itt = col.entrySet().iterator();
while(itt.hasNext())
{
Map.Entry<Integer, String> entryy = itt.next();
if( entryy.getValue().equals(TextField2.getText()))
{
System.out.println(" Removed. "+entryy.getKey());
itt.remove(); // Call Iterator's remove method.
node.remove(selectednode);
System.out.println("LinkedHashMap Size : "+col.size());
model.reload(node);
}
if( entryy.getValue().equals(batie))
{
System.out.println(" Removed. "+entryy.getKey());
itt.remove(); // Call Iterator's remove method.
System.out.println("LinkedHashMap Size : "+col.size());
model.reload(node);
}
}
}selectednode.removeAllChildren();
}}}}
and you will get what you want
Job1 EXEC +10 03:28 (03:23) #J18911
Job2 EXEC +10 12:56 (01:55) #J1766
Job3 EXEC +10 04/05 #J333460
Job4 EXEC +10 02/26 (01:10) #J3322
Job5 EXEC +10 04:58 (02:23) #J189115; <04/18
Job6 EXEC +10 16:07 (00:23) #J189115; &0:05
Job7 EXEC +10 14:00 (01:02) #J260721; <04/18
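Answer 1 (score: 1)
You can wrap columns Z and Y in a struct, with Z first so that it drives the comparison, and take the minimum of that struct: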
df.groupBy("X").agg(min(struct("Z", "Y")).as("min"))
.select("X", "min.*")
Output:
+---+---+---+
|X |Z |Y |
+---+---+---+
|B |15 |5 |
|A |5 |1 |
+---+---+---+
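If it is not obvious why taking the minimum of a struct works: the fields are compared in order, so Z decides first and Y only matters as a tie-breaker. Here is a minimal sketch of that idea using plain Scala collections instead of Spark, so it runs without a cluster. The object name MinStructSketch and the helper minByZ are made up for illustration and are not part of any Spark API.

```scala
// Plain Scala collections, no Spark dependency: each row is an (X, Y, Z) tuple.
object MinStructSketch {
  // Same sample data as in the question.
  val rows: Seq[(String, Int, Int)] = Seq(
    ("A", 1, 5), ("A", 2, 10), ("A", 3, 10), ("B", 5, 15)
  )

  // For each X, keep the row whose (Z, Y) pair is smallest.
  // Tuples are ordered field by field, so Z is compared first
  // and Y is only used to break ties on Z.
  def minByZ(rows: Seq[(String, Int, Int)]): Map[String, (Int, Int)] =
    rows.groupBy(_._1).map { case (x, group) =>
      val (_, y, z) = group.minBy { case (_, y, z) => (z, y) }
      (x, (y, z))
    }

  def main(args: Array[String]): Unit =
    minByZ(rows).toSeq.sortBy(_._1).foreach { case (x, (y, z)) =>
      println(s"$x $y $z") // prints "A 1 5" then "B 5 15"
    }
}
```

One consequence of this ordering is that when several rows share the minimum Z, the one with the smallest Y is kept, which is fine here since the question says any of the tied rows is acceptable.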
Hope this helps!