Suppose we have a dataset 'people' that contains Id and Age as a 2-by-3 matrix.
The data looks like this:
Id  = 1  2  3
Age = 21 18 30
In SparkR, I want to create a new dataset that contains all Ids that are older than 18; in this case, Ids 1 and 3. In SparkR I would do
people2
but it does not work. How would you create the new dataset?
Answer 0 (score: 2)
You can use SparkR::filter with either a column condition:
> people <- createDataFrame(sqlContext, data.frame(Id=1:3, Age=c(21, 18, 30)))
> filter(people, people$Age > 18) %>% head()
Id Age
1 1 21
2 3 30
or a SQL string:
> filter(people, "Age > 18") %>% head()
Id Age
1 1 21
2 3 30
You can also use the SparkR::sql function with a raw SQL query against a registered temporary table:
> registerTempTable(people, "people")
> sql(sqlContext, "SELECT * FROM people WHERE Age > 18") %>% head()
Id Age
1 1 21
2 3 30
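All three approaches return a distributed SparkR DataFrame, and head() only previews the first few rows. As a sketch (assuming the same `people` DataFrame and an active sqlContext as above), SparkR::collect can materialize the filtered result as a local R data.frame on the driver:

```r
# Filter on the cluster, then pull the (small) result back to the driver.
# Only do this when the filtered result fits in local memory.
people2  <- filter(people, people$Age > 18)
local_df <- collect(people2)   # local_df is an ordinary R data.frame
```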
Answer 1 (score: 1)
For those who appreciate the many options R offers for any given task, you can also use the SparkR::subset() function:
To answer the additional details raised in the comments:
> people <- createDataFrame(sqlContext, data.frame(Id=1:3, Age=c(21, 18, 30)))
> people2 <- subset(people, people$Age > 18, select = c(1,2))
> head(people2)
Id Age
1 1 21
2 3 30
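The SparkR examples above require a running Spark context, but the same subset logic can be sanity-checked on an ordinary local data.frame in plain R (a sketch; `people` here is a base-R data.frame, not a SparkR DataFrame):

```r
# Local data.frame with the same contents as the example
people <- data.frame(Id = 1:3, Age = c(21, 18, 30))

# Same condition and column selection as the SparkR::subset() call
people2 <- subset(people, Age > 18, select = c(Id, Age))
print(people2)
#   Id Age
# 1  1  21
# 3  3  30
```

Base R's subset() keeps the original row names (1 and 3), which is why the printed row labels match the surviving Ids here.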