Question

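(1) I have read a large table into R, with more than 10000 rows and 10 columns.

(2) The third column of the table contains hospital names. Some of them appear twice or even more often.

(3) I have a list of hospital names, for example 10 of them, that need further study.

(4) Could you show me how to extract all rows of the table from step 1 whose hospital names are in the list from step 3?

Here is a short example of my input file: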

Patients Treatment Hospital Response 
1        A         YYY      Good 
2        B         YYY      Dead 
3        A         ZZZ      Good 
4        A         WWW      Good 
5        C         UUU      Dead

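I have a list of hospitals that I am interested in studying further, namely YYY and UUU. How can I use R to produce the following output table?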

Patients Treatment Hospital Response 
1        A         YYY      Good 
2        B         YYY      Dead 
5        C         UUU      Dead

Answer 1

Use the %in% operator.

#Sample data
dat <- data.frame(patients = 1:5, treatment = letters[1:5],
  hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))

#List of hospitals we want to do further analysis on
goodHosp <- c("yyy", "uuu")

You can index directly into the data.frame object:

dat[dat$hospital %in% goodHosp, ]

Or use the subset command:

subset(dat, hospital %in% goodHosp)
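
For the table in the question, where the hospital name sits in the third column, the same idea can be sketched as follows. This is only a minimal illustration: the data frame is rebuilt by hand from the question's example, and the names patients, wanted, by_name and by_position are made up for this sketch.

```r
# Rebuild the question's example table (column names taken from the question)
patients <- data.frame(
  Patients  = 1:5,
  Treatment = c("A", "B", "A", "A", "C"),
  Hospital  = c("YYY", "YYY", "ZZZ", "WWW", "UUU"),
  Response  = c("Good", "Dead", "Good", "Good", "Dead"),
  stringsAsFactors = FALSE
)

# Hospitals of interest (hypothetical variable name)
wanted <- c("YYY", "UUU")

# Select rows by column name ...
by_name <- patients[patients$Hospital %in% wanted, ]

# ... or by position, since the hospital name is the third column
by_position <- patients[patients[[3]] %in% wanted, ]

print(by_name)
```

Indexing by position is handy when the column has no usable name, but selecting by name is usually less fragile if the column order ever changes.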

Answer 2

Use the dplyr package.

Set up the data, using @Chase's sample data.

#Sample data
df <- data.frame(patients = 1:5, treatment = letters[1:5],
  hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))

#List of hospitals we want to do further analysis on
goodHosp <- c("yyy", "uuu")

Now filter the data with dplyr, keeping the rows whose hospital is in goodHosp.
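
A minimal sketch of the filtering step, assuming dplyr is installed. It combines dplyr's filter() with the %in% operator on the sample data above; result is just an illustrative name.

```r
library(dplyr)

# Same sample data as in the answer above
df <- data.frame(patients = 1:5, treatment = letters[1:5],
  hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))

# List of hospitals we want to do further analysis on
goodHosp <- c("yyy", "uuu")

# Keep only the rows whose hospital appears in goodHosp
result <- filter(df, hospital %in% goodHosp)
result
```

The pipe form, df %>% filter(hospital %in% goodHosp), should give the same rows.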
