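I have a df like the one shown below. I suspect this can be done with dplyr and duplicated, but I don't know how to handle multiple columns while still distinguishing between the values of a grouping variable.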
from to group
1 2 metro
2 4 metro
3 4 metro
4 5 train
6 1 train
8 7 train
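I want to find the ids that appear in more than one value of the group variable.
The expected result for the sample df is 1 and 4, because they appear in both the metro and the train group.
Thanks in advance!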
Answer 0 (score: 3)
Using base R, we can split the first two columns by group and then use intersect to find the values the groups have in common.
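The answer's original code did not survive in this copy, so the following is only a minimal sketch of the split/intersect idea described above, not the answerer's exact code. It assumes the data frame is named df1 and matches the sample in the question; the names ids, by_group and common are illustrative.

```r
# Sample data from the question (assumed to be called df1)
df1 <- data.frame(
  from  = c(1L, 2L, 3L, 4L, 6L, 8L),
  to    = c(2L, 4L, 4L, 5L, 1L, 7L),
  group = c("metro", "metro", "metro", "train", "train", "train"),
  stringsAsFactors = FALSE
)

# Stack the 'from' and 'to' columns into a single vector of ids
ids <- unlist(df1[1:2], use.names = FALSE)

# Split the ids by group; the group column is repeated once per stacked column
by_group <- split(ids, rep(df1$group, 2))

# Keep only the ids that occur in every group
common <- Reduce(intersect, by_group)
print(common)
```

With more than two groups, Reduce(intersect, ...) keeps only ids present in all groups, which is stricter than "more than one group"; for that case a count-based approach such as the ones in the other answers is a better fit.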
Answer 1 (score: 1)
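We gather the 'from' and 'to' columns into 'long' format, group by 'val', filter the groups that contain more than one unique 'group' value, and then pull the distinct 'val' elements.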
library(dplyr)
library(tidyr)
df1 %>%
gather(key, val, from:to) %>%
group_by(val) %>%
filter(n_distinct(group) > 1) %>%
distinct(val) %>%
pull(val)
#[1] 1 4
Or, using base R, we can use table to get the frequencies and extract the ids from them.
out <- with(df1, colSums(table(rep(group, 2), unlist(df1[1:2])) > 0)) > 1
names(which(out))
#[1] "1" "4"
df1 <- structure(list(from = c(1L, 2L, 3L, 4L, 6L, 8L), to = c(2L, 4L,
4L, 5L, 1L, 7L), group = c("metro", "metro", "metro", "train",
"train", "train")), class = "data.frame", row.names = c(NA, -6L
))
Answer 2 (score: 1)
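Use data.table to convert the data to long format and count the unique values. melt converts to long format, and data.table allows filtering in the i part of df1[i, j, k], grouping in the k part, and pulling the result in the j part.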
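The code for this answer is missing from this copy. As a rough sketch of the melt-and-count approach it describes, and not the answerer's original code, something like the following should work. It assumes the data.table package is installed and that df1 matches the sample data; the names dt, long, n_groups and res are illustrative.

```r
library(data.table)

# Sample data from the question (assumed to be called df1)
df1 <- data.frame(
  from  = c(1L, 2L, 3L, 4L, 6L, 8L),
  to    = c(2L, 4L, 4L, 5L, 1L, 7L),
  group = c("metro", "metro", "metro", "train", "train", "train"),
  stringsAsFactors = FALSE
)
dt <- as.data.table(df1)

# Long format: one row per (group, id) pair taken from 'from' and 'to'
long <- melt(dt, id.vars = "group", measure.vars = c("from", "to"),
             value.name = "val")

# Count distinct groups per id (grouping via by), then filter in i
# and pull the id column in j
res <- long[, .(n_groups = uniqueN(group)), by = val][n_groups > 1, val]
print(res)
```

Unlike an intersection across all groups, this keeps any id that appears in at least two groups, which matches the question's wording when there are more than two groups.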