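Subset data based on a specific condition

Time: 2018-07-13 14:21:06

Tags: r

I have a dataset with several columns. I want to split it into two separate datasets based on a condition. For example: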

 x       y        z                      m
001   20.19.0    86    16.30.45, 17.55.65, 18.23.21, 19.30.92
001   19.30.92   42    16.30.45, 17.55.65, 18.23.21, 19.30.92
001   22.42.42   52    16.30.45, 17.55.65, 18.23.21, 19.30.92
002   19.30.92   33    16.30.45, 17.55.65, 18.23.21, 19.30.92
002   21.30.22   65    16.30.45, 17.55.65, 18.23.21, 19.30.92
002   18.23.21   56    16.30.45, 17.55.65, 18.23.21, 19.30.92
002   25.63.54   85    16.30.45, 17.55.65, 18.23.21, 19.30.92

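I want to subset based on the values in m: if the value in y is the same as one of the values in m, the row should go into one dataset, and the remaining rows should go into another. Any help would be appreciated. Thanks.

1 answer:

Answer 0: (score: 0)

If m is a character column, you can apply grepl row by row through mapply, like this: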

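# Option 1: subset(), which looks up y and m inside df
df1 = subset(df, mapply(grepl, y, m, fixed = TRUE))
df2 = subset(df, !mapply(grepl, y, m, fixed = TRUE))

# Option 2: the same selection with bracket indexing
df1 = df[mapply(grepl, df$y, df$m, fixed = TRUE),]
df2 = df[!mapply(grepl, df$y, df$m, fixed = TRUE),]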

Result:

> df1
  x        y  z                                      m
2 1 19.30.92 42 16.30.45, 17.55.65, 18.23.21, 19.30.92
4 2 19.30.92 33 16.30.45, 17.55.65, 18.23.21, 19.30.92
6 2 18.23.21 56 16.30.45, 17.55.65, 18.23.21, 19.30.92

> df2
  x        y  z                                      m
1 1  20.19.0 86 16.30.45, 17.55.65, 18.23.21, 19.30.92
3 1 22.42.42 52 16.30.45, 17.55.65, 18.23.21, 19.30.92
5 2 21.30.22 65 16.30.45, 17.55.65, 18.23.21, 19.30.92
7 2 25.63.54 85 16.30.45, 17.55.65, 18.23.21, 19.30.92

Data:

df = structure(list(x = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), y = c("20.19.0", 
"19.30.92", "22.42.42", "19.30.92", "21.30.22", "18.23.21", "25.63.54"
), z = c(86L, 42L, 52L, 33L, 65L, 56L, 85L), m = c("16.30.45, 17.55.65, 18.23.21, 19.30.92", 
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92", 
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92", 
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92"
)), .Names = c("x", "y", "z", "m"), class = "data.frame", row.names = c(NA, 
-7L))
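
One caveat with the grepl approach: with fixed = TRUE it tests whether y occurs anywhere inside m as a substring, so a y such as "9.30.92" would also count as a match against "19.30.92". If whole entries need to match, a variant along the following lines may be safer. This is only a sketch: it assumes m is a single string whose entries are separated by commas, the helper name y_in_m is made up for illustration, and the sample data is rebuilt here in shortened form so the block runs on its own.

```r
# Same sample data as above; m holds one repeated comma-separated string.
m_val <- "16.30.45, 17.55.65, 18.23.21, 19.30.92"
df <- data.frame(
  x = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  y = c("20.19.0", "19.30.92", "22.42.42", "19.30.92",
        "21.30.22", "18.23.21", "25.63.54"),
  z = c(86L, 42L, 52L, 33L, 65L, 56L, 85L),
  m = rep(m_val, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE only when y equals one whole entry of m.
y_in_m <- function(y, m) {
  # Split m on commas and strip the surrounding spaces from each entry.
  entries <- trimws(strsplit(m, ",", fixed = TRUE)[[1]])
  y %in% entries
}

# Compute the row mask once, then reuse it for both subsets.
keep <- mapply(y_in_m, df$y, df$m, USE.NAMES = FALSE)
df1 <- df[keep, ]
df2 <- df[!keep, ]
```

On this sample data the split is the same as with grepl (rows 2, 4 and 6 in df1). Computing the mask once also avoids running mapply twice; split(df, keep) would return both pieces as a list with elements named "FALSE" and "TRUE".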