I have a dataset with several columns. I want to split the data into two separate datasets based on a condition. For example:
x y z m
001 20.19.0 86 16.30.45, 17.55.65, 18.23.21, 19.30.92
001 19.30.92 42 16.30.45, 17.55.65, 18.23.21, 19.30.92
001 22.42.42 52 16.30.45, 17.55.65, 18.23.21, 19.30.92
002 19.30.92 33 16.30.45, 17.55.65, 18.23.21, 19.30.92
002 21.30.22 65 16.30.45, 17.55.65, 18.23.21, 19.30.92
002 18.23.21 56 16.30.45, 17.55.65, 18.23.21, 19.30.92
002 25.63.54 85 16.30.45, 17.55.65, 18.23.21, 19.30.92
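I want to subset based on the values in m: if the value in y is one of the values listed in m, the row should go into one dataset. The remaining rows should go into another dataset. Any help would be appreciated. Thanks.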
Answer 0 (score: 0)
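If m is of character type, you can use grepl row by row via mapply, as shown below. With fixed = TRUE, y is treated as a literal string, so the dots are not interpreted as regex wildcards: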
df1 = subset(df, mapply(grepl, y, m, fixed = TRUE))
df2 = subset(df, !mapply(grepl, y, m, fixed = TRUE))
Or, equivalently, with logical indexing:
df1 = df[mapply(grepl, df$y, df$m, fixed = TRUE),]
df2 = df[!mapply(grepl, df$y, df$m, fixed = TRUE),]
Result:
> df1
x y z m
2 1 19.30.92 42 16.30.45, 17.55.65, 18.23.21, 19.30.92
4 2 19.30.92 33 16.30.45, 17.55.65, 18.23.21, 19.30.92
6 2 18.23.21 56 16.30.45, 17.55.65, 18.23.21, 19.30.92
> df2
x y z m
1 1 20.19.0 86 16.30.45, 17.55.65, 18.23.21, 19.30.92
3 1 22.42.42 52 16.30.45, 17.55.65, 18.23.21, 19.30.92
5 2 21.30.22 65 16.30.45, 17.55.65, 18.23.21, 19.30.92
7 2 25.63.54 85 16.30.45, 17.55.65, 18.23.21, 19.30.92
Data:
df = structure(list(x = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), y = c("20.19.0",
"19.30.92", "22.42.42", "19.30.92", "21.30.22", "18.23.21", "25.63.54"
), z = c(86L, 42L, 52L, 33L, 65L, 56L, 85L), m = c("16.30.45, 17.55.65, 18.23.21, 19.30.92",
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92",
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92",
"16.30.45, 17.55.65, 18.23.21, 19.30.92", "16.30.45, 17.55.65, 18.23.21, 19.30.92"
)), .Names = c("x", "y", "z", "m"), class = "data.frame", row.names = c(NA,
-7L))
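
One caveat with grepl is that it does substring matching, so a y value such as "9.30.92" would also match "19.30.92". If exact matches are needed, a minimal sketch is to split m on commas and test membership with %in%. The small data frame below is an assumption made for illustration (its last row is a hypothetical partial match, not part of the original data), and the names tokens and exact are likewise illustrative:

```r
# Small example frame; the last row is a hypothetical partial match
# added only to show the difference from substring matching.
df <- data.frame(
  x = c(1L, 1L, 2L, 2L),
  y = c("20.19.0", "19.30.92", "18.23.21", "9.30.92"),
  z = c(86L, 42L, 56L, 10L),
  m = "16.30.45, 17.55.65, 18.23.21, 19.30.92",
  stringsAsFactors = FALSE
)

# Split each m string on commas (with optional surrounding whitespace).
tokens <- strsplit(df$m, "\\s*,\\s*")

# TRUE where y is exactly equal to one of the tokens in the same row.
exact <- mapply(function(y, tok) y %in% tok, df$y, tokens, USE.NAMES = FALSE)

df1 <- df[exact, ]   # rows whose y is listed in m
df2 <- df[!exact, ]  # all remaining rows
```

Here the row with y = "9.30.92" ends up in df2, whereas the grepl version would place it in df1. Whether that matters depends on whether such partial values can occur in your data.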