子集数据帧基于另一个数据帧的列中的条件

时间:2017-11-18 12:56:25

标签: r

我有两个数据框,每行代表一个人的数据。第一个数据框中的线(输入几何形态测量的特定分析)对应于第二个数据框中的线(动物作为采样点或性别的其他描述)。我想基于来自第二数据帧的条件来对第一数据帧进行子集化(例如,选择第一数据帧的所有行是女性,但动物的性别在第二数据帧中定义)。可以通过向第一个数据框添加新列,基于此新列对其进行子集并删除列来实现。还有其他更优雅的方法吗?

df1
                [,1]       [,2]        [,3]       [,4]       [,5]       [,6]
IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079
IMGP7012.JPG -0.07059829 0.07905529 0.021803102 0.07480276 0.04849282 0.07270644
IMGP7013.JPG -0.07176010 0.08561111 0.009568661 0.08297752 0.06374573 0.08272648
IMGP7014.JPG -0.06751993 0.08895038 0.016800152 0.08799522 0.04776876 0.08100145
IMGP7015.JPG -0.07945826 0.07844136 0.008176800 0.07431915 0.06471417 0.07348312
IMGP7017.JPG -0.06587874 0.09280032 0.010204330 0.09085868 0.05290771 0.08739235

df2
    number site     m m..evis. m..gonads. sex  SL       TL       AP       RP
37     10   KB 1.263    1.003      0.136   F 39.38949 47.72564       NA       NA
38     11   KB 4.215    3.510      0.093   F 53.48064 65.29663       NA       NA
39     12   KB 3.508    2.997      0.079   F 51.59589 64.76600       NA       NA
40     13   KB 3.250    2.752      0.085   F 49.55853 61.74319       NA       NA
41     14   KB 3.596    3.149      0.101   F 51.42303 64.79511       NA       NA
42     10  KKB 3.257    2.451      0.270   M 55.07909 67.52057 1468.017 598.9462
43     11  KKB 3.493    2.275      0.666   M 54.24882 65.61726 1722.414 757.1050
44     12  KKB 3.066    2.210      0.300   M 53.56323 64.09848 1410.891 638.4123
45     13  KKB 3.294    2.193      0.652   M 51.66717 63.49136 1428.063 651.1915
46     14  KKB 2.803    1.871      0.582   M 50.91185 60.90951 1236.438 660.8433
子集

后的

df1

                    [,1]       [,2]        [,3]       [,4]       [,5]       [,6]
IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079

1 个答案:

答案 0 :(得分:2)

df1[df2$sex %in% "F", ]
#                     [,1]       [,2]        [,3]       [,4]       [,5]       [,6]
# IMGP6995.JPG -0.07612235 0.08189661 0.020690012 0.07532420 0.05373111 0.07139840
# IMGP6997.JPG -0.06759482 0.09449720 0.022907275 0.08807724 0.05953926 0.08256468
# IMGP6998.JPG -0.06902234 0.08418980 0.013522385 0.08186618 0.05375763 0.07769076
# IMGP6999.JPG -0.07201136 0.08475765 0.009462017 0.08080315 0.06148776 0.07059229
# IMGP7001.JPG -0.08112908 0.08485488 0.037193459 0.07971364 0.05834018 0.07917079

<强>解释

您的df1看起来像matrix,而不是data.frame。但是,如果df1是数据框,我提供的解决方案也会有效。

df2$sex %in% "F"报告sex是否与F匹配。并使用TRUEFALSE报告逻辑向量。之后,您可以使用它来分组df1

数据

df1 <- matrix(c(-0.07612235, 0.08189661, 0.020690012, 0.07532420, 0.05373111, 0.07139840,
                -0.06759482, 0.09449720, 0.022907275, 0.08807724, 0.05953926, 0.08256468,
                -0.06902234, 0.08418980, 0.013522385, 0.08186618, 0.05375763, 0.07769076,
                -0.07201136, 0.08475765, 0.009462017, 0.08080315, 0.06148776, 0.07059229,
                -0.08112908, 0.08485488, 0.037193459, 0.07971364, 0.05834018, 0.07917079,
                -0.07059829, 0.07905529, 0.021803102, 0.07480276, 0.04849282, 0.07270644,
                -0.07176010, 0.08561111, 0.009568661, 0.08297752, 0.06374573, 0.08272648,
                -0.06751993, 0.08895038, 0.016800152, 0.08799522, 0.04776876, 0.08100145,
                -0.07945826, 0.07844136, 0.008176800, 0.07431915, 0.06471417, 0.07348312,
                -0.06587874, 0.09280032, 0.010204330, 0.09085868, 0.05290771, 0.08739235),
ncol = 6, byrow = TRUE)

rownames(df1) <- c("IMGP6995.JPG", "IMGP6997.JPG", "IMGP6998.JPG", "IMGP6999.JPG",
                   "IMGP7001.JPG", "IMGP7012.JPG", "IMGP7013.JPG", "IMGP7014.JPG",
                   "IMGP7015.JPG", "IMGP7017.JPG")

df2 <- read.table(text = "    number site     m m..evis. m..gonads. sex  SL       TL       AP       RP
37     10   KB 1.263    1.003      0.136   F 39.38949 47.72564       NA       NA
                  38     11   KB 4.215    3.510      0.093   F 53.48064 65.29663       NA       NA
                  39     12   KB 3.508    2.997      0.079   F 51.59589 64.76600       NA       NA
                  40     13   KB 3.250    2.752      0.085   F 49.55853 61.74319       NA       NA
                  41     14   KB 3.596    3.149      0.101   F 51.42303 64.79511       NA       NA
                  42     10  KKB 3.257    2.451      0.270   M 55.07909 67.52057 1468.017 598.9462
                  43     11  KKB 3.493    2.275      0.666   M 54.24882 65.61726 1722.414 757.1050
                  44     12  KKB 3.066    2.210      0.300   M 53.56323 64.09848 1410.891 638.4123
                  45     13  KKB 3.294    2.193      0.652   M 51.66717 63.49136 1428.063 651.1915
                  46     14  KKB 2.803    1.871      0.582   M 50.91185 60.90951 1236.438 660.8433",
                  header = TRUE, stringsAsFactors = FALSE)