Question

I am trying to delete all rows (cases) in a data frame where the value in one column does not match the value in another column.

The data frame bilat_total contains the following 10 columns/variables:

bilat_total[,c("year", "importer1", "importer2", "flow1", 
                              "flow2", "country", "imports", "exports", "bi_tot", 
                              "mother")]

So the head of the table is:

year   importer1       importer2  flow1  flow2     country
6  2009 Afghanistan          Bhutan     NA     NA Afghanistan
11 2009 Afghanistan Solomon Islands     NA     NA Afghanistan
12 2009 Afghanistan           India 516.13 120.70 Afghanistan
13 2009 Afghanistan           Japan 124.21   0.46 Afghanistan
15 2009 Afghanistan        Maldives     NA     NA Afghanistan
19 2009 Afghanistan      Bangladesh   4.56   1.09 Afghanistan

   imports exports       bi_tot         mother
6  6689.35  448.25           NA United Kingdom
11 6689.35  448.25           NA United Kingdom
12 6689.35  448.25 1.804361e-02 United Kingdom
13 6689.35  448.25 6.876602e-05 United Kingdom
15 6689.35  448.25           NA United Kingdom
19 6689.35  448.25 1.629456e-04 United Kingdom

I tried to remove all cases where importer2 does not match mother by taking a subset:

subset(bilat_total, importer2 == mother)

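But every time I do this, I get the error:

Error in Ops.factor(importer2, mother) : level sets of factors are different

How can I delete all rows/cases where importer2 and mother do not match?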

Answer 1

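The error most likely occurs because both columns are of class factor: == only works on two factors when they share the same set of levels. We can convert the columns to character class and then compare them inside subset.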

subset(bilat_total, as.character(importer2) == as.character(mother))

Based on the example data shown, the original call reproduces the error:

subset(bilat_total, importer2 == mother)
# Error in Ops.factor(importer2, mother) : 
#  level sets of factors are different
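
To make the difference concrete, here is a minimal sketch on a small made-up data frame. The name df and its values are hypothetical and stand in for bilat_total; only the importer2 and mother column names come from the question. It shows the failing comparison, the as.character() fix, and, as an alternative not mentioned above, converting every factor column to character once so that a plain == works afterwards.

```r
# Hypothetical toy data: two factor columns whose level sets differ.
df <- data.frame(
  importer2 = factor(c("India", "Japan", "United Kingdom")),
  mother    = factor(c("United Kingdom", "United Kingdom", "United Kingdom"))
)

# Comparing the factors directly raises an error; capture its message.
err <- tryCatch(
  df$importer2 == df$mother,
  error = function(e) conditionMessage(e)
)
print(err)

# Fix from the answer: compare the underlying strings instead.
matched <- subset(df, as.character(importer2) == as.character(mother))
print(matched)

# Alternative (an assumption, not part of the answer): convert all factor
# columns to character once, then compare directly.
df2 <- df
df2[] <- lapply(df2, function(col) if (is.factor(col)) as.character(col) else col)
# which() drops rows where the comparison is NA.
matched2 <- df2[which(df2$importer2 == df2$mother), ]
print(matched2)
```

Both approaches keep only the rows where importer2 equals mother. subset treats a missing comparison result as FALSE, so rows with NA in either column are dropped as well.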
