Question

My data looks like this:

+--------+--------+--------+
| region |  name  | salary |
+--------+--------+--------+
| west   | raj    | 100    |
| north  | simran | 150    |
| region | name   | salary |
| east   | prem   | 250    |
| region | name   | salary |
| south  | preeti | 200    |
+--------+--------+--------+

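The column header names are repeated in rows 3 and 5. How can I use R to remove rows 3 and 5 while keeping the column headers intact, so that my output looks like this: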

+--------+--------+--------+
| region |  name  | salary |
+--------+--------+--------+
| west   | raj    |    100 |
| north  | simran |    150 |
| east   | prem   |    250 |
| south  | preeti |    200 |
+--------+--------+--------+

Assume my original data has too many rows, so I don't want to simply pick the row numbers and delete them with Data[-c(3, 5), ].

Answer 1

Here is a simple solution:

x <- data.frame(x = c("a", "b", "c", "x"), z = c("a", "b", "c", "z"))

## Flag rows whose values are all column names (the repeated header rows)
matched <- apply(x, 1, function(row) all(row %in% colnames(x)))

## Keep only the rows that did not match
x[!matched, ]
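
Applied to the data from the question, the same idea might look like the sketch below. The data frame `df` is a hypothetical reconstruction of the table in the question, and it assumes every column was read in as character (which is what usually happens when header rows are mixed into the data). The comparison here is slightly stricter than `%in%`: a row is dropped only when each cell equals the name of its own column.

```r
# Hypothetical reconstruction of the data from the question; every column is
# character because the repeated header rows were read in as data.
df <- data.frame(
  region = c("west", "north", "region", "east", "region", "south"),
  name   = c("raj", "simran", "name", "prem", "name", "preeti"),
  salary = c("100", "150", "salary", "250", "salary", "200"),
  stringsAsFactors = FALSE
)

# A row is a repeated header when every cell equals its own column name.
is_header <- apply(df, 1, function(row) all(row == colnames(df)))

clean_df <- df[!is_header, ]
rownames(clean_df) <- NULL                      # renumber the rows 1..n
clean_df$salary <- as.numeric(clean_df$salary)  # restore the numeric type

print(clean_df)
```

Converting `salary` back to numeric at the end is optional, but without it the column stays character even after the header rows are gone.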

Answer 2

Assuming salary is a numeric field, you can simply do this:

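# assuming df is your data frame and salary was read in as text
# as.character() guards against factor columns; non-numeric values become NA
salary_num <- suppressWarnings(as.numeric(as.character(df$salary)))

clean_df <- df[!is.na(salary_num), ]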
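
One caveat with the coercion approach is that it removes every row whose salary cannot be parsed as a number, not only the repeated headers. The sketch below illustrates the difference on a small hypothetical data frame (the missing salary is an assumption added for illustration) and compares it with filtering on the literal column name instead.

```r
# Hypothetical data: one repeated header row and one row with a missing salary.
df <- data.frame(
  region = c("west", "region", "east"),
  name   = c("raj", "name", "prem"),
  salary = c("100", "salary", NA),
  stringsAsFactors = FALSE
)

# Coercion filter: drops the header row, but also the row whose salary is NA.
by_coercion <- df[!is.na(suppressWarnings(as.numeric(df$salary))), ]

# Name comparison: drops only rows whose salary cell is literally "salary".
by_name <- df[!(df$salary %in% "salary"), ]

nrow(by_coercion)  # 1
nrow(by_name)      # 2
```

If genuinely missing salaries should be kept, the name comparison is the safer of the two filters.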

Remove duplicate rows whose values are the same as the column headers
