Question

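I have a data frame with multiple columns. Column A contains repeated numbers. Column B contains names. I want to go through all rows and, for rows that share the same value in Column A, keep only those that have the '&' symbol or the word 'and' in Column B. If no row in a group has either of these, I want to keep just one row, and it does not matter which one. Sample data: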

Column A           Column B     
12345                John
12345                Mary and Bob
12345                Ben
44444                Jim
44444                Larry & Meg
55555                Tommy

Expected output:

Column A            Column B
12345               Mary and Bob
44444               Larry & Meg
55555               Tommy

Answer 1

You can use ave and grepl to get the matching rows:

dat[ave(dat$ColumnB, dat$ColumnA, FUN=function(x) {
  g <- grepl("( & )|( and )", x)
  if (all(!g)) {
    seq_along(x) == 1
  } else {
    g
  }
}) == "TRUE",]
#   ColumnA      ColumnB
# 2   12345 Mary and Bob
# 5   44444  Larry & Meg
# 6   55555        Tommy

Data:

dat = data.frame(ColumnA=c(12345, 12345, 12345, 44444, 44444, 55555), ColumnB=c("John", "Mary and Bob", "Ben", "Jim", "Larry & Meg", "Tommy"), stringsAsFactors=FALSE)
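
A note on the == "TRUE" comparison above: ave() returns a vector of the same type as its first argument, so when it is given the character column, the logical flags come back as the strings "TRUE"/"FALSE". The sketch below shows this, along with a variant that runs ave() over integer match flags so the result can be converted with as.logical(). The names flags, hit, keep and result are only illustrative.

```r
dat <- data.frame(ColumnA = c(12345, 12345, 12345, 44444, 44444, 55555),
                  ColumnB = c("John", "Mary and Bob", "Ben", "Jim", "Larry & Meg", "Tommy"),
                  stringsAsFactors = FALSE)

# ave() writes the per-group results back into a copy of its (character) input,
# so the logical values are coerced to strings.
flags <- ave(dat$ColumnB, dat$ColumnA, FUN = function(x) {
  g <- grepl("( & )|( and )", x)
  if (any(g)) g else seq_along(x) == 1
})
print(flags)

# Variant: compute the match flags first, then let ave() work on integers.
hit <- grepl("( & )|( and )", dat$ColumnB)
keep <- as.logical(ave(as.integer(hit), dat$ColumnA, FUN = function(h) {
  # keep the matches, or only the first row when the group has none
  if (any(h == 1L)) h else as.integer(seq_along(h) == 1L)
}))
result <- dat[keep, ]
print(result)
```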

Answer 2

Try

library(data.table)
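# Per ColumnA group: keep the matching rows, or only the first row if none match
setDT(df1)[ , {tmp <- grepl('\\band\\b|&', ColumnB)
               keep <- if (any(tmp)) which(tmp) else 1L
               .SD[keep]}, ColumnA]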
#   ColumnA      ColumnB
#1:   12345 Mary and Bob
#2:   44444  Larry & Meg
#3:   55555        Tommy

Or using dplyr

library(dplyr)
df1 %>% 
   group_by(ColumnA) %>% 
   mutate(tmp= grepl('\\band\\b|&', ColumnB)) %>% 
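   # keep the matching rows, or only the first row when the group has no match
   filter(if (any(tmp)) tmp else row_number() == 1) %>%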
   select(-tmp)

#  ColumnA      ColumnB
#1   12345 Mary and Bob
#2   44444  Larry & Meg
#3   55555        Tommy

Data

df1 <- structure(list(ColumnA = c(12345L, 12345L, 12345L, 44444L, 44444L, 
55555L), ColumnB = c("John", "Mary and Bob", "Ben", "Jim", "Larry & Meg", 
"Tommy")), .Names = c("ColumnA", "ColumnB"), class = "data.frame",
row.names = c(NA, -6L))
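
Note that the pattern here is not quite the same as the one in Answer 1: "( & )|( and )" requires a space on both sides, while '\\band\\b|&' matches "and" as a whole word and "&" anywhere in the string. A small check of the difference, using made-up strings that are not part of the question's data:

```r
# Made-up test strings, not from the question
x <- c("Mary and Bob", "Larry & Meg", "Sandy", "Tom&Ann")

# Pattern from Answer 1: needs a space on both sides of "&" or "and"
spaced <- grepl("( & )|( and )", x)

# Pattern from Answer 2: "and" as a whole word, or "&" anywhere
bounded <- grepl("\\band\\b|&", x)

print(data.frame(x, spaced, bounded))
```

Neither pattern matches the "and" inside "Sandy"; they only disagree on "Tom&Ann", where the "&" has no surrounding spaces.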

Answer 3

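You want to split the dataset into couples and singles, deduplicate the singles by ID, and then return all couples plus the singles whose ID has no couple.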
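
One way these steps could look in base R. This is a sketch rather than the answerer's own code: it assumes the dat data frame from Answer 1 and the word-boundary pattern from Answer 2, and is_couple, couples, singles and result are hypothetical names.

```r
dat <- data.frame(ColumnA = c(12345, 12345, 12345, 44444, 44444, 55555),
                  ColumnB = c("John", "Mary and Bob", "Ben", "Jim", "Larry & Meg", "Tommy"),
                  stringsAsFactors = FALSE)

# 1. Split into couples (name contains "&" or the word "and") and singles
is_couple <- grepl("\\band\\b|&", dat$ColumnB)
couples <- dat[is_couple, ]
singles <- dat[!is_couple, ]

# 2. Deduplicate the singles on the ID column (keeps the first row per ID)
singles <- singles[!duplicated(singles$ColumnA), ]

# 3. Drop singles whose ID already has a couple, then combine and sort by ID
singles <- singles[!singles$ColumnA %in% couples$ColumnA, ]
result <- rbind(couples, singles)
result <- result[order(result$ColumnA), ]
print(result)
```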

