R: Set a value based on matching one or more possible strings

Date: 2016-06-27 22:14:02

Tags: regex r dataframe pattern-matching match

Suppose I have this data frame:

strings      new column
mesh         1
foo          0
bar          0
tack         1
suture       1

If df$strings contains the string "mesh", "tack", or "sutur", I want the new column to contain "1". Otherwise it should show a zero in that row. I tried the following:

df$new_column <- ifelse(grepl("mesh" | "tack" | "sutur",
  df$strings, ignore.case = T), "1", "0")

But I got this error:

Error in "mesh" | "tack" : 
  operations are possible only for numeric, logical or complex types

Thanks in advance.

2 Answers:

Answer 0: (score: 4)

You want to use a single string as the pattern in grepl, with the alternatives separated by | inside the string:

df$new_column <- ifelse(grepl("mesh|tack|sutur", df$strings, ignore.case = TRUE),
                        "1", "0")

This will work, but it produces the character strings "1" and "0". It is faster to skip ifelse and convert the logical result of grepl directly, which returns an integer vector of 0s and 1s.
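
A minimal sketch of an integer-returning variant, assuming it amounts to converting the logical output of grepl directly. The exact call used in the original answer is not shown here, so as.integer is an assumption; the data frame is rebuilt from the question's example.

```r
# Example data rebuilt from the question; stringsAsFactors = FALSE keeps plain strings
df <- data.frame(strings = c("mesh", "foo", "bar", "tack", "suture"),
                 stringsAsFactors = FALSE)

# One pattern string: "|" inside the regular expression means "or"
pattern <- "mesh|tack|sutur"

# grepl() returns a logical vector; as.integer() maps TRUE/FALSE to 1/0
df$new_column <- as.integer(grepl(pattern, df$strings, ignore.case = TRUE))

print(df)
```

Because the pattern is a substring match, "suture" is flagged through "sutur", and no ifelse call is needed to build the column.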

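Answer 1: (score: 3)

We can also use %in%. Note that %in% tests for exact, case-sensitive equality rather than a substring match, so the vector must list the full values (with "sutur" as written, the row "suture" gets 0):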

df$new_column <-  as.integer(df$strings %in% c("mesh", "tack", "sutur"))
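
To make the difference between exact matching and pattern matching concrete, here is a small sketch on the question's data. The names keywords, exact, and partial are illustrative, and building the pattern with paste is one possible approach rather than something either answer prescribes.

```r
# Example data rebuilt from the question
df <- data.frame(strings = c("mesh", "foo", "bar", "tack", "suture"),
                 stringsAsFactors = FALSE)
keywords <- c("mesh", "tack", "sutur")

# Exact, case-sensitive equality: "suture" is not equal to "sutur"
exact <- as.integer(df$strings %in% keywords)

# Substring match: paste() builds the pattern "mesh|tack|sutur" from the vector
partial <- as.integer(grepl(paste(keywords, collapse = "|"), df$strings,
                            ignore.case = TRUE))

print(data.frame(strings = df$strings, exact = exact, partial = partial))
```

If whole-value matching is what you need, %in% is the simpler choice; if the keywords are stems such as "sutur", the grepl form matches the output the question asks for.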