Question

I have a data frame whose column names look like this:

d = c("Q.40a-some Text", "Q.40b-some Text", "Q.44a-some Text", "Q.44b-some Text", "Q.44c-some Text", "Q.44d-some Text", "Q.4a-some Text", "Q.4b-some Text")

I want to identify the columns that start with Q.4 and ignore Q.40 and Q.44.

Identifying Q.44 or Q.40, for example, is easy: I pass "^Q.44" or "^Q.40" as the input to my function. But doing the same for Q.4 does not work, because all of the names start with Q.4. Can anyone help?
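A quick check in R shows why the plain prefix fails. This sketch redefines d (with the missing commas added) so it runs on its own; "QX4a-some Text" is a made-up name used only to show that the unescaped dot matches any character.

```r
d <- c("Q.40a-some Text", "Q.40b-some Text", "Q.44a-some Text", "Q.44b-some Text",
       "Q.44c-some Text", "Q.44d-some Text", "Q.4a-some Text", "Q.4b-some Text")

# "^Q.4" only requires the name to start with "Q", any one character, then "4".
# "Q.40...", "Q.44..." and "Q.4..." all satisfy that, so every name matches.
grepl("^Q.4", d)                  # all TRUE

# The unescaped "." also matches characters other than a literal dot
# (hypothetical name, for illustration only).
grepl("^Q.4", "QX4a-some Text")   # TRUE
```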

Update

I want to pass the result to my function, which takes its input like this:

multichoice<-function(data, question.prefix){

  index<-grep(question.prefix, names(data))    # identifies the index for the available options in Q.12
  cases<-length(index)                # The number of possible options / columns 

  # Identify the range of possible answers for each question 
  # Step 1. Search for the min in each col and across each col choose the min
  # step 2. Search for the max in each col and across each col choose the max 

  mn<-min(data[,index[1:cases]], na.rm=T)
  mx<-max(data[,index[1:cases]], na.rm=T)
  d = colSums(data[, index] != 0, na.rm = TRUE)  # The number of elements across column vector, that are different from zero. 

  vec<-matrix(,nrow=length(mn:mx),ncol=cases)

  for(j in 1:cases){
    for(i in mn:mx){
      vec[i,j]=sum(data[, index[j]] == i, na.rm = TRUE)/d[j]  # This stores the relative responses for option j for the answer that is i
    }
  }

  vec1<-as.data.frame(vec)
  names(vec1)<-names(data[index])
  vec1<-t(vec1)
  return(vec1)
}
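One detail worth checking in the function above (an editorial observation, not part of the original question): vec has length(mn:mx) rows but is indexed with the answer value i itself, which only lines up when mn is 1. If answers start at 0, vec[0, j] silently assigns nothing; if they start above 1, the highest values fall outside the matrix and R stops with "subscript out of bounds". A minimal sketch of offsetting the row index, using a hypothetical single column x coded 2 to 4:

```r
# Hypothetical toy column: answers coded 2..4 (no 1s), so mn = 2 and mx = 4.
x  <- c(2, 3, 3, 4, NA)
mn <- min(x, na.rm = TRUE)
mx <- max(x, na.rm = TRUE)
nonzero <- sum(x != 0, na.rm = TRUE)   # same role as d[j] in multichoice()

vec <- matrix(NA_real_, nrow = length(mn:mx), ncol = 1)

# Writing to vec[i, 1] would target row 4 of a 3-row matrix and fail;
# shifting by mn maps answer value mn to row 1, mn + 1 to row 2, and so on.
for (i in mn:mx) {
  vec[i - mn + 1, 1] <- sum(x == i, na.rm = TRUE) / nonzero
}
vec   # relative frequencies for answers 2, 3, 4: 0.25, 0.50, 0.25
```

In multichoice() the same shift would apply to vec[i, j], with rownames(vec) <- mn:mx if you want to keep track of which answer each row refers to.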

This is how I call my function:

q4 <-multichoice(df2,"^Q.4")

With "^Q.4" I intend to pick out the Q.4 columns; df2 is my data frame.

Answer 1

We can use stringr:

library(stringr)
str_extract(d, 'Q.[0-9]+') == 'Q.4'
#[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE

#or 

d[str_extract(d, 'Q.[0-9]+') == 'Q.4']
#[1] "Q.4a-some Text" "Q.4b-some Text"
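If you would rather avoid the stringr dependency, the same idea can be sketched in base R with sub() and a capture group. This is an alternative, not part of the original answer; it assumes every name starts with "Q." followed by the question number.

```r
d <- c("Q.40a-some Text", "Q.40b-some Text", "Q.44a-some Text", "Q.44b-some Text",
       "Q.44c-some Text", "Q.44d-some Text", "Q.4a-some Text", "Q.4b-some Text")

# Keep only the "Q.<digits>" prefix of each name; the dot is escaped so it
# matches a literal "." and nothing else.
prefix <- sub("^(Q\\.[0-9]+).*$", "\\1", d)

prefix == "Q.4"      # FALSE for the Q.40/Q.44 names, TRUE for the two Q.4 names
d[prefix == "Q.4"]   # "Q.4a-some Text" "Q.4b-some Text"
```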

If the format is always the same (i.e. Q.[0-9]...), we can use gsub:

gsub('\\D', '', d) == 4
#[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
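The gsub approach relies on the question number being the only digits in the name, because \\D removes every non-digit and keeps all remaining digits. A short sketch of where that breaks, using a hypothetical name with a digit in its text part, and a variant that extracts only the digits right after "Q.":

```r
# Hypothetical column name whose text part also contains a digit.
nm <- "Q.4a-Text 2"

# All digits survive, so the result is "42", and "42" == 4 is FALSE.
gsub("\\D", "", nm)

# Capturing only the digits that directly follow "Q." gives 4 again.
as.integer(sub("^Q\\.([0-9]+).*$", "\\1", nm)) == 4   # TRUE
```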

Answer 2

Here is a way to do it with grep. To return the indices:

grep("^Q\\.4[^0-9]", d)

To return the column names:

grep("^Q\\.4[^0-9]", d, value=T)

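This works because [^0-9] means any character that is not a digit, so we match Q.4 literally (the \\. escapes the dot) and then require the next character to be a non-digit.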
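One edge case: [^0-9] needs a character after the 4, so a name that is exactly "Q.4" would not match. A sketch of two patterns that also accept the end of the string; the extra "Q.4" element is hypothetical, added only to show the difference.

```r
d <- c("Q.40a-some Text", "Q.40b-some Text", "Q.44a-some Text", "Q.44b-some Text",
       "Q.44c-some Text", "Q.44d-some Text", "Q.4a-some Text", "Q.4b-some Text",
       "Q.4")   # hypothetical extra name, for illustration only

# Original pattern: misses "Q.4" because [^0-9] must consume a character.
grep("^Q\\.4[^0-9]", d, value = TRUE)

# Either "end of string" or "a non-digit" after the 4 (default regex engine).
grep("^Q\\.4($|[^0-9])", d, value = TRUE)

# Negative lookahead: a 4 not followed by a digit (requires perl = TRUE).
grep("^Q\\.4(?![0-9])", d, value = TRUE, perl = TRUE)
```

Since multichoice() passes question.prefix straight to grep() without perl = TRUE, the "($|[^0-9])" form is the one that can be used as-is, e.g. multichoice(df2, "^Q\\.4($|[^0-9])").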

I believe what you want for the mn statement in your function is

mn <- min(sapply(data[,index], min, na.rm=T), na.rm=T)

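sapply iterates over the columns that grep selected and finds the minimum of each one; min is then applied across those per-column minima to get the overall minimum.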
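A small sketch of the same computation on a hypothetical toy df2 (the column names follow the question's pattern; the values are made up). It uses drop = FALSE so that a question with a single matching column still yields a data frame: without it, data[, index] collapses to a vector and sapply would loop over its elements instead of its columns.

```r
# Hypothetical toy data: two Q.4 columns and one Q.40 column that must be ignored.
df2 <- data.frame(`Q.4a-some Text`  = c(1, 3, NA),
                  `Q.4b-some Text`  = c(2, NA, 5),
                  `Q.40a-some Text` = c(0, 9, 9),
                  check.names = FALSE)

index <- grep("^Q\\.4[^0-9]", names(df2))   # columns 1 and 2

# Minimum of each selected column, then the minimum across those minima.
col_mins <- sapply(df2[, index, drop = FALSE], min, na.rm = TRUE)
mn <- min(col_mins, na.rm = TRUE)

# The same pattern gives the overall maximum.
mx <- max(sapply(df2[, index, drop = FALSE], max, na.rm = TRUE), na.rm = TRUE)

c(mn = mn, mx = mx)   # mn = 1, mx = 5
```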

Search string to ignore multiple matches
