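I can't find an answer to this specific question. I want to recode multiple character columns into numeric columns (about a hundred of them), but:
So I don't think I can use a range of column indices. However, the columns I want to recode all start with the same column-name prefix. I want to recode every "Yes" to 1, every "No" to 0, and every blank to NA.
I can do this by hand, one column at a time, with the following code: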
#Recode columns one at a time
library(car)
#skip ID column
#Skip Date column
df$Q1<-as.numeric(as.character(recode(df$Q1,"NA=NA; 'No'=0; 'Yes'=1; ''=NA")))
df$Q2<-as.numeric(as.character(recode(df$Q2,"NA=NA; 'No'=0; 'Yes'=1; ''=NA")))
#skip Q2.Explanation column
#do the above for a hundred more columns...
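But I want to recode a hundred specific columns in one go. Those columns are also separated by columns that I do not want to recode.
My data looks like the following. I don't know what dput is: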
ID<-c(01,02,03,04,05)
Q1<-c("Yes", NA,"", "No",NA)
Q1.Explanation<-c (NA, NA,"","Respondent did not get the correct answer", NA)
Q2<-c("No","Yes","Yes","", NA)
Q2.Explanation <-c("The right answer was not proven", NA, NA, NA, NA)
Q3<-c("", NA, "Yes", NA, NA)
Mydata<-as.data.frame(cbind(ID,Q1,Q1.Explanation, Q2, Q2.Explanation,Q3))
Answer 0 (score: 2)
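If you know that the columns you want to change always have the same names and only their position in the table differs, you can subset with a regular expression on the column names and then use apply() to change the values in those columns.
This should recode all columns starting with "Q", wherever they sit in any given month.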
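The code that accompanied this answer is not shown here. As a minimal sketch of the approach it describes, and not necessarily the answerer's exact code, the following base R selects the question columns with a regular expression on the column names and recodes them. The helper name recode_yes_no and the pattern "^Q[0-9]+$" are assumptions made for illustration. The pattern is deliberately stricter than "starts with Q" so that columns such as Q1.Explanation are left alone, and lapply() is used instead of apply() because apply() would first coerce the data frame to a matrix.

```r
# Sample data from the question, built with data.frame() so ID stays numeric
Mydata <- data.frame(
  ID = 1:5,
  Q1 = c("Yes", NA, "", "No", NA),
  Q1.Explanation = c(NA, NA, "", "Respondent did not get the correct answer", NA),
  Q2 = c("No", "Yes", "Yes", "", NA),
  Q2.Explanation = c("The right answer was not proven", NA, NA, NA, NA),
  Q3 = c("", NA, "Yes", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "No" -> 0, "Yes" -> 1, anything else (blank, NA) -> NA
recode_yes_no <- function(x) {
  c(0, 1)[match(as.character(x), c("No", "Yes"))]
}

# Column names that are "Q" followed only by digits (Q1, Q2, Q3, ...)
q_cols <- grep("^Q[0-9]+$", names(Mydata), value = TRUE)

# Recode only those columns; every other column is left untouched
Mydata[q_cols] <- lapply(Mydata[q_cols], recode_yes_no)
str(Mydata)
```

Because the columns are picked by name, the same lines keep working when the question columns move around or new explanation columns appear between them.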
Answer 1 (score: 1)
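For data.table fans, here is another solution. It has the added advantage of recoding with factors instead of plain integers, so the meaning of each value stays visible (which makes the data easier to read):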
library(data.table)
ID<-c(01,02,03,04,05)
Q1<-c("Yes", NA,"", "No",NA)
Q1.Explanation<-c (NA, NA,"","Respondent did not get the correct answer", NA)
Q2<-c("No","Yes","Yes","", NA)
Q2.Explanation <-c("The right answer was not proven", NA, NA, NA, NA)
Q3<-c("", NA, "Yes", NA, NA)
Mydata<-as.data.frame(cbind(ID,Q1,Q1.Explanation, Q2, Q2.Explanation,Q3))
Mydata
# The solution starts here... ----------------------------------------------
setDT(Mydata) # convert data.frame into data.table
# the regular expression selects all column names that consist of "Q" followed only by digits
affected.cols <- grep("^Q\\d+$", colnames(Mydata), value = TRUE)
# convert the columns to factors; the trailing square brackets only print the result
Mydata[, (affected.cols) := lapply(.SD, factor, levels = c("No", "Yes")), .SDcols = affected.cols][]
str(Mydata) # the columns are now factors ("enumerated types"): integer codes with string labels
# Proof: 1 = "No", 2 = "Yes"; values not listed in the "levels" argument of factor() (here mainly empty strings) become NA
as.numeric(Mydata$Q1)
The result is:
> as.numeric(Mydata$Q1)
[1]  2 NA NA  1 NA
> Mydata
   ID  Q1                            Q1.Explanation  Q2                  Q2.Explanation  Q3
1:  1 Yes                                        NA  No The right answer was not proven  NA
2:  2  NA                                        NA Yes                              NA  NA
3:  3  NA                                           Yes                              NA Yes
4:  4  No Respondent did not get the correct answer  NA                              NA  NA
5:  5  NA                                        NA  NA                              NA  NA
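The numeric codes come out this way because factor level indices start at 1: "No", the first level, gets index 1 and "Yes" gets index 2. The question asked for 0 and 1, so subtract 1 from the codes if you need exactly those values.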
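If the 0/1 coding from the question is needed rather than the factor level codes, one option is to shift the integer codes down by one. This is a minimal sketch, assuming data.table is installed; it uses a small stand-in table (the name dt is hypothetical) in place of Mydata:

```r
library(data.table)

# Small stand-in for the question's data (hypothetical name: dt)
dt <- data.table(
  ID = 1:5,
  Q1 = c("Yes", NA, "", "No", NA),
  Q2 = c("No", "Yes", "Yes", "", NA)
)

# Column names that are "Q" followed only by digits
q_cols <- grep("^Q\\d+$", names(dt), value = TRUE)

# factor() gives "No" the code 1 and "Yes" the code 2; every other value becomes NA.
# Subtracting 1 shifts the codes to 0 and 1.
dt[, (q_cols) := lapply(.SD, function(x) {
  as.integer(factor(x, levels = c("No", "Yes"))) - 1L
}), .SDcols = q_cols]

print(dt)
```

The trade-off is that the columns become plain integers, so the "No"/"Yes" labels that the factor version keeps are no longer attached to the values.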