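I want to load and process a CSV file with seven variables: one is a grouping variable/factor (data$hashtag), and the other six are categories (data$support and others) marked with "X" or "x" (or left blank).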
data <- read.csv("maet_coded_tweets.csv", stringsAsFactors = FALSE)
names(data) <- c("hashtag", "support", "contributeConversation", "otherCommunities", "buildCommunity", "engageConversation", "unclear")
str(data)
'data.frame': 854 obs. of 7 variables:
$ hashtag : chr "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
$ support : chr "x" "x" "x" "x" ...
$ contributeConversation: chr "" "" "" "" ...
$ otherCommunities : chr "" "" "" "" ...
$ buildCommunity : chr "" "" "" "" ...
$ engageConversation : chr "" "" "" "" ...
$ unclear : chr "" "" "" "" ...
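To experiment without the original file, a tiny stand-in can be read from an inline string. This is a hypothetical three-row sample (the real maet_coded_tweets.csv has 854 rows). It only shows that blank cells in a character column come back from read.csv as empty strings "", not as NA, which is why the recoding below has to match "".

```r
# Hypothetical three-row stand-in for maet_coded_tweets.csv.
# Every category column gets at least one mark; a column that is entirely
# blank would be guessed as logical (all NA) instead of character.
csv_text <- 'hashtag,support,contributeConversation,otherCommunities,buildCommunity,engageConversation,unclear
#capstoneisfun,x,,,,,
#capstoneisfun,X,x,,x,,
#othertag,,,x,,x,x'

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(toy)  # all seven columns are chr; blank cells are ""
```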
When I use a function to recode "X" or "x" to 1 and "" (blank) to 0, the resulting columns are, oddly, still of character type rather than numeric as intended.
recode <- function(x) {
x[x=="x"] <- 1
x[x=="X"] <- 1
x[x==""] <- 0
x
}
data[] <- lapply(data, recode)
str(data)
'data.frame': 854 obs. of 7 variables:
$ hashtag : chr "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
$ support : chr "1" "1" "1" "1" ...
$ contributeConversation: chr "0" "0" "0" "0" ...
$ otherCommunities : chr "0" "0" "0" "0" ...
$ buildCommunity : chr "0" "0" "0" "0" ...
$ engageConversation : chr "0" "0" "0" "0" ...
$ unclear : chr "0" "0" "0" "0" ...
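When I try to coerce the characters with as.numeric() inside the function, it still doesn't work. What gives: why are the variables treated as characters, and how do I make them numeric?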
Answer 0 (score: 2)
How about:
recode <- function(x) {
ifelse(x %in% c('X','x'), 1,0)
}
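Explanation: the steps in your function are evaluated one after another, not all at once. So when you assign 1s into a character vector, they are coerced to "1"s, and every later step keeps working on (and returning) a character vector.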
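The coercion can be seen by stepping through the question's recode() on a short character vector. The sketch then shows one possible fix (recode_num is a hypothetical name): do all replacements first and call as.numeric() once at the end, and apply it only to the category columns so the hashtag column, which cannot be made numeric, stays text. The demo data frame is made up for illustration.

```r
# Step through the question's recode() on a small character vector.
x <- c("x", "", "X")
x[x == "x"] <- 1   # the numeric 1 is stored as the string "1"
print(x)           # "1" "" "X"
x[x == "X"] <- 1
x[x == ""] <- 0
print(typeof(x))   # "character": the vector never changed type

# Hypothetical fix: do all replacements, then convert once at the end.
recode_num <- function(x) {
  x[x %in% c("x", "X")] <- 1
  x[x == ""] <- 0
  as.numeric(x)
}

# Apply it to the category columns only, so the hashtag column stays text.
demo <- data.frame(hashtag = c("#a", "#a", "#b"),
                   support = c("x", "", "X"),
                   unclear = c("", "x", ""),
                   stringsAsFactors = FALSE)
demo[-1] <- lapply(demo[-1], recode_num)
str(demo)
```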
Answer 1 (score: 1)
Is this what you mean?
# sample data with support being a character vector
myDat <- data.frame(support = c("X", "X", "0", "x", "0"), a = 1:5, stringsAsFactors = FALSE)
# convert to a factor and check the order of the levels
myDat$support <- as.factor(myDat$support)
levels(myDat$support)
# "0" "x" "X"  (sort order is locale-dependent; the C locale gives "0" "X" "x")
# just to see that it worked make an additional variable
myDat$supportrecoded <- myDat$support
# change levels and convert
levels(myDat$supportrecoded) <- c("0","1","1")
myDat$supportrecoded <- as.integer(as.character(myDat$supportrecoded ))
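The as.character() step in the last line matters: as.integer() on a factor returns the internal level codes, not the labels. The sketch below shows the difference, and also an alternative that does not depend on the order of the levels, using the named-list form of `levels<-` (each name is a new level; its elements are the old levels merged into it). The variable names are illustrative only.

```r
f <- factor(c("1", "0", "1"))   # levels are "0", "1"
codes <- as.integer(f)                 # 2 1 2: internal level codes
vals  <- as.integer(as.character(f))   # 1 0 1: the labels as numbers

# Order-independent merge of "x" and "X" into a single level "1".
sup <- factor(c("X", "X", "0", "x", "0"))
levels(sup) <- list("0" = "0", "1" = c("x", "X"))
recoded <- as.integer(as.character(sup))  # 1 1 0 1 0
```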
Answer 2 (score: 1)
Using mapvalues from plyr:
library(plyr)  # provides mapvalues()
data$support <- as.numeric(mapvalues(data$support, c("X", "x", ""), c(1, 1, 0)))
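If installing plyr is not an option, the same from/to lookup can be sketched in base R with match(): the position of each value in `from` selects the replacement in `to`, and the result is already numeric. One difference: values not listed in `from` become NA here, whereas mapvalues() leaves them unchanged. map_codes and the demo data are hypothetical.

```r
# Hypothetical base-R equivalent of the mapvalues() call above.
map_codes <- function(x, from = c("X", "x", ""), to = c(1, 1, 0)) {
  to[match(x, from)]   # unmapped values become NA
}

demo <- data.frame(hashtag = c("#a", "#b", "#b"),
                   support = c("X", "", "x"),
                   stringsAsFactors = FALSE)
demo$support <- map_codes(demo$support)
str(demo)  # support is now num
```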
Using replace:
x <- data$support
x <- replace(x, x %in% c("X", "x"), 1)  # the 1 is stored as "1" in the character vector
x <- replace(x, x == "", 0)
data$support <- as.numeric(x)  # as.numeric(), not numeric(): numeric(n) creates n zeros
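All of the answers above recode one column at a time. A compact variant for all six category columns at once is to test each cell for an "x" mark directly, which yields a logical vector that as.numeric() turns into 1/0. This sketch assumes that any cell that is not an x/X mark (after trimming stray spaces) should count as 0, so typos are silently treated as blanks; mark_to_num and the demo data are hypothetical.

```r
# Hypothetical helper: 1 where the cell holds an x/X mark, 0 otherwise.
# trimws() tolerates stray spaces such as " X"; tolower() merges "X" and "x".
mark_to_num <- function(x) as.numeric(tolower(trimws(x)) == "x")

demo <- data.frame(hashtag = c("#a", "#a", "#b"),
                   support = c(" X", "", "x"),
                   unclear = c("", "x ", ""),
                   stringsAsFactors = FALSE)

# Recode every column except the grouping variable.
cat_cols <- setdiff(names(demo), "hashtag")
demo[cat_cols] <- lapply(demo[cat_cols], mark_to_num)
str(demo)
```

With the real data, the same two lines with `data` in place of `demo` would recode all six category columns.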