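I'm new to R and trying to figure out how to change existing values in a data frame. I've included a quick overview below. Thanks for your help!
Here is what I currently have: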
company top_ten_advertiser
'Company A' 'Is Top 10'
'Company B' 'Is Top 10'
'Company C' 'Is Top 10'
'Company D' 'Is Top 10'
… …
'Company X' 'Not Top 10'
'Company Y' 'Not Top 10'
'Company Z' 'Not Top 10'
I would like to change it to the following:
company top_ten_advertiser
'Company A' 'Top 10 Company'
'Company B' 'Top 10 Company'
'Company C' 'Top 10 Company'
'Company D' 'Top 10 Company'
… …
'Company X' 'Not Top 10 Company'
'Company Y' 'Not Top 10 Company'
'Company Z' 'Not Top 10 Company'
Answer 0 (score: 1)
Assuming your data frame is called df, do this for a character variable:
# Add the word "Company" to all values of top_ten_advertiser
df$top_ten_advertiser = paste(df$top_ten_advertiser, "Company", sep=" ")
# Remove the "Is " from "Is Top 10"
df$top_ten_advertiser = gsub("Is ", "", df$top_ten_advertiser)
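To try the character-variable approach end to end, here is a minimal sketch on a small, made-up data frame. The three sample rows are hypothetical stand-ins for the data in the question, and the pattern is anchored with `^` (an assumption, not part of the answer above) so that only a leading "Is " is removed:

```r
# Hypothetical sample data mirroring the question.
# stringsAsFactors = FALSE keeps the column as character.
df <- data.frame(
  company = c("Company A", "Company B", "Company X"),
  top_ten_advertiser = c("Is Top 10", "Is Top 10", "Not Top 10"),
  stringsAsFactors = FALSE
)

# Append " Company" to every value (paste() separates with a space by default)
df$top_ten_advertiser <- paste(df$top_ten_advertiser, "Company")

# Remove a leading "Is " only; sub() replaces the first match per element
df$top_ten_advertiser <- sub("^Is ", "", df$top_ten_advertiser)

print(df$top_ten_advertiser)
# [1] "Top 10 Company"     "Top 10 Company"     "Not Top 10 Company"
```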
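Or, if it is a factor variable:
# Install the plyr package first if you have not already:
# install.packages("plyr")
library(plyr)
# revalue() returns a new factor, so assign the result back to the column
df$top_ten_advertiser = revalue(df$top_ten_advertiser,
  c("Is Top 10"="Top 10 Company", "Not Top 10"="Not Top 10 Company"))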
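If you would rather not add a package dependency, the factor levels can also be renamed with base R alone. This is only a sketch of an alternative, not the method from the answer; the two-row data frame and the helper variable `lv` are hypothetical:

```r
# Hypothetical sample data with a factor column
df <- data.frame(
  company = c("Company A", "Company X"),
  top_ten_advertiser = factor(c("Is Top 10", "Not Top 10"))
)

# Copy the levels, rename the ones of interest, then write them back.
# Renaming levels relabels every matching row at once.
lv <- levels(df$top_ten_advertiser)
lv[lv == "Is Top 10"]  <- "Top 10 Company"
lv[lv == "Not Top 10"] <- "Not Top 10 Company"
levels(df$top_ten_advertiser) <- lv

print(as.character(df$top_ten_advertiser))
# [1] "Top 10 Company"     "Not Top 10 Company"
```

The column stays a factor throughout, so no conversion to character and back is needed.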
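If you find changing the levels of a factor painful, you can first convert the factor variable to character, change the values, and then convert it back to a factor, like this: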
df$top_ten_advertiser = as.character(df$top_ten_advertiser)
df$top_ten_advertiser = paste(df$top_ten_advertiser, "Company", sep=" ")
df$top_ten_advertiser = gsub("Is ", "", df$top_ten_advertiser)
df$top_ten_advertiser = factor(df$top_ten_advertiser)
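And, for completeness, Matthew Lundberg mentioned doing this with a regular expression:
df$top_ten_advertiser = gsub("(Is )?(.*)", "\\2 Company", df$top_ten_advertiser)
It is concise, but cryptic if you are new to regular expressions. It works on both character and factor variables. However, applying it to a factor variable converts it to character.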
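The conversion from factor to character can be checked directly. The sketch below uses a hypothetical two-element factor `x` and an anchored variant of the pattern with `sub()` (the anchors and the switch to `sub()` are assumptions made here so that each string is matched exactly once). The optional group `(Is )?` consumes a leading "Is " when present, and `\\2` keeps everything after it:

```r
# Hypothetical factor input
x <- factor(c("Is Top 10", "Not Top 10"))

# sub() coerces the factor to character before matching,
# so the result is a character vector, not a factor
y <- sub("^(Is )?(.*)$", "\\2 Company", x)

print(class(y))
# [1] "character"
print(y)
# [1] "Top 10 Company"     "Not Top 10 Company"

# Convert back if a factor is still wanted
y_factor <- factor(y)
```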
Answer 1 (score: 0)
# Note: this rewrites only the "Is Top 10" values; "Not Top 10" is left unchanged
df$top_ten_advertiser <- gsub("Is Top 10", "Top 10 Company", df$top_ten_advertiser)