R: How to replace strings with integers?

Asked: 2015-05-18 08:51:23

Tags: r replace

My dataset looks like this:

classification  Interest    Age     Gender
Card battle     IL029       18-24   male
Card battle     IL001       45-54   male
Card battle     IL001       18-24   male
Card battle     IL001       35-44   male
Card battle     IL001       35-44   male
Card battle     IL013       35-44   male

How can I replace "18-24" with 20, "35-44" with 40, and "45-54" with 50 in the Age column?

4 answers:

Answer 0 (score: 5)

Try something like this:

with open(filename, 'rU') as handle:
    content = handle.read()
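
The snippet above is Python and only reads a file, so it does not change the Age column. For the data frame in the question, a direct replacement in R could be sketched with a named lookup vector. The names `DF` and `age_map` are assumptions made here for illustration and are not taken from the answer.

```r
# Minimal sketch: map each age range to a representative integer
# via a named vector (age_map is a hypothetical name).
DF <- data.frame(
  Age = c("18-24", "45-54", "18-24", "35-44", "35-44", "35-44"),
  stringsAsFactors = FALSE
)

age_map <- c("18-24" = 20L, "35-44" = 40L, "45-54" = 50L)

# Index the lookup by the range strings; as.character() also covers
# the case where Age is a factor. unname() drops the range names.
DF$Age <- unname(age_map[as.character(DF$Age)])
```

Any range that is missing from `age_map` comes back as `NA`, which makes unmapped values easy to spot.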

Answer 1 (score: 4)

This replaces Age with a factor whose labels are 20, 40 and 50:

transform(DF, Age = factor(Age, 
       levels = c("18-24", "35-44", "45-54"),
       labels = c(20, 40, 50)))

giving:

  classification Interest Age Gender
1    Card battle    IL029  20   male
2    Card battle    IL001  50   male
3    Card battle    IL001  20   male
4    Card battle    IL001  40   male
5    Card battle    IL001  40   male
6    Card battle    IL013  40   male

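In fact this can be shortened to the following, although the version above is a bit safer because it does not rely on the levels being inferred in the right order: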

transform(DF, Age = factor(Age, labels = c(20, 40, 50)))
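
Why the explicit `levels` version is safer can be seen in a small sketch: without `levels`, `factor()` derives the levels from the sorted unique values actually present, so the labels line up only if all three ranges occur. The vector `ages` below is a made-up example, not the question's data.

```r
# Made-up input in which the "35-44" range never occurs.
ages <- c("18-24", "45-54", "18-24")

# With explicit levels, each range keeps its intended label.
safe <- factor(ages,
               levels = c("18-24", "35-44", "45-54"),
               labels = c(20, 40, 50))
as.character(safe)

# Without levels, only two levels are found, so three labels no longer
# fit and factor() stops with an error; try() lets the script continue.
short <- try(factor(ages, labels = c(20, 40, 50)), silent = TRUE)
inherits(short, "try-error")
```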

If you prefer an integer column, then:

transform(DF, Age = as.integer(as.character(
       factor(Age, 
         levels = c("18-24", "35-44", "45-54"),
         labels = c(20, 40, 50)
       )
 )))
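
The `as.character()` step matters because `as.integer()` applied directly to a factor returns the internal level codes, not the labels. A short sketch with a made-up factor `f`:

```r
f <- factor(c("18-24", "45-54", "35-44"),
            levels = c("18-24", "35-44", "45-54"),
            labels = c(20, 40, 50))

codes  <- as.integer(f)                # internal codes: 1, 3, 2
values <- as.integer(as.character(f))  # labels as integers: 20, 50, 40
```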

Again, we could omit the levels argument:

transform(DF, Age = as.integer(as.character(factor(Age, labels = c(20, 40, 50)))))

Note: we used this as input:

DF <-
structure(list(classification = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = "Card battle", class = "factor"), Interest = structure(c(3L, 
1L, 1L, 1L, 1L, 2L), .Label = c("IL001", "IL013", "IL029"), class = "factor"), 
    Age = structure(c(1L, 3L, 1L, 2L, 2L, 2L), .Label = c("18-24", 
    "35-44", "45-54"), class = "factor"), Gender = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "male", class = "factor")), .Names = c("classification", 
"Interest", "Age", "Gender"), class = "data.frame", row.names = c(NA, 
-6L))

Answer 2 (score: 2)

A data.table solution is to use a merge (join), which extends more easily to more complicated cases:

library(data.table)
#your data
DT = data.table(
  classification = "Card battle",
  Interest = sprintf('IL%03d', c(29, 1, 1, 1, 1, 13)),
  Age = c("18-24","45-54","18-24", rep("35-44", 3L)),
  Gender = "male"
)

#conversion table
convert = data.table(
  Age_range = c("18-24", "35-44", "45-54"),
  #need to keep as string here since 
  #  the target column to overwrite is character
  Age_middle = paste0(c(20, 40, 50))
)

#replace Age, then set its class
DT[convert, on = c(Age = 'Age_range'), Age := i.Age_middle]
#  now convert back to numeric
DT[ , Age := as.numeric(Age)]

You might consider keeping the range column and simply adding a rounded age column, which makes the code cleaner:

#assumes DT still holds the original range strings in Age
convert = data.table(
  Age_range = c("18-24", "35-44", "45-54"),
  Age_middle = c(20L, 40L, 50L)
)

DT[convert, on = c(Age = 'Age_range'), Age_middle := i.Age_middle]
DT
#    classification Interest   Age Gender Age_middle
# 1:    Card battle    IL029 18-24   male         20
# 2:    Card battle    IL001 45-54   male         50
# 3:    Card battle    IL001 18-24   male         20
# 4:    Card battle    IL001 35-44   male         40
# 5:    Card battle    IL001 35-44   male         40
# 6:    Card battle    IL013 35-44   male         40

Answer 3 (score: 0)

Another approach, using a regex: capture the second-to-last digit and append a 0 after it:

DF$Age <- as.numeric(sub(".*(\\d)\\d$", "\\10", as.character(DF$Age)))

(If Age is not a factor, as.numeric(sub(".*(\\d)\\d$", "\\10", DF$Age)) is enough.)

DF
#  classification Interest Age Gender
#1    Card battle    IL029  20   male
#2    Card battle    IL001  50   male
#3    Card battle    IL001  20   male
#4    Card battle    IL001  40   male
#5    Card battle    IL001  40   male
#6    Card battle    IL013  40   male
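
To see what the pattern does, here is a sketch on the three range strings alone. The greedy `.*` consumes everything up to the last two digits, `(\\d)` captures the tens digit of the upper bound, and the replacement `\\10` is read as backreference 1 followed by a literal 0. The vector `ranges` is an assumption for illustration.

```r
ranges <- c("18-24", "35-44", "45-54")

# Keep the tens digit of the upper bound and append "0".
rounded <- as.numeric(sub(".*(\\d)\\d$", "\\10", ranges))
rounded
```

This derives the value from the upper bound only, so a range such as "25-34" would map to 30. That suits the three ranges in the question but may not suit other binnings.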