Question

My dataset looks like this:

classification  Interest    Age     Gender
Card battle     IL029       18-24   male
Card battle     IL001       45-54   male
Card battle     IL001       18-24   male
Card battle     IL001       35-44   male
Card battle     IL001       35-44   male
Card battle     IL013       35-44   male

How can I replace "18-24" with 20, "35-44" with 40, and "45-54" with 50 in the Age column?

Answer 1

Try something like this:

with open(filename, 'rU') as handle:
    content = handle.read()

Answer 2

This replaces Age with a factor whose labels are 20, 40, and 50:

transform(DF, Age = factor(Age, 
       levels = c("18-24", "35-44", "45-54"),
       labels = c(20, 40, 50)))

giving:

  classification Interest Age Gender
1    Card battle    IL029  20   male
2    Card battle    IL001  50   male
3    Card battle    IL001  20   male
4    Card battle    IL001  40   male
5    Card battle    IL001  40   male
6    Card battle    IL013  40   male

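In fact it could be shortened to the following, although the version above is a little safer because it does not depend on the implicit level order or on all three ranges being present in the data: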

transform(DF, Age = factor(Age, labels = c(20, 40, 50)))
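A small sketch of why the explicit levels version is safer. The vector age below is a hypothetical subset, not the original data, in which the 35-44 group does not occur:

```r
# Hypothetical subset of the Age column without any "35-44" rows
age <- c("18-24", "45-54", "18-24")

# Explicit levels: every range keeps its intended label
safe <- factor(age,
               levels = c("18-24", "35-44", "45-54"),
               labels = c(20, 40, 50))
safe_chr <- as.character(safe)

# Implicit levels: only two levels are found in the data, so supplying
# three labels is rejected by factor()
short <- tryCatch(factor(age, labels = c(20, 40, 50)),
                  error = function(e) e)
failed <- inherits(short, "error")
```

Here safe_chr should be "20" "50" "20", while failed should be TRUE because three labels cannot be matched to two levels.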

If you prefer an integer column, then:

transform(DF, Age = as.integer(as.character(
       factor(Age, 
         levels = c("18-24", "35-44", "45-54"),
         labels = c(20, 40, 50)
       )
 )))

Again, we could omit the levels argument:

transform(DF, Age = as.integer(as.character(factor(Age, labels = c(20, 40, 50)))))
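The as.character() step in the integer variants matters because as.integer() applied directly to a factor returns the internal level codes, not the labels. A minimal illustration, using a made-up three-element factor built the same way as above:

```r
# Illustrative factor, constructed like the Age column in the answer
f <- factor(c("18-24", "45-54", "35-44"),
            levels = c("18-24", "35-44", "45-54"),
            labels = c(20, 40, 50))

# Direct conversion yields the internal level codes
codes <- as.integer(f)

# Going through character first yields the labels as numbers
values <- as.integer(as.character(f))
```

With this input, codes should be 1 3 2 and values should be 20 50 40.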

Note: We used this as input:

DF <-
structure(list(classification = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = "Card battle", class = "factor"), Interest = structure(c(3L, 
1L, 1L, 1L, 1L, 2L), .Label = c("IL001", "IL013", "IL029"), class = "factor"), 
    Age = structure(c(1L, 3L, 1L, 2L, 2L, 2L), .Label = c("18-24", 
    "35-44", "45-54"), class = "factor"), Gender = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "male", class = "factor")), .Names = c("classification", 
"Interest", "Age", "Gender"), class = "data.frame", row.names = c(NA, 
-6L))

Answer 3

A data.table solution is to use a join (which extends more easily to more complex cases):

library(data.table)
#your data
DT = data.table(
  classification = "Card battle",
  Interest = sprintf('IL%03d', c(29, 1, 1, 1, 1, 13)),
  Age = c("18-24","45-54","18-24", rep("35-44", 3L)),
  Gender = "male"
)

#conversion table
convert = data.table(
  Age_range = c("18-24", "35-44", "45-54"),
  #need to keep as string here since 
  #  the target column to overwrite is character
  Age_middle = paste0(c(20, 40, 50))
)

#replace Age, then set its class
DT[convert, on = c(Age = 'Age_range'), Age := i.Age_middle]
#  now convert back to numeric
DT[ , Age := as.numeric(Age)]
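For readers without data.table, the same lookup-table idea can be sketched in base R with match(). This is a swapped-in technique rather than the join used above, and the names DF2 and convert_df are hypothetical:

```r
# Hypothetical copy of the Age column from the question, kept as character
DF2 <- data.frame(
  Age = c("18-24", "45-54", "18-24", "35-44", "35-44", "35-44"),
  stringsAsFactors = FALSE
)

# Conversion table: one row per age range and its rounded value
convert_df <- data.frame(
  Age_range  = c("18-24", "35-44", "45-54"),
  Age_middle = c(20L, 40L, 50L),
  stringsAsFactors = FALSE
)

# match() finds the row of convert_df for each Age;
# a range missing from the table would become NA
DF2$Age_middle <- convert_df$Age_middle[match(DF2$Age, convert_df$Age_range)]
```

As with the join, supporting a new range only requires adding a row to the conversion table.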

You might consider keeping the range column and simply adding a rounded age column, which makes the code cleaner:

#starting again from the original DT, where Age is still character
convert = data.table(
  Age_range = c("18-24", "35-44", "45-54"),
  Age_middle = c(20L, 40L, 50L)
)

DT[convert, on = c(Age = 'Age_range'), Age_middle := i.Age_middle]
DT
#    classification Interest   Age Gender Age_middle
# 1:    Card battle    IL029 18-24   male         20
# 2:    Card battle    IL001 45-54   male         50
# 3:    Card battle    IL001 18-24   male         20
# 4:    Card battle    IL001 35-44   male         40
# 5:    Card battle    IL001 35-44   male         40
# 6:    Card battle    IL013 35-44   male         40

Answer 4

Another approach, using a regex that captures the second-to-last digit and puts a 0 after it:

DF$Age <- as.numeric(sub(".*(\\d)\\d$", "\\10", as.character(DF$Age)))

(Just as.numeric(sub(".*(\\d)\\d$", "\\10", DF$Age)) is enough if Age is not a factor.)

DF
#  classification Interest Age Gender
#1    Card battle    IL029  20   male
#2    Card battle    IL001  50   male
#3    Card battle    IL001  20   male
#4    Card battle    IL001  40   male
#5    Card battle    IL001  40   male
#6    Card battle    IL013  40   male
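To see what the pattern does on its own, here is a short sketch on a standalone character vector. The extra ranges "25-34" and "55-64" are hypothetical and do not appear in the question's data:

```r
# Age ranges as plain strings; two of them are made up for illustration
ranges <- c("18-24", "25-34", "35-44", "45-54", "55-64")

# ".*" greedily consumes everything up to the last two digits,
# "(\\d)" captures the tens digit of the upper bound,
# and the replacement "\\10" is that digit followed by a literal 0
rounded <- as.numeric(sub(".*(\\d)\\d$", "\\10", ranges))
```

Under these assumptions rounded should be 20 30 40 50 60. The result is always the upper bound rounded down to a multiple of ten, so this shortcut only suits ranges where that is the value you want.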

R: How to replace strings with integers?

4 answers: