My dataset looks like this:
classification Interest Age Gender
Card battle IL029 18-24 male
Card battle IL001 45-54 male
Card battle IL001 18-24 male
Card battle IL001 35-44 male
Card battle IL001 35-44 male
Card battle IL013 35-44 male
How can I replace "18-24" with 20, "35-44" with 40, and "45-54" with 50 in the Age column?
Answer 0 (score: 5)
Try something like this:
with open(filename, 'rU') as handle:
    content = handle.read()
Answer 1 (score: 4)
This replaces Age with a factor whose labels are 20, 40 and 50:
transform(DF, Age = factor(Age,
levels = c("18-24", "35-44", "45-54"),
labels = c(20, 40, 50)))
giving:
classification Interest Age Gender
1 Card battle IL029 20 male
2 Card battle IL001 50 male
3 Card battle IL001 20 male
4 Card battle IL001 40 male
5 Card battle IL001 40 male
6 Card battle IL013 40 male
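In fact it could probably be shortened to this, although the version above is a bit safer because it pairs each range with its label explicitly: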
transform(DF, Age = factor(Age, labels = c(20, 40, 50)))
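To see why spelling out levels is the safer choice, consider a subset in which one age range never appears. This is only a minimal sketch; the vector `ages` is made up for illustration and is not the DF used in this answer:

```r
# Hypothetical subset: the "35-44" range does not occur
ages <- c("18-24", "45-54", "18-24")

# Explicit levels: each range is paired with its label by position in levels
safe <- factor(ages,
               levels = c("18-24", "35-44", "45-54"),
               labels = c(20, 40, 50))
as.character(safe)   # "20" "50" "20"

# Without levels, factor() derives them from the sorted unique values
# (only two here), so supplying three labels raises an error
short <- tryCatch(
  factor(ages, labels = c(20, 40, 50)),
  error = function(e) conditionMessage(e)
)
short                # the error message, not a factor
```

The short form works on the full DF only because all three ranges are present and happen to sort in the intended order.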
If you prefer an integer column, then:
transform(DF, Age = as.integer(as.character(
factor(Age,
levels = c("18-24", "35-44", "45-54"),
labels = c(20, 40, 50)
)
)))
Again, we could omit the levels argument:
transform(DF, Age = as.integer(as.character(factor(Age, labels = c(20, 40, 50)))))
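The detour through as.character() matters because as.integer() applied directly to a factor returns the internal level codes rather than the labels. A small sketch with a made-up factor `f` (not part of the original answer):

```r
f <- factor(c("18-24", "45-54", "35-44"),
            levels = c("18-24", "35-44", "45-54"),
            labels = c(20, 40, 50))

codes  <- as.integer(f)                # internal level codes: 1 3 2
values <- as.integer(as.character(f))  # the labels as integers: 20 50 40
```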
Note: we used this as input:
DF <-
structure(list(classification = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = "Card battle", class = "factor"), Interest = structure(c(3L,
1L, 1L, 1L, 1L, 2L), .Label = c("IL001", "IL013", "IL029"), class = "factor"),
Age = structure(c(1L, 3L, 1L, 2L, 2L, 2L), .Label = c("18-24",
"35-44", "45-54"), class = "factor"), Gender = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "male", class = "factor")), .Names = c("classification",
"Interest", "Age", "Gender"), class = "data.frame", row.names = c(NA,
-6L))
Answer 2 (score: 2)
The data.table solution is a join against a conversion table (which extends more easily to more complicated cases):
library(data.table)
#your data
DT = data.table(
classification = "Card battle",
Interest = sprintf('IL%03d', c(29, 1, 1, 1, 1, 13)),
Age = c("18-24","45-54","18-24", rep("35-44", 3L)),
Gender = "male"
)
#conversion table
convert = data.table(
  Age_range = c("18-24", "35-44", "45-54"),
  #need to keep as string here since
  # the target column to overwrite is character
  Age_middle = paste0(c(20, 40, 50))
)
#replace Age, then set its class
DT[convert, on = c(Age = 'Age_range'), Age := i.Age_middle]
# now convert back to numeric
DT[ , Age := as.numeric(Age)]
You might consider keeping the range column and simply adding a rounded age column, which makes the code cleaner:
# assumes DT still holds the original range strings in Age
convert = data.table(
  Age_range = c("18-24", "35-44", "45-54"),
  Age_middle = c(20L, 40L, 50L)
)
DT[convert, on = c(Age = 'Age_range'), Age_middle := i.Age_middle]
DT
# classification Interest Age Gender Age_middle
# 1: Card battle IL029 18-24 male 20
# 2: Card battle IL001 45-54 male 50
# 3: Card battle IL001 18-24 male 20
# 4: Card battle IL001 35-44 male 40
# 5: Card battle IL001 35-44 male 40
# 6: Card battle IL013 35-44 male 40
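For readers who do not use data.table, the same conversion-table idea can be sketched in base R with a named vector. The names `df` and `lookup` are illustrative and not part of the answer above:

```r
# Small stand-in data frame with the same Age ranges as the question
df <- data.frame(
  Interest = c("IL029", "IL001", "IL001", "IL001", "IL001", "IL013"),
  Age = c("18-24", "45-54", "18-24", "35-44", "35-44", "35-44"),
  stringsAsFactors = FALSE
)

# Named vector acting as the conversion table: range -> rounded age
lookup <- c("18-24" = 20L, "35-44" = 40L, "45-54" = 50L)

# Index by name; as.character() guards against Age being a factor
df$Age_middle <- unname(lookup[as.character(df$Age)])
df$Age_middle   # 20 50 20 40 40 40
```

Any range that is missing from `lookup` comes back as NA, which makes unmapped values easy to spot.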
Answer 3 (score: 0)
Another approach uses a regex that captures the second-to-last digit and places a 0 after it:
DF$Age <- as.numeric(sub(".*(\\d)\\d$", "\\10", as.character(DF$Age)))
(If Age is not a factor, as.numeric(sub(".*(\\d)\\d$", "\\10", DF$Age)) is enough.)
DF
# classification Interest Age Gender
#1 Card battle IL029 20 male
#2 Card battle IL001 50 male
#3 Card battle IL001 20 male
#4 Card battle IL001 40 male
#5 Card battle IL001 40 male
#6 Card battle IL013 40 male
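The regex can be checked in isolation on a plain character vector. This is only a sketch; `ages` and the "55+" value are made up to show what happens to a string the pattern does not match:

```r
ages <- c("18-24", "35-44", "45-54")

# ".*" greedily consumes everything before the last two digits,
# "(\\d)" captures the tens digit, and the final "\\d" drops the units digit.
# In the replacement, "\\1" is the captured digit, followed by a literal "0".
rounded <- sub(".*(\\d)\\d$", "\\10", ages)
rounded              # "20" "40" "50"
as.numeric(rounded)

# A value that does not end in two digits is left unchanged by sub()
unmatched <- sub(".*(\\d)\\d$", "\\10", "55+")
unmatched            # "55+"
```

Because unmatched strings pass through untouched, as.numeric() would turn them into NA with a warning, so this approach fits best when every range ends in a two-digit number.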