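I have a data frame containing strings of different lengths. I want to double each single-character string (for example, "A" becomes "AA"), while NA values and two-character strings stay as they are.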
DF:
Milk Cola Juice Coffee Tea Wine
1 A NA A BD C A
2 AB NA C D CD AD
3 A BC AC D D D
4 AB B NA D CD AD
5 B C AC BD CD NA
6 AB BC C NA NA A
7 NA BC A B NA A
Desired output:
Milk Cola Juice Coffee Tea Wine
1 AA NA AA BD CC AA
2 AB NA CC DD CD AD
3 AA BC AC DD DD DD
4 AB BB NA DD CD AD
5 BB CC AC BD CD NA
6 AB BC CC NA NA AA
7 NA BC AA BB NA AA
Thanks.
Answer 0 (score: 4)
Here is an attempt using a regular expression replacement:
dat[] <- lapply(dat, function(x) sub("^(.)$", paste(rep("\\1",2),collapse=""), x) )
Or, less programmatically, but with the same result:
dat[] <- lapply(dat, function(x) sub("^(.)$", "\\1\\1", x) )
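Or, if you really want to compress the code (this relies on partial matching of the argument names pattern and replacement):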
dat[] <- lapply(dat, sub, pa="^(.)$", re="\\1\\1")
Where dat is:
dat <- structure(list(Milk = c("A", "AB", "A", "AB", "B", "AB", NA),
Cola = c(NA, NA, "BC", "B", "C", "BC", "BC"), Juice = c("A",
"C", "AC", NA, "AC", "C", "A"), Coffee = c("BD", "D", "D",
"D", "BD", NA, "B"), Tea = c("C", "CD", "D", "CD", "CD",
NA, NA), Wine = c("A", "AD", "D", "AD", NA, "A", "A")), .Names = c("Milk",
"Cola", "Juice", "Coffee", "Tea", "Wine"), row.names = c("1",
"2", "3", "4", "5", "6", "7"), class = "data.frame")
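To see why the regex in this answer leaves NA values and two-character strings alone, here is a minimal sketch on a small vector. The vector x and the name doubled are made up for illustration and are not part of the original answer. The pattern ^(.)$ only matches strings that consist of exactly one character, and sub() returns NA for NA input.

```r
# Hypothetical test vector: one-character, two-character, and missing values
x <- c("A", "AB", NA, "B")

# "^(.)$" anchors at both ends, so only strings of exactly one character match;
# "\\1\\1" writes the captured character twice
doubled <- sub("^(.)$", "\\1\\1", x)

print(doubled)
```

Strings that do not match the pattern are returned unchanged, which is why no explicit length check is needed here.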
Answer 1 (score: 4)
DF <- " Milk Cola Juice Coffee Tea Wine
1 A NA A BD C A
2 AB NA C D CD AD
3 A BC AC D D D
4 AB B NA D CD AD
5 B C AC BD CD NA
6 AB BC C NA NA A
7 NA BC A B NA A "
DF <- read.table(text=DF, stringsAsFactors=FALSE)
This is DF:
Milk Cola Juice Coffee Tea Wine
1 A <NA> A BD C A
2 AB <NA> C D CD AD
3 A BC AC D D D
4 AB B <NA> D CD AD
5 B C AC BD CD <NA>
6 AB BC C <NA> <NA> A
7 <NA> BC A B <NA> A
To achieve your goal, we can use lapply and ifelse.
DF[] <- lapply(DF, function(x) ifelse(nchar(x) == 1, paste(x, x, sep=""), x))
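For each column, if an entry has exactly one character, we duplicate it; otherwise, we leave it as it is. NA entries stay NA because nchar() returns NA for them, so ifelse() returns NA as well.
Final output: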
> DF
Milk Cola Juice Coffee Tea Wine
1 AA <NA> AA BD CC AA
2 AB <NA> CC DD CD AD
3 AA BC AC DD DD DD
4 AB BB <NA> DD CD AD
5 BB CC AC BD CD <NA>
6 AB BC CC <NA> <NA> AA
7 <NA> BC AA BB <NA> AA
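The NA handling in the ifelse approach above is easy to miss, so here is a small sketch of what happens element by element. The vector x and the names len and out are hypothetical, and the behaviour of nchar() on missing strings assumes R 3.3.0 or later.

```r
# Hypothetical character vector with a missing value
x <- c("A", "AB", NA_character_)

# For character input, nchar() yields NA for a missing string
len <- nchar(x)

# paste0() alone would turn the missing value into the literal string "NANA",
# but ifelse() never selects that element because the test is NA there
out <- ifelse(len == 1, paste0(x, x), x)

print(len)
print(out)
```

This is why the one-line lapply() call does not need a separate is.na() check.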
Answer 2 (score: 2)
We can also do this with strrep:
DF[] <- lapply(DF, function(x) ifelse(nchar(x)==1, strrep(x,2), x))
DF
# Milk Cola Juice Coffee Tea Wine
#1 AA <NA> AA BD CC AA
#2 AB <NA> CC DD CD AD
#3 AA BC AC DD DD DD
#4 AB BB <NA> DD CD AD
#5 BB CC AC BD CD <NA>
#6 AB BC CC <NA> <NA> AA
#7 <NA> BC AA BB <NA> AA
An option using dplyr would be
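library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() is their replacement
DF %>%
  mutate(across(everything(), ~ ifelse(nchar(.x) == 1, strrep(.x, 2), .x)))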
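As a quick sanity check, the three base R approaches from the answers can be run side by side on the sample data from the question. This is only a sketch: the names txt, by_regex, by_paste and by_strrep are made up for illustration, and strrep() assumes R 3.3.0 or later.

```r
# Sample data from the question; read.table() treats the first line as a header
# because it has one field fewer than the remaining lines
txt <- " Milk Cola Juice Coffee Tea Wine
1 A NA A BD C A
2 AB NA C D CD AD
3 A BC AC D D D
4 AB B NA D CD AD
5 B C AC BD CD NA
6 AB BC C NA NA A
7 NA BC A B NA A"
DF <- read.table(text = txt, stringsAsFactors = FALSE)

# Regex replacement (Answer 0)
by_regex <- DF
by_regex[] <- lapply(DF, function(x) sub("^(.)$", "\\1\\1", x))

# ifelse() with paste0() (Answer 1; paste0(x, x) is equivalent to paste(x, x, sep = ""))
by_paste <- DF
by_paste[] <- lapply(DF, function(x) ifelse(nchar(x) == 1, paste0(x, x), x))

# ifelse() with strrep() (Answer 2)
by_strrep <- DF
by_strrep[] <- lapply(DF, function(x) ifelse(nchar(x) == 1, strrep(x, 2), x))

# Compare the results pairwise
print(identical(by_regex, by_paste))
print(identical(by_paste, by_strrep))
```

Assigning into DF[] rather than DF keeps the data frame structure, so the column and row names are preserved in each result.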