R: duplicating strings in a data frame

Date: 2016-08-11 05:31:24

Tags: r dataframe

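I have a data frame containing strings of different lengths. I want to double each single-character string (for example, A becomes AA), while NA values and two-character strings stay as they are.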

 DF:
    Milk     Cola   Juice   Coffee  Tea Wine
1   A        NA     A       BD     C    A
2   AB       NA     C       D      CD   AD
3   A        BC     AC      D      D    D
4   AB       B      NA      D      CD   AD
5   B        C      AC      BD     CD   NA
6   AB       BC     C       NA     NA   A
7   NA       BC     A       B      NA   A 

 Desired output:
    Milk     Cola   Juice   Coffee  Tea Wine
1   AA       NA     AA      BD     CC   AA
2   AB       NA     CC      DD     CD   AD
3   AA       BC     AC      DD     DD   DD
4   AB       BB     NA      DD     CD   AD
5   BB       CC     AC      BD     CD   NA
6   AB       BC     CC      NA     NA   AA
7   NA       BC     AA      BB     NA   AA

Thanks.

3 answers:

Answer 0 (score: 4)

Here is an attempt using a regular-expression replacement:

dat[] <- lapply(dat, function(x) sub("^(.)$", paste(rep("\\1",2),collapse=""), x) )

Or, less programmatically, but with the same result:

dat[] <- lapply(dat, function(x) sub("^(.)$", "\\1\\1", x) )

Or, if you really want to compress the code:

dat[] <- lapply(dat, sub, pa="^(.)$", re="\\1\\1")

Where dat is:

structure(list(Milk = c("A", "AB", "A", "AB", "B", "AB", NA), 
    Cola = c(NA, NA, "BC", "B", "C", "BC", "BC"), Juice = c("A", 
    "C", "AC", NA, "AC", "C", "A"), Coffee = c("BD", "D", "D", 
    "D", "BD", NA, "B"), Tea = c("C", "CD", "D", "CD", "CD", 
    NA, NA), Wine = c("A", "AD", "D", "AD", NA, "A", "A")), .Names = c("Milk", 
"Cola", "Juice", "Coffee", "Tea", "Wine"), row.names = c("1", 
"2", "3", "4", "5", "6", "7"), class = "data.frame")
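
To see why the pattern leaves NA values and two-character strings alone, here is a minimal sketch on a plain character vector. The vector x and the name doubled are made up for illustration and do not come from the question.

```r
# Hypothetical example vector (not from the question): one single-character
# string, one two-character string, and one missing value.
x <- c("A", "AB", NA)

# "^(.)$" matches only strings that consist of exactly one character;
# "\\1\\1" writes the captured character twice. Strings that do not
# match are returned unchanged, and missing values stay missing.
doubled <- sub("^(.)$", "\\1\\1", x)
print(doubled)
```

Because the anchors ^ and $ force the single captured character to be the whole string, "AB" never matches and passes through untouched.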

Answer 1 (score: 4)

DF <- "    Milk     Cola   Juice   Coffee  Tea Wine
1   A        NA     A       BD     C    A
2   AB       NA     C       D      CD   AD
3   A        BC     AC      D      D    D
4   AB       B      NA      D      CD   AD
5   B        C      AC      BD     CD   NA
6   AB       BC     C       NA     NA   A
7   NA       BC     A       B      NA   A "
DF <- read.table(text=DF, stringsAsFactors=FALSE)

This is DF:

  Milk Cola Juice Coffee  Tea Wine
1    A <NA>     A     BD    C    A
2   AB <NA>     C      D   CD   AD
3    A   BC    AC      D    D    D
4   AB    B  <NA>      D   CD   AD
5    B    C    AC     BD   CD <NA>
6   AB   BC     C   <NA> <NA>    A
7 <NA>   BC     A      B <NA>    A

To achieve your goal, we can use lapply and ifelse:

DF[] <- lapply(DF, function(x) ifelse(nchar(x) == 1, paste(x, x, sep=""), x))

For each column, if the number of characters in an entry is 1, we duplicate it; otherwise, we leave it as is.
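
The per-column logic can be sketched on a single vector to show how the test and the two branches interact. The names x, is_single and result are hypothetical, and paste0(x, x) is used here as an equivalent of paste(x, x, sep="").

```r
# Hypothetical example vector (not from the question).
x <- c("A", "AB", NA)

# Element-wise test: is the entry exactly one character long?
is_single <- nchar(x) == 1

# Where the test is TRUE take the doubled string, otherwise keep the
# original entry; missing entries remain missing.
result <- ifelse(is_single, paste0(x, x), x)
print(result)
```

Note that paste0(NA, NA) on its own would produce the string "NANA"; it is the ifelse test that keeps that value from ever being selected for missing entries.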

Final output:

> DF
  Milk Cola Juice Coffee  Tea Wine
1   AA <NA>    AA     BD   CC   AA
2   AB <NA>    CC     DD   CD   AD
3   AA   BC    AC     DD   DD   DD
4   AB   BB  <NA>     DD   CD   AD
5   BB   CC    AC     BD   CD <NA>
6   AB   BC    CC   <NA> <NA>   AA
7 <NA>   BC    AA     BB <NA>   AA

Answer 2 (score: 2)

We can also do this with strrep:

DF[] <- lapply(DF, function(x) ifelse(nchar(x)==1, strrep(x,2), x))
DF
#  Milk Cola Juice Coffee  Tea Wine
#1   AA <NA>    AA     BD   CC   AA
#2   AB <NA>    CC     DD   CD   AD
#3   AA   BC    AC     DD   DD   DD
#4   AB   BB  <NA>     DD   CD   AD
#5   BB   CC    AC     BD   CD <NA>
#6   AB   BC    CC   <NA> <NA>   AA
#7 <NA>   BC    AA     BB <NA>   AA

An option using dplyr would be:

library(dplyr)
DF %>%
   mutate_each(funs(ifelse(nchar(.)==1, strrep(., 2), .)))
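
mutate_each() and funs() have since been deprecated in dplyr. A sketch of the same idea with across(), assuming dplyr 1.0.0 or later is installed, might look like the following. The data frame small and the result name small_doubled are hypothetical stand-ins built from a few of the question's values.

```r
library(dplyr)

# Small stand-in data frame (hypothetical; a subset of the question's values).
small <- data.frame(Milk = c("A", "AB", NA),
                    Cola = c(NA, "B", "BC"),
                    stringsAsFactors = FALSE)

# Apply the same single-character test to every column with across().
small_doubled <- small %>%
  mutate(across(everything(),
                ~ ifelse(nchar(.x) == 1, strrep(.x, 2), .x)))
print(small_doubled)
```

As with the pipeline above, the result has to be assigned (here to small_doubled) if it is to be kept.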