Replace all instances of a number in a data frame with a string in R

Time: 2014-06-21 23:39:56

Tags: r replace dataframe

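I want to replace every number in a data frame with a word (string). Each number always maps to the same word. For example, every instance of the number 5 should be replaced with 'banana', every instance of the number 10 with 'kiwi', and so on.

Here is an example data frame. The row names and column names are also numbers: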

#    1  2  3  4  5  6
#1   7  7  7  7  7  7
#2   5  5  5  5  5  5
#3   4  4  4  4  4  4
#4   8  8  8  8  8  8
#5   1  1  1  1  1  1
#6   2  2  2  2  2  2
#7   6  6  6  6  3  3
#8   3  3  3  3  6  6
#9  10 10 10 10 10 10
#10 11 11 11 11 11 11
#11 12 12 12 12 12 12
#12  9  9  9  9  9  9

Here is sample data (mydf) to reproduce it:

mydf<-structure(c(7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 
1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 
9, 7, 5, 4, 8, 1, 2, 6, 3, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 3, 
6, 10, 11, 12, 9, 7, 5, 4, 8, 1, 2, 3, 6, 10, 11, 12, 9), .Dim = c(12L, 
6L), .Dimnames = list(c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12"), c("1", "2", "3", "4", "5", "6")))

This is the data frame I built (mydata), showing which number should be replaced with which word/fruit:

mydata <- data.frame(nums = 1:12)
mydata$fruits <- c("apple", "pear", "orange", "melon", "banana", "grape",
                   "pineapple", "mango", "lemon", "kiwi", "guava", "peach")

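I looked at similarly titled threads, but they mostly discuss changing parts of a data frame (for example, a specific variable or specific observations) rather than the contents of the whole data frame.

I tried using multiple gsub commands, but that did not work for several reasons. I think I need a function that is applied across all variables in the df, but I am not sure which one.

The final result should look like this: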

      1           2           3           4           5           6          
1  "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" "pineapple"
2  "banana"    "banana"    "banana"    "banana"    "banana"    "banana"   
3  "melon"     "melon"     "melon"     "melon"     "melon"     "melon"    
4  "mango"     "mango"     "mango"     "mango"     "mango"     "mango"    
5  "apple"     "apple"     "apple"     "apple"     "apple"     "apple"    
6  "pear"      "pear"      "pear"      "pear"      "pear"      "pear"     
7  "grape"     "grape"     "grape"     "grape"     "orange"    "orange"   
8  "orange"    "orange"    "orange"    "orange"    "grape"     "grape"    
9  "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"     
10 "guava"     "guava"     "guava"     "guava"     "guava"     "guava"    
11 "peach"     "peach"     "peach"     "peach"     "peach"     "peach"    
12 "lemon"     "lemon"     "lemon"     "lemon"     "lemon"     "lemon"

Ideally, though, the quotes would not be shown (but I am not sure whether that is possible).

4 answers:

Answer 0: (score: 4)

You can do this with match, which returns the position in one vector of each element of another vector, using mydata as the lookup table:

mydf[] <- mydata$fruits[match(mydf, mydata$nums)]

If you coerce the result to a data.frame, the quotes are not shown when the object is printed to the screen:

as.data.frame(mydf)
#            1         2         3         4         5         6
# 1  pineapple pineapple pineapple pineapple pineapple pineapple
# 2     banana    banana    banana    banana    banana    banana
# 3      melon     melon     melon     melon     melon     melon
# 4      mango     mango     mango     mango     mango     mango
# 5      apple     apple     apple     apple     apple     apple
# 6       pear      pear      pear      pear      pear      pear
# 7      grape     grape     grape     grape    orange    orange
# 8     orange    orange    orange    orange     grape     grape
# 9       kiwi      kiwi      kiwi      kiwi      kiwi      kiwi
# 10     guava     guava     guava     guava     guava     guava
# 11     peach     peach     peach     peach     peach     peach
# 12     lemon     lemon     lemon     lemon     lemon     lemon

Whether or not you coerce to a data.frame, you can pass quote=FALSE to write.table or write.csv to prevent quotes around strings in the exported file.
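To make the lookup step concrete, here is a minimal sketch of the same match idea on a smaller, made-up matrix. The names m, lookup, idx and res are hypothetical, and the lookup table is deliberately shuffled and missing one value. Note that the structure() call in the question builds a matrix, which is what this approach assumes.

```r
# Small stand-in for the question's data: a numeric matrix with character dimnames.
m <- matrix(c(7, 5, 10, 3, 7, 5, 10, 99), nrow = 4,
            dimnames = list(as.character(1:4), c("1", "2")))

# Hypothetical lookup table, shuffled so that row position != number.
lookup <- data.frame(nums   = c(10, 3, 7, 5),
                     fruits = c("kiwi", "orange", "pineapple", "banana"),
                     stringsAsFactors = FALSE)

# For each cell, match() gives the row of `lookup` that holds that number
# (NA when the number is not in the table, as with 99 here).
idx <- match(m, lookup$nums)

# Assigning into res[] keeps the dimensions and dimnames of the matrix.
res <- m
res[] <- lookup$fruits[idx]

print(res, quote = FALSE)
```

Because the positions come from match rather than from the numbers themselves, the order of the lookup table does not matter, and values without an entry come back as NA instead of raising an error.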

Answer 1: (score: 0)

replace may work for you.

> replace(mydf, seq_along(mydf), mydata[[2]][mydf])
#    1           2           3           4           5           6          
# 1  "pineapple" "pineapple" "pineapple" "pineapple" "pineapple" "pineapple"
# 2  "banana"    "banana"    "banana"    "banana"    "banana"    "banana"   
# 3  "melon"     "melon"     "melon"     "melon"     "melon"     "melon"    
# 4  "mango"     "mango"     "mango"     "mango"     "mango"     "mango"    
# 5  "apple"     "apple"     "apple"     "apple"     "apple"     "apple"    
# 6  "pear"      "pear"      "pear"      "pear"      "pear"      "pear"     
# 7  "grape"     "grape"     "grape"     "grape"     "orange"    "orange"   
# 8  "orange"    "orange"    "orange"    "orange"    "grape"     "grape"    
# 9  "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"      "kiwi"     
# 10 "guava"     "guava"     "guava"     "guava"     "guava"     "guava"    
# 11 "peach"     "peach"     "peach"     "peach"     "peach"     "peach"    
# 12 "lemon"     "lemon"     "lemon"     "lemon"     "lemon"     "lemon"   

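If desired, it can be wrapped in as.data.frame to remove the quotes.

Answer 2: (score: 0)

Since the fruits are in the correct order and indexed by 1:12, you can use the entries of mydf to index into mydata$fruits: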
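The quotes are only a printing detail, so there are a few ways to hide them. The following is a rough sketch on a made-up character matrix (x, nq, df, out and csv_lines are hypothetical names), using print(quote = FALSE), noquote, as.data.frame and write.csv(quote = FALSE).

```r
# A small character matrix standing in for the translated result.
x <- matrix(c("banana", "kiwi", "pear", "grape"), nrow = 2,
            dimnames = list(c("1", "2"), c("1", "2")))

print(x)                 # default: strings are shown with quotes
print(x, quote = FALSE)  # same matrix, quotes suppressed for this call only
nq <- noquote(x)         # marks the object so it prints without quotes
print(nq)

# Data frames print character columns without quotes.
df <- as.data.frame(x, stringsAsFactors = FALSE)
print(df)

# For exported files, quote = FALSE drops the quotes around strings.
out <- tempfile(fileext = ".csv")
write.csv(df, out, quote = FALSE)
csv_lines <- readLines(out)
print(csv_lines)
```

None of these change the stored values; they only affect how the strings are displayed or written.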

apply(mydf, 2, function(x) mydata$fruits[x])

If the values are not in the correct order, or do not cover all possible values (there are "holes"), you can use a factor for the translation:

apply(mydf, 2, function(x) factor(x, levels=mydata$nums, labels=mydata$fruits))
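As an illustration of the "holes" case, here is a small sketch with made-up codes that are neither sorted nor consecutive. The names m, codes, fruits and translate are hypothetical, and the explicit as.character call is an addition here so that the result is a plain character matrix regardless of how apply combines factors.

```r
m <- matrix(c(20, 10, 40, 10, 40, 20), nrow = 3,
            dimnames = list(c("1", "2", "3"), c("1", "2")))
codes  <- c(40, 10, 20)                # not sorted, not 1..n
fruits <- c("melon", "apple", "pear")  # fruits[i] belongs to codes[i]

# Direct indexing treats each value as a position, so it fails here:
direct <- fruits[m[, 1]]               # positions 20, 10, 40 do not exist

# factor() maps each level to its label; as.character() returns plain strings.
translate <- function(x) as.character(factor(x, levels = codes, labels = fruits))
res <- apply(m, 2, translate)
rownames(res) <- rownames(m)           # restore the row names explicitly

print(direct)
print(res, quote = FALSE)
```

Values that are not among the levels become NA in the factor, so unexpected numbers show up as missing rather than being mapped to the wrong fruit.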

Answer 3: (score: 0)

Another possible approach:

library(qdapTools)
as.data.frame(apply(mydf, 2, lookup, mydata))

##            1         2         3         4         5         6
## 1  pineapple pineapple pineapple pineapple pineapple pineapple
## 2     banana    banana    banana    banana    banana    banana
## 3      melon     melon     melon     melon     melon     melon
## 4      mango     mango     mango     mango     mango     mango
## 5      apple     apple     apple     apple     apple     apple
## 6       pear      pear      pear      pear      pear      pear
## 7      grape     grape     grape     grape    orange    orange
## 8     orange    orange    orange    orange     grape     grape
## 9       kiwi      kiwi      kiwi      kiwi      kiwi      kiwi
## 10     guava     guava     guava     guava     guava     guava
## 11     peach     peach     peach     peach     peach     peach
## 12     lemon     lemon     lemon     lemon     lemon     lemon
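If installing an extra package is not an option, a similar key-to-value lookup can be sketched in base R with a named character vector. This is not what qdapTools does internally, just a swapped-in base-R technique; the small m and mydata below, and the names key and res, are made up for illustration, and the sketch assumes the numbers are whole values so that as.character produces matching names.

```r
m <- matrix(c(7, 5, 10, 5, 10, 7), nrow = 3,
            dimnames = list(c("1", "2", "3"), c("1", "2")))
mydata <- data.frame(nums   = c(5, 7, 10),
                     fruits = c("banana", "pineapple", "kiwi"),
                     stringsAsFactors = FALSE)

# Named character vector: names are the numbers (as strings), values are the fruits.
key <- setNames(mydata$fruits, mydata$nums)

# Look each cell up by name; assigning into res[] keeps dims and dimnames.
res <- m
res[] <- key[as.character(m)]

print(as.data.frame(res, stringsAsFactors = FALSE))
```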