Question

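I have a data frame with two string variables and want to convert them to numeric values using a separate "key" data frame. The example below is simplified, but I need to be able to replace the contents of V1 and V2 based on an arbitrary key, which will not always be a = 1, b = 2, and so on.

Example: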

set.seed(1)
x <- data.frame(
    V1 = sample((letters), 10, replace=TRUE),
    V2 = sample((letters), 10, replace=TRUE)
)
key <- data.frame(letters, 1:26)

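I need to look up the first element of V1 against the key and replace it with the corresponding value (e.g. a = 1, b = 2, and so on), do the same for the second element, and then, once V1 is done, move on and do the same for V2.

I have been trying to find a solution with lapply() and sub(), but I keep getting stuck because I cannot see a way to make sub() handle anything beyond a 1:1 comparison. Should I be using a different function?

Please forgive me. I'm sure the solution must be simple, but I'm still very new to R.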

Answer 1

You can use data.table to create a lookup table and then use apply to map it over the columns of the data frame:

library(data.table)

key <- data.table(letters = letters, value = 1:26, key = "letters")
apply(x, 2, function(x) key[x]$value)

For reference, x holds the following letters before the mapping:
   V1 V2
1   y  a
2   d  u
3   g  u
4   a  j
5   b  v
6   w  n
7   k  j
8   n  g
9   r  i
10  s  o

Answer 2

You can use unlist and match in base R:

x[] <- key$values[match(unlist(x), key$letters)]
x

#   V1 V2
#1  25  1
#2   4 21
#3   7 21
#4   1 10
#5   2 22
#6  23 14
#7  11 10
#8  14  7
#9  18  9
#10 19 15

Or with dplyr:

library(dplyr)
x %>%  mutate_all(~key$values[match(., key$letters)])

Data

set.seed(1)
x <- data.frame(
    V1 = sample((letters), 10, replace=TRUE),
    V2 = sample((letters), 10, replace=TRUE)
)
key <- data.frame(letters = letters, values = 1:26)
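
The match-based lookup above can also be written with a named vector, which some readers find easier to follow. This is a minimal sketch, assuming the x and key defined in the Data section; the names lookup and translated are introduced here for illustration only and are not part of the original answer.

```r
# Rebuild the example data from the Data section.
set.seed(1)
x <- data.frame(
    V1 = sample(letters, 10, replace = TRUE),
    V2 = sample(letters, 10, replace = TRUE)
)
key <- data.frame(letters = letters, values = 1:26)

# Named vector: the names are the keys, the elements are the replacement values.
lookup <- setNames(key$values, key$letters)

# Index the lookup by each column; as.character() guards against factor columns.
translated <- x
translated[] <- lapply(x, function(col) unname(lookup[as.character(col)]))
translated
```

Because the lookup is indexed by name, the order of the rows in key does not matter, which suits a key that is not simply a = 1, b = 2.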

Answer 3

You can use apply over both the row and column margins, e.g. as.data.frame(apply(x, c(1,2), function(l) key[key$letters == l,c(2)])).
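
One thing to watch with an element-by-element `==` lookup is a value that does not appear in the key: the subset then has length zero rather than yielding a placeholder. The sketch below shows how a match-based lookup behaves in that case, using a hypothetical key (key2) and data frame (x2) that are not part of the original thread and where the key is deliberately unordered and incomplete.

```r
# Hypothetical key that is neither alphabetical nor sequential, and lacks "z".
key2 <- data.frame(
    letters = c("c", "a", "b"),
    values  = c(10, 30, 20),
    stringsAsFactors = FALSE
)
x2 <- data.frame(
    V1 = c("a", "b", "z"),
    V2 = c("c", "c", "a"),
    stringsAsFactors = FALSE
)

# match() returns NA for values missing from the key, so "z" becomes NA.
out <- x2
out[] <- lapply(x2, function(col) key2$values[match(col, key2$letters)])
out
#   V1 V2
# 1 30 10
# 2 20 10
# 3 NA 30
```

The NA makes unmatched entries easy to find afterwards, for example with anyNA(out).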

Answer 4

There are two ways to do this in base R:

Using sapply():

x[] <- with(key, sapply(x, function(v) values[match(v,letters)]))

or

x <- data.frame(with(key, sapply(x, function(v) values[match(v,letters)])))

Using as.matrix (similar to @Ronak Shah's approach).
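
A minimal sketch of what an as.matrix variant might look like, assuming the x and key from the Data section of Answer 2. This is an illustration of the idea, not necessarily the answerer's exact code, and the names m and translated are introduced here.

```r
# Rebuild the example data from Answer 2.
set.seed(1)
x <- data.frame(
    V1 = sample(letters, 10, replace = TRUE),
    V2 = sample(letters, 10, replace = TRUE)
)
key <- data.frame(letters = letters, values = 1:26)

# as.matrix() turns the data frame into a character matrix of the same shape.
m <- as.matrix(x)

# match() walks the matrix in column-major order, and assigning through
# translated[] fills the columns back in that same order.
translated <- x
translated[] <- key$values[match(m, key$letters)]
translated
```

This looks up every cell in a single vectorised match() call instead of one call per column.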

How can I "translate" variables in one data frame using a second data frame as a key?
