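I have a data frame with two string variables and want to convert them to numeric values using a separate "key" data frame. The example below is simplified, but I need to be able to replace the contents of V1 and V2 based on an arbitrary key, which will not always be a = 1, b = 2, and so on.
Example: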
set.seed(1)
x <- data.frame(
V1 = sample((letters), 10, replace=TRUE),
V2 = sample((letters), 10, replace=TRUE)
)
key <- data.frame(letters, 1:26)
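I need to look up the first element of V1 in the key and replace it with the corresponding value (e.g. a = 1, b = 2, and so on), do the same for the second element, and then, once V1 is done, move on and do the same for V2.
I have been trying to find a solution with lapply() and sub(), but I keep getting stuck because I cannot see how to give sub() more than a 1:1 comparison. Should I be using a different function?
Please forgive me: I am sure the solution must be simple, but I am still very new to R.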
Answer 0 (score: 1)
You can use data.table to create a lookup table, then use apply to apply the mapping to the columns of the data frame:
library(data.table)
key <- data.table(letters = letters, value = 1:26, key = "letters")
apply(x, 2, function(x) key[x]$value)
Here the input x is:
V1 V2
1 y a
2 d u
3 g u
4 a j
5 b v
6 w n
7 k j
8 n g
9 r i
10 s o
Answer 1 (score: 1)
In base R you can use unlist and match:
x[] <- key$values[match(unlist(x), key$letters)]
x
# V1 V2
#1 25 1
#2 4 21
#3 7 21
#4 1 10
#5 2 22
#6 23 14
#7 11 10
#8 14 7
#9 18 9
#10 19 15
Or use dplyr:
library(dplyr)
x %>% mutate_all(~key$values[match(., key$letters)])
Data
set.seed(1)
x <- data.frame(
V1 = sample((letters), 10, replace=TRUE),
V2 = sample((letters), 10, replace=TRUE)
)
key <- data.frame(letters = letters, values = 1:26)
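Since the question stresses that the key will not always be a = 1, b = 2, here is a minimal sketch of the same unlist/match idea with an arbitrary key. The names key2 and x2 and their values are made up for illustration; "z" is deliberately missing from the key to show that unmatched entries come back as NA.

```r
# Hypothetical key: the codes are arbitrary, not alphabetical positions.
key2 <- data.frame(
  letters = c("a", "b", "c"),
  values  = c(10, 300, -5),
  stringsAsFactors = FALSE
)

# Hypothetical input; "z" does not appear in key2.
x2 <- data.frame(
  V1 = c("b", "a", "z"),
  V2 = c("c", "c", "a"),
  stringsAsFactors = FALSE
)

# match() returns, for each element, its row position in the key
# (NA when there is no match); indexing key2$values by those
# positions performs the lookup for every cell at once.
pos <- match(unlist(x2), key2$letters)
x2[] <- key2$values[pos]
x2
# V1 is now 300, 10, NA and V2 is -5, -5, 10
```

Checking anyNA(x2) afterwards is a cheap way to spot strings that the key does not cover.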
Answer 2 (score: 1)
You can use apply with both the row and column margins, e.g. as.data.frame(apply(x, c(1,2), function(l) key[key$letters == l,c(2)])).
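The element-wise apply calls the function once per cell. A vectorized variation, sketched here as an alternative rather than as this answer's method, turns the key into a named vector and indexes it by each column. The name lookup is introduced only for this illustration, and key is assumed to have the columns letters and values.

```r
set.seed(1)
x <- data.frame(
  V1 = sample(letters, 10, replace = TRUE),
  V2 = sample(letters, 10, replace = TRUE),
  stringsAsFactors = FALSE
)
key <- data.frame(letters = letters, values = 1:26, stringsAsFactors = FALSE)

# Named vector: the names are the strings to look up, the elements are the codes.
lookup <- setNames(key$values, key$letters)

# Index the named vector by each column; as.character() guards against
# factor columns, and unname() drops the letter names from the result.
x[] <- lapply(x, function(col) unname(lookup[as.character(col)]))
x
```

Strings that are absent from the key yield NA, as with match.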
Answer 3 (score: 1)
There are two ways to do this in base R with sapply() (with key defined with the columns letters and values, as in the Data block of Answer 1):
x[] <- with(key, sapply(x, function(v) values[match(v,letters)]))
(similar to @Ronak Shah's approach), or
x <- data.frame(with(key, sapply(x, function(v) values[match(v,letters)])))
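The two sapply() forms differ mainly in how the result is stored: x[] <- fills the existing data frame in place, while data.frame(...) builds a new one from the matrix that sapply() returns. The sketch below runs both on copies and compares them; the names a and b are introduced only for this check.

```r
set.seed(1)
x <- data.frame(
  V1 = sample(letters, 10, replace = TRUE),
  V2 = sample(letters, 10, replace = TRUE),
  stringsAsFactors = FALSE
)
key <- data.frame(letters = letters, values = 1:26, stringsAsFactors = FALSE)

# Form 1: overwrite the cells of a copy of x, keeping its structure.
a <- x
a[] <- with(key, sapply(x, function(v) values[match(v, letters)]))

# Form 2: build a fresh data frame from the matrix returned by sapply().
b <- data.frame(with(key, sapply(x, function(v) values[match(v, letters)])))

# Compare the two results column by column.
same_V1 <- isTRUE(all.equal(a$V1, b$V1))
same_V2 <- isTRUE(all.equal(a$V2, b$V2))
c(same_V1, same_V2)
```

Inside with(key, ...), letters refers to the key's own letters column rather than R's built-in letters constant, which is what lets the same code work for an arbitrary key.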