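(I have a feeling I'll feel stupid once I see the answer, but I can't figure this out.)
I have a data.frame with an empty column at the end. It will be mostly NA, but I want to fill some of its rows with a value. This column represents a guess at data that is missing from one of the other columns in the data.frame.
My initial data.frame looks like this: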
Game | Rating | MinPlayers | MaxPlayers | MaxPlayersGuess
---------------------------------------------------------
A | 6 | 3 | 6 |
B | 7 | 3 | 7 |
C | 6.5 | 3 | N/A |median(df$MaxPlayers[df$MinPlayers ==3,])
D | 7 | 3 | 6 |
E | 7 | 3 | 5 |
F | 9.5 | 2 | 5 |
G | 6 | 2 | 4 |
H | 7 | 2 | 4 |
I | 6.5 | 2 | N/A |median(df$MaxPlayers[df$MinPlayers ==2,])
J | 7 | 2 | 2 |
K | 7 | 2 | 4 |
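Note that two of the rows have "N/A" for MaxPlayers. What I'm trying to do is use the information I do have to guess what MaxPlayers might be. If median(MaxPlayers) for the 3-player games is 6, then MaxPlayersGuess should equal 6 for games where MinPlayers == 3 and MaxPlayers is N/A. (In the example above I've tried to indicate, in code, what value MaxPlayersGuess should get.)
The resulting data.frame would look like this: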
Game | Rating | MinPlayers | MaxPlayers | MaxPlayersGuess
---------------------------------------------------------
A | 6 | 3 | 6 |
B | 7 | 3 | 7 |
C | 6.5 | 3 | N/A |6
D | 7 | 3 | 6 |
E | 7 | 3 | 5 |
F | 9.5 | 2 | 5 |
G | 6 | 2 | 4 |
H | 7 | 2 | 4 |
I | 6.5 | 2 | N/A |4
J | 7 | 2 | 2 |
K | 7 | 2 | 4 |
Here is the result of one attempt:
gld$MaxPlayersGuess <- ifelse(is.na(gld$MaxPlayers), median(gld$MaxPlayers[gld$MinPlayers,]), NA)
Error in gld$MaxPlayers[gld$MinPlayers, ] :
incorrect number of dimensions
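Answer 0 (score: 2)
Updated in response to the posted example.
Here's my tip of the day: sometimes it's easier to compute what you want up front and then grab it when you need it, rather than packing all the logic into one expression. You're trying to compute everything at once and it's getting muddled, so break it into steps. You need to know the median of "MaxPlayer" for each possible "MinPlayer" group. Then you want to use that value wherever MaxPlayer is missing. So here's a simple way to do it.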
#generate fake data
MinPlayer <- rep(3:2, each = 4)
MaxPlayer <- rep(2:5, each = 2, times = 2)
df <- data.frame(MinPlayer, MaxPlayer)
#replace some values of MaxPlayer with NA
df$MaxPlayer <- ifelse(df$MaxPlayer == 3, NA, df$MaxPlayer)
####STARTING DATA
# > df
# MinPlayer MaxPlayer
# 1 3 2
# 2 3 2
# 3 3 NA
# 4 3 NA
# 5 2 4
# 6 2 4
# 7 2 5
# 8 2 5
# 9 3 2
# 10 3 2
# 11 3 NA
# 12 3 NA
# 13 2 4
# 14 2 4
# 15 2 5
# 16 2 5
####STEP 1
#find the median of MaxPlayer for each group of MinPlayer (e.g., when MinPlayer == 1, 2 or whatever)
#just add a column to the data frame that has the right median value for each subset of MinPlayer in it and grab that value to use later.
library(plyr) #plyr is a great way to compute things across data subsets
df <- ddply(df, c("MinPlayer"), transform,
median.minp = median(MaxPlayer, na.rm = TRUE)) #ignore NAs in the median
####STEP 2
#anytime that MaxPlayer == NA, grab the median value to replace the NA, otherwise keep the MaxPlayer value
df$MaxPlayer <- ifelse(is.na(df$MaxPlayer), df$median.minp, df$MaxPlayer)
####STEP 3
#you had to compute an extra column you don't really want, so drop it now that you're done with it
df <- df[ , !(names(df) %in% "median.minp")]
####RESULT
# > df
# MinPlayer MaxPlayer
# 1 2 4
# 2 2 4
# 3 2 5
# 4 2 5
# 5 2 4
# 6 2 4
# 7 2 5
# 8 2 5
# 9 3 2
# 10 3 2
# 11 3 2
# 12 3 2
# 13 3 2
# 14 3 2
# 15 3 2
# 16 3 2
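The same two steps can also be sketched in base R, without plyr. This is only an illustration on the fake data above, assuming the base function ave() is an acceptable stand-in for ddply(); the name group_median is made up for the example. Unlike ddply(), ave() keeps the rows in their original order.

```r
# Rebuild the fake data from the answer above
MinPlayer <- rep(3:2, each = 4)
MaxPlayer <- rep(2:5, each = 2, times = 2)
df <- data.frame(MinPlayer, MaxPlayer)
df$MaxPlayer[df$MaxPlayer == 3] <- NA

# STEP 1: median of MaxPlayer within each MinPlayer group, one value per row
# (group_median is a hypothetical helper name)
group_median <- ave(df$MaxPlayer, df$MinPlayer,
                    FUN = function(x) median(x, na.rm = TRUE))

# STEP 2: use the group median only where MaxPlayer is missing
df$MaxPlayer <- ifelse(is.na(df$MaxPlayer), group_median, df$MaxPlayer)
```

Because the medians live in a separate vector rather than in a column of the data frame, there is no extra column to drop afterwards.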
Old answer below...
Please post a reproducible example!!
#fake data
this <- rep(1:2, each = 1, times = 2)
that <- rep(3:2, each = 1, times = 2)
df <- data.frame(this, that)
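If you're just asking about basic indexing, e.g., finding the values that meet a condition, this returns the row indices of the values that match your condition (look up ?which):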
> which(df$this < df$that)
[1] 1 3
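This returns the values that match your condition rather than the row indices. You simply use the row indices returned by "which" to pull the corresponding values from the right column of the data frame (here, "this"):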
> df[which(df$this < df$that), "this"]
[1] 1 1
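Since the question's data contains missing values, it may help to see how the two indexing styles differ when the condition itself is NA. The sketch below uses a small made-up data frame (not the data from the question): a logical index passes NA through as an NA element, while which() silently drops it.

```r
# Hypothetical data with one missing value in "this"
this <- c(1, 2, NA, 2)
that <- c(3, 2, 3, 1)
df <- data.frame(this, that)

cond <- df$this < df$that          # TRUE FALSE NA FALSE

by_logical <- df$this[cond]        # NA in the condition yields an NA element
by_which   <- df$this[which(cond)] # which() keeps only the TRUE positions

by_logical  # 1 NA
by_which    # 1
```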
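If you want to do something wherever "this" is less than "that" and add the result as a new column in the data frame, just use "ifelse". ifelse builds a logical vector of where things meet your condition, and then does something to the things that match (i.e., where your logical test == TRUE).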
#if "this" is < "that", multiply by 2
df$result <- ifelse(df$this < df$that, df$this * 2, NA)
> df
this that result
1 1 3 2
2 2 2 NA
3 1 3 2
4 2 2 NA
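Without a reproducible example, I can't offer anything more specific.
Answer 1 (score: 1)
I think you already have everything you need in @griffmer's answer. But a less elegant, though perhaps more intuitive, way would be a loop: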
## Your data:
df <- data.frame(
Game = LETTERS[1:11],
Rating = c(6,7,6.5,7,7,9.5,6,7,6.5,7,7),
MinPlayers = c(rep(3,5), rep(2,6)),
MaxPlayers = c(6,7,NA,6,5,5,4,4,NA,2,4)
)
## Loop over rows:
df$MaxPlayersGuess <- vapply(1:nrow(df), function(ii){
if (is.na(df$MaxPlayers[ii])){
median(df$MaxPlayers[df$MinPlayers == df$MinPlayers[ii]],
na.rm = TRUE)
} else {
df$MaxPlayers[ii]
}
}, numeric(1))
which gives you
df
# Game Rating MinPlayers MaxPlayers MaxPlayersGuess
# 1 A 6.0 3 6 6
# 2 B 7.0 3 7 7
# 3 C 6.5 3 NA 6
# 4 D 7.0 3 6 6
# 5 E 7.0 3 5 5
# 6 F 9.5 2 5 5
# 7 G 6.0 2 4 4
# 8 H 7.0 2 4 4
# 9 I 6.5 2 NA 4
# 10 J 7.0 2 2 2
# 11 K 7.0 2 4 4
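The loop fills MaxPlayersGuess on every row. If the guess should appear only where MaxPlayers is missing, as in the expected table in the question, a vectorized sketch along the following lines might do. It assumes a per-group lookup built with tapply() is acceptable; the name med is made up for the example.

```r
df <- data.frame(
  Game = LETTERS[1:11],
  Rating = c(6, 7, 6.5, 7, 7, 9.5, 6, 7, 6.5, 7, 7),
  MinPlayers = c(rep(3, 5), rep(2, 6)),
  MaxPlayers = c(6, 7, NA, 6, 5, 5, 4, 4, NA, 2, 4)
)

# Median of MaxPlayers per MinPlayers group, named by group ("2", "3")
# (med is a hypothetical helper name)
med <- tapply(df$MaxPlayers, df$MinPlayers, median, na.rm = TRUE)

# Look up each row's group median by name; keep it only where MaxPlayers is NA
df$MaxPlayersGuess <- ifelse(is.na(df$MaxPlayers),
                             unname(med[as.character(df$MinPlayers)]),
                             NA)

df$MaxPlayersGuess[c(3, 9)]  # 6 4
```

All other rows of MaxPlayersGuess stay NA, and MaxPlayers itself is left unchanged.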
Answer 2 (score: 1)
If you want to use dplyr, you can try:
Input:
df <- data.frame(
  Game = LETTERS[1:11],
  Rating = c(6,7,6.5,7,7,9.5,6,7,6.5,7,7),
  MinPlayers = c(rep(3,5), rep(2,6)),
  MaxPlayers = c(6,7,NA,6,5,5,4,4,NA,2,4)
)
Process:
library(dplyr)
df %>%
  group_by(MinPlayers) %>%
  mutate(MaxPlayers = if_else(is.na(MaxPlayers), median(MaxPlayers, na.rm = TRUE), MaxPlayers))
This groups the data by MinPlayers and then assigns the median of MaxPlayers to the rows with missing data.