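I wrote some code that looks at a data frame and creates a new data frame containing the minimum value of one column, split by a factor. However, I can't figure out how to include another column in the result (without using it as a factor).
For example, the code below creates a data frame with the columns "State", "Institution", "Rating1", "Rating2", and "Junkdata". It then finds the minimum of Rating1 for each state listed and creates a data frame with one minimum per state. But say I want to include the "Institution" column. How do I do that? I have also messed around with some plyr solutions, but no dice.
Below is the code I am using, without the "Institution" column specified. Suffice it to say I have tried putting it everywhere I can think of, with no success.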
##create the data frame
State <- c("AZ","AZ","AZ","CA","CA","CA","CA","CA","NY","NY","NY","NY","SD","SD")
Institution <- c("Institution 1","Institution 2","Institution 3","Institution 4","Institution 5","Institution 6","Institution 7","Institution 8","Institution 9","Institution 10","Institution 11","Institution 12","Institution 13","Institution 14")
Rating1 <- c(3.4, 5.6,2.2,6.3,8.3,2.1,3.3,9.7,7.7,5.4,9.9,3.2,6.1,5.2)
Rating2 <- c(8.4,3.4,6.5,2.5,7.5,4.2,5.6,8.3,4.9,3.3,1.1,8,7.7,3.3)
Junkdata <- c("junk","more junk","superfluous","junk","more junk","superfluous","junk","more junk","superfluous","junk","more junk","superfluous","junk","more junk")
data.df <- data.frame(State, Institution, Rating1, Rating2, Junkdata)
## Use aggregate to find the minimum of Rating 1 for each State
new.df <- aggregate(data.df$Rating1 ~ State, data.df, min)
Using plyr:
library(plyr)
new.df.2 <- ddply(new.df, .(State), summarise, min=min(data.df$Rating1))
Answer 0 (score: 1)
This is easy to do with dplyr:
library(dplyr)
resultDF <- data.df %>%
  group_by(State) %>%
  filter(Rating1 == min(Rating1))
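The trick is to group by State. Once the data is grouped, the filter is evaluated within each state separately, so it keeps only the rows whose Rating1 equals the minimum of all Rating1 values in that state. Every other column of those rows, including Institution, is carried along unchanged.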
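The same "compare each row to its group's minimum" idea can be sketched without dplyr. The snippet below is only an illustration in base R, not part of the original answer: it rebuilds the State, Institution, and Rating1 columns from the question in a compact form, and the names `group_min` and `result` are made up for this example.

```r
# Compact rebuild of three columns from the question's data frame
data.df <- data.frame(
  State = rep(c("AZ", "CA", "NY", "SD"), times = c(3, 5, 4, 2)),
  Institution = paste("Institution", 1:14),
  Rating1 = c(3.4, 5.6, 2.2, 6.3, 8.3, 2.1, 3.3, 9.7, 7.7, 5.4, 9.9, 3.2, 6.1, 5.2)
)

# ave() returns, for every row, the minimum Rating1 within that row's State
group_min <- ave(data.df$Rating1, data.df$State, FUN = min)

# Keep the rows whose Rating1 equals their state's minimum;
# all columns (including Institution) are kept
result <- data.df[data.df$Rating1 == group_min, ]
print(result)
```

As with the grouped filter, rows that tie for a state's minimum would all be kept.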
In response to a question in the comments: if you only want to include State, Institution, and Rating1, you can use the following chain:
resultDF <- data.df %>%
  group_by(State) %>%
  filter(Rating1 == min(Rating1)) %>%
  select(State, Institution, Rating1)
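For anyone who would rather stay close to the `aggregate` call from the question, one possible base R route is to compute the per-state minimum and then merge it back onto the original rows to recover Institution. This is a sketch under assumptions rather than the answerer's method: the data frame is rebuilt compactly from the question's values, and `mins` and `result` are illustrative names.

```r
# Compact rebuild of three columns from the question's data frame
data.df <- data.frame(
  State = rep(c("AZ", "CA", "NY", "SD"), times = c(3, 5, 4, 2)),
  Institution = paste("Institution", 1:14),
  Rating1 = c(3.4, 5.6, 2.2, 6.3, 8.3, 2.1, 3.3, 9.7, 7.7, 5.4, 9.9, 3.2, 6.1, 5.2)
)

# Per-state minimum; using the bare column name in the formula
# gives a result column that is simply called Rating1
mins <- aggregate(Rating1 ~ State, data = data.df, FUN = min)

# Join the minima back to the original rows on both State and Rating1,
# which brings the matching Institution along
result <- merge(mins, data.df, by = c("State", "Rating1"))
print(result)
```

The merge matches on the exact Rating1 value, so it relies on the minimum being one of the original values, which is always true for `min`.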