Question

I have the following data frame, which I call ozone:

   Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9

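I want to extract the highest value from each of Ozone, Solar.R and Wind...

Also, if possible, how would I sort Solar.R, or any other column of this data frame, in descending order?

I tried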

max(ozone, na.rm=T)

This gives me the highest value in the whole data set.

I also tried

max(subset(ozone,Ozone))

but got the error "'subset' must be logical".

I can create an object that holds a subset for each column with the following commands

ozone <- subset(ozone, Ozone >0)
max(ozone,na.rm=T)

but that gives the same value of 334, which is the maximum of the whole data frame, not of the column.

Any help would be great, thanks.

Answer 1

Similar to colMeans, colSums, etc., you could write a column maximum function, colMax, and a column sort function, colSort.

colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)

I use ... in the second function in the hope of piquing your curiosity.

Get your data:

dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9")

Use the colMax function on the sample data:

colMax(dat)
#  Ozone Solar.R    Wind    Temp   Month     Day 
#   41.0   313.0    20.1    74.0     5.0     9.0

To sort a single column,

sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118  99  19

and to sort all columns, use our colSort function

colSort(dat, decreasing = TRUE) ## compare with '...' above
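
To make the role of ... a bit more concrete, here is a small sketch (not part of the original answer, and using only three of the sample columns to keep it short). Whatever is passed through ... goes straight to sort(), so na.last = TRUE can be forwarded as well; the names dropped and kept are just illustrative.

```r
# Sample rows from the question (three columns only, to keep the sketch short)
dat <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1)
)

colSort <- function(data, ...) sapply(data, sort, ...)

# sort() drops NAs by default, so the columns end up with different lengths
# and sapply() cannot simplify the result: it returns a list.
dropped <- colSort(dat, decreasing = TRUE)

# na.last = TRUE is forwarded to sort() too: NAs are kept at the end, every
# column keeps its 9 values, and sapply() simplifies to a 9 x 3 matrix.
kept <- colSort(dat, decreasing = TRUE, na.last = TRUE)
```

Note that each column is sorted independently, so the rows of the result no longer correspond to the original observations.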

Answer 2

To get the maximum of any column you want:

max(ozone$Ozone, na.rm = TRUE)

To get the maximum of all columns, you want:

apply(ozone, 2, function(x) max(x, na.rm = TRUE))

And to sort:

ozone[order(ozone$Solar.R),]

Or to sort in the other direction:

ozone[rev(order(ozone$Solar.R)),]
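
One detail worth knowing when the sort column has missing values: order() puts NA last by default, so reversing its result puts the NA rows first. A minimal sketch with three of the sample columns, comparing that with order()'s own decreasing argument (the names by_rev and by_desc are just illustrative):

```r
ozone <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1)
)

# rev(order(...)): descending, but the rows with NA in Solar.R come first
by_rev <- ozone[rev(order(ozone$Solar.R)), ]

# order(..., decreasing = TRUE): descending, rows with NA stay at the end
by_desc <- ozone[order(ozone$Solar.R, decreasing = TRUE), ]
```

Which one is preferable depends on where you want the incomplete rows to end up.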

Answer 3

Here is a dplyr solution:

library(dplyr)

# find max for each column
summarise_each(ozone, funs(max(., na.rm=TRUE)))

# sort by Solar.R, descending
arrange(ozone, desc(Solar.R))

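Update: summarise_each() has been deprecated in favour of a more featureful family of functions: mutate_all(), mutate_at(), mutate_if(), summarise_all(), summarise_at(), summarise_if().

Here is how you could do it:

summarise_if(ozone, is.numeric, funs(max(., na.rm=TRUE)))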

或

# find max for each column
ozone %>%
         summarise_if(is.numeric, funs(max(., na.rm=TRUE)))%>%
         arrange(Ozone)
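
For readers on a recent dplyr: funs() was itself deprecated in dplyr 0.8.0, and the scoped verbs such as summarise_if() were superseded by across() in dplyr 1.0.0. The following is a sketch of the same idea in that style, assuming dplyr 1.0.0 or later is installed; it uses three of the sample columns, and the names col_max and sorted are just illustrative.

```r
library(dplyr)

ozone <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19),
  Wind    = c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1)
)

# Maximum of every numeric column, ignoring missing values
col_max <- ozone %>%
  summarise(across(where(is.numeric), ~ max(.x, na.rm = TRUE)))

# Sort by Solar.R in descending order; arrange() puts NA rows last
sorted <- ozone %>%
  arrange(desc(Solar.R))
```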

Answer 4

To find the maximum of each column, you could try the apply() function:

> apply(ozone, MARGIN = 2, function(x) max(x, na.rm=TRUE))
  Ozone Solar.R    Wind    Temp   Month     Day 
   41.0   313.0    20.1    74.0     5.0     9.0
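
A caveat that may matter on other data: apply() converts the data frame to a matrix first, so if any column is non-numeric, every value becomes a character string and the "maximum" is a string comparison. The sketch below assumes a hypothetical Site column that is not in the question's data, and shows one way around it by restricting sapply() to the numeric columns.

```r
ozone <- data.frame(
  Ozone = c(41, 36, NA),
  Wind  = c(7.4, 8.0, 14.3),
  Site  = c("A", "B", "A"),  # hypothetical non-numeric column, for illustration
  stringsAsFactors = FALSE
)

# apply() goes through as.matrix(), which here yields a character matrix,
# so the result is a character vector rather than numbers
via_apply <- apply(ozone, 2, function(x) max(x, na.rm = TRUE))

# Keeping only the numeric columns gives a numeric result
num_cols <- sapply(ozone, is.numeric)
via_sapply <- sapply(ozone[num_cols], max, na.rm = TRUE)
```

With the all-numeric data from the question, the apply() call above works as shown.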

Answer 5

Another way would be to use ?pmax

do.call('pmax', c(as.data.frame(t(ozone)),na.rm=TRUE))
#[1]  41.0 313.0  20.1  74.0   5.0   9.0
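
The pmax() one-liner may be easier to follow on a smaller frame. This is only a sketch using three of the sample rows and two columns: pmax() takes several vectors and returns their element-wise maximum, so once t() has turned each original row into one vector, position i of the result is the maximum of column i. The names rows_as_vectors, via_do_call and by_hand are just illustrative.

```r
ozone <- data.frame(Ozone = c(41, 36, NA), Wind = c(7.4, 8.0, 14.3))

# After t(), each original row is one column: (41, 7.4), (36, 8.0), (NA, 14.3)
rows_as_vectors <- as.data.frame(t(ozone))

# do.call() passes those vectors, plus na.rm = TRUE, as arguments to pmax()
via_do_call <- do.call("pmax", c(rows_as_vectors, na.rm = TRUE))

# The same call written out by hand
by_hand <- pmax(c(41, 7.4), c(36, 8.0), c(NA, 14.3), na.rm = TRUE)
```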

Answer 6

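Assuming the data in your data.frame is called maxinozone, you can do this for the first column (Ozone)

max(maxinozone[, 1], na.rm = TRUE)  # [, 1] selects the first column; [1, ] would select the first row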

Answer 7

max(ozone$Ozone, na.rm = TRUE) should do the trick. Remember to include na.rm = TRUE; otherwise R will return NA, because the column contains missing values.
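
A quick sketch of what na.rm changes, using a few Ozone values from the question. The variable names are just illustrative, and the all-missing case is included because it can be surprising.

```r
x <- c(41, 36, NA, 18)

# Any NA in the input makes the result NA
with_na <- max(x)

# na.rm = TRUE drops the NA before taking the maximum
without_na <- max(x, na.rm = TRUE)

# If nothing is left after dropping NAs, max() returns -Inf with a warning
all_missing <- c(NA_real_, NA_real_)
empty_max <- suppressWarnings(max(all_missing, na.rm = TRUE))
```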

Answer 8

max(may$Ozone, na.rm = TRUE)

Without $Ozone, max() looks through the whole data frame; you can learn this in the swirl library.

I am also taking this course on Coursera.

Answer 9

Try this solution:

Oz <- subset(data, Month == 5, select = Ozone)  # select the Ozone values for May (i.e. Month == 5)
summary(Oz)                                     # summarises the one-column table of Ozone values, including max, min, ...

Answer 10

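There is a package, matrixStats, that provides functions for column and row summaries (see the package vignette), but you have to convert the data.frame to a matrix first.

Then you run: colMaxs(as.matrix(ozone), na.rm = TRUE)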

How do I find the highest value of a column in a data frame in R?
