Question

OK, I have a CSV file with a structure similar to this:

hashID,value,flag
98fafd,   35,   1
fh56w2,   25,   0
ggjeas,   55,   1
adfh5d,   45,   0

Basically, what I want to do is get the median of the value column, but include only the rows where flag == 1 in the calculation.

Is this possible in R? I have looked around and haven't found anything like it.

Answer 1

You can also do this with a quick one-liner that indexes the data frame with a logical (boolean) vector:

# read the data from a csv file
newdata <- read.csv("file.csv")
# this will give you a vector of boolean values of length nrow(newdata)
newdata$flag==1
# and this line uses the above vector to retrieve only those elements of 
# newdata$value for which the row contains a flag value of 1
median(newdata$value[newdata$flag==1])
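
If the flag or value column might contain missing values, the one-liner above can return NA, because comparing NA with 1 yields NA and indexing with NA produces an NA element. The following is a minimal sketch of one way to guard against that, assuming the sample data from the question; the names csv_text, keep, and flag1_median are made up for illustration, and the data is read from an inline string rather than a file so the example is self-contained.

```r
# Sample data copied from the question, supplied inline instead of via a file
csv_text <- "hashID,value,flag
98fafd,   35,   1
fh56w2,   25,   0
ggjeas,   55,   1
adfh5d,   45,   0"

newdata <- read.csv(text = csv_text)

# which() returns only the positions where the comparison is TRUE,
# so rows with a missing flag are dropped instead of producing NA
keep <- which(newdata$flag == 1)

# na.rm = TRUE ignores missing entries in the value column itself
flag1_median <- median(newdata$value[keep], na.rm = TRUE)
print(flag1_median)
```

With complete data like the sample, this gives the same result as the plain logical index.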

Answer 2

Here is one possibility:

Read in the dataset with the following command:

newdata <- read.csv("stackoverflow questions/mediancol.csv")
# I assume you have the data in csv format

# Showing the data I used for the computation
newdata <- structure(list(hashID = structure(c(1L, 3L, 4L, 2L), .Label = c("98fafd", 
"adfh5d", "fh56w2", "ggjeas"), class = "factor"), value = c(35L, 
25L, 55L, 45L), flag = c(1L, 0L, 1L, 0L)), .Names = c("hashID", 
"value", "flag"), class = "data.frame", row.names = c(NA, -4L
))
> newdata
  hashID value flag
1 98fafd    35    1
2 fh56w2    25    0
3 ggjeas    55    1
4 adfh5d    45    0

# Subset the data where flag == 1
newdata1 <- subset(newdata,flag==1)

# Look at the summary of the data

> summary(newdata1)
    hashID      value         flag  
 98fafd:1   Min.   :35   Min.   :1  
 adfh5d:0   1st Qu.:40   1st Qu.:1  
 fh56w2:0   Median :45   Median :1  
 ggjeas:1   Mean   :45   Mean   :1  
            3rd Qu.:50   3rd Qu.:1  
            Max.   :55   Max.   :1

# Only look at the median 
median(newdata1$value)
[1] 45
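
If the median is needed for every flag value rather than only for flag == 1, the subsetting step can be replaced by a grouped computation. This is a sketch using base R's tapply(), assuming the same four rows as above; the name medians_by_flag is hypothetical, and the data frame is rebuilt inline so the example runs on its own.

```r
# Same sample rows as in the question, built directly as a data frame
newdata <- data.frame(
  hashID = c("98fafd", "fh56w2", "ggjeas", "adfh5d"),
  value  = c(35L, 25L, 55L, 45L),
  flag   = c(1L, 0L, 1L, 0L)
)

# Apply median() to the value column separately for each distinct flag value;
# the result is named by the flag values ("0" and "1")
medians_by_flag <- tapply(newdata$value, newdata$flag, median)
print(medians_by_flag)

# Pick out the median for the rows where flag == 1
medians_by_flag["1"]
```

This avoids creating a separate subset for each flag value, which may be convenient when the flag column has more than two levels.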

Get the median of a column where another column has the value 1 in R
