Using subset on a data frame

Time: 2017-11-19 21:51:22

Tags: r dataframe subset

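I am working on a problem that requires me to subset my data. I am currently using subset() to do this, but something is wrong with my syntax. Below are the problem I am working on and my current code.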

Problem:

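Some product IDs have 0 votes, which causes an error if we try to take the log. Subset the variables max.votes and number.of.reviews (or whatever you called them) to only the values that correspond to products with 1 or more votes.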

Code:

#Use tapply to find max votes received by product reviews


max.votes=tapply(`Number of Votes`, `Product ID`, max,na.rm=TRUE)


#Count number of reviews for each product ID

Reviews.per.product=tapply(`Product ID`,`Product ID`,length)



#1i

#Make a scatter plot of max votes as a function of number of reviews

plot(max.votes~Reviews.per.product)

#There is no apparent trend that I am able to pick out from the scatter plot.


#1J

#Create subsets with 0's removed
foods_max_votes_subset = finefoods.dataframe[finefoods.dataframe$Nu >= 1, ]
subset.max.votes=subset(max.votes,max.votes>=1)


subset.Reviews.per.product=subset(Reviews.per.product,max.votes>=1)

Updated code:

NotZero = which( max.votes >= 1 )
max.votes.subset.test = max.votes[ NotZero ]
Reviews.per.product.subset.test = Reviews.per.product[NotZero] 
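
For reference, here is a minimal self-contained sketch of the same indexing idea on made-up data. The toy data frame `reviews` and its column names `product_id` and `votes` are assumptions for illustration only, not the original dataset. The point it shows is that a single logical mask built from max.votes can be applied to both per-product vectors so they stay aligned.

```r
# Toy data (hypothetical; stands in for the real reviews dataset)
reviews <- data.frame(
  product_id = c("A", "A", "B", "B", "B", "C"),
  votes      = c(0, 3, 0, 0, 0, 5)
)

# Per-product maximum votes and review counts, as in the question
max.votes <- tapply(reviews$votes, reviews$product_id, max, na.rm = TRUE)
Reviews.per.product <- tapply(reviews$product_id, reviews$product_id, length)

# One logical mask, applied to both vectors so they stay aligned
keep <- max.votes >= 1
max.votes.subset <- max.votes[keep]
Reviews.per.product.subset <- Reviews.per.product[keep]

# log() is now finite for every remaining product
log.max.votes <- log(max.votes.subset)
```

Because both tapply() calls group by the same product IDs, their results come back in the same order, which is what makes sharing one mask safe here.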

1 answer:

Answer 0: (score: 0)

You can do this in several ways:

Using dplyr:

library(dplyr)

foods_max_votes_subset  <- finefoods.dataframe %>%
    filter(Reviews.per.product >= 1,
           max.votes >= 1)

Base R:

foods_max_votes_subset  <- finefoods.dataframe[(finefoods.dataframe$Reviews.per.product >= 1 & finefoods.dataframe$max.votes >= 1),]
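
Note that both snippets assume Reviews.per.product and max.votes are columns of the data frame. In the question they are per-product vectors returned by tapply(), so one possible approach is to collect them into a per-product data frame first. A rough sketch, where the toy values and the names `product.summary` and `product_id` are hypothetical:

```r
# Toy per-product summaries (hypothetical values, named by product ID)
max.votes <- c(A = 3, B = 0, C = 5)
Reviews.per.product <- c(A = 2, B = 3, C = 1)

# Collect the summaries into one data frame, one row per product
product.summary <- data.frame(
  product_id          = names(max.votes),
  max.votes           = as.vector(max.votes),
  Reviews.per.product = as.vector(Reviews.per.product)
)

# Base R: keep only products with at least one vote
product.summary.subset <- product.summary[product.summary$max.votes >= 1, ]

# Equivalent with subset(), which evaluates the condition inside the data frame
product.summary.subset2 <- subset(product.summary, max.votes >= 1)
```

Keeping both summaries in one data frame means a single filter drops the zero-vote products from both at once.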

I also noticed that your code looks a lot like Python, and you tagged this post as both python and r. Which one are you coding in?