Question

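I am working on a problem that requires me to subset my data. I am currently using subset() to do this, but something is wrong with my syntax. Below are the problem I am working on and my current code.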

Problem:

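Some product IDs have 0 votes, which causes an error if we try to take the log. Subset the variables max.votes and number.of.reviews (or whatever you called them) to only the values corresponding to products with 1 or more votes.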

Code:

#Use tapply to find max votes received by product reviews


max.votes=tapply(`Number of Votes`, `Product ID`, max,na.rm=TRUE)


#Count number of reviews for each product ID

Reviews.per.product=tapply(`Product ID`,`Product ID`,length)



#1i

#Make a scatter plot of max votes as a function of number of reviews

plot(max.votes~Reviews.per.product)

#There is no apparent trend that I am able to pick out from the scatter plot.


#1J

#Create subsets with 0's removed
foods_max_votes_subset = finefoods.dataframe[finefoods.dataframe$Nu >= 1, ]
subset.max.votes=subset(max.votes,max.votes>=1)


subset.Reviews.per.product=subset(Reviews.per.product,max.votes>=1)

Updated code:

NotZero = which( max.votes >= 1 )
max.votes.subset.test = max.votes[ NotZero ]
Reviews.per.product.subset.test = Reviews.per.product[NotZero]
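
A minimal sketch of the same idea on toy data, in case it helps to check the behaviour. The data frame `reviews` and its columns `product_id` and `votes` are made up for illustration and are not the real dataset. The point is that one condition computed from max.votes is reused for both vectors, so they stay aligned by product.

```r
# Hypothetical toy data standing in for the review table (not the real dataset)
reviews <- data.frame(
  product_id = c("A", "A", "B", "B", "C"),
  votes      = c(0, 3, 0, 0, 5)
)

# Per-product summaries, mirroring the tapply() calls in the question
max.votes           <- tapply(reviews$votes, reviews$product_id, max, na.rm = TRUE)
Reviews.per.product <- tapply(reviews$product_id, reviews$product_id, length)

# One logical mask, reused for both vectors so they stay aligned by product
keep <- max.votes >= 1

max.votes.subset           <- max.votes[keep]
Reviews.per.product.subset <- Reviews.per.product[keep]

# Product IDs that survive the filter
names(max.votes.subset)

# log() is finite for every remaining product
log(max.votes.subset)
```

Indexing with the logical mask directly and indexing with which() of that mask give the same result as long as max.votes contains no NA values.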

Answer 1

You can do this in several ways:

Using dplyr:

library(dplyr)

foods_max_votes_subset  <- finefoods.dataframe %>%
    filter(Reviews.per.product >= 1,
           max.votes >= 1)

Base R:

foods_max_votes_subset  <- finefoods.dataframe[(finefoods.dataframe$Reviews.per.product >= 1 & finefoods.dataframe$max.votes >= 1),]
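
Filtering a data frame this way assumes max.votes and Reviews.per.product are columns of that data frame. In the question they are separate per-product vectors from tapply(), so one possible approach (a sketch only, with toy values and a hypothetical data frame name `per.product`) is to collect them into a per-product data frame first and filter that:

```r
# Hypothetical per-product summaries (toy values, not the real dataset)
max.votes           <- c(A = 3, B = 0, C = 5)
Reviews.per.product <- c(A = 2, B = 2, C = 1)

# Collect both summaries into one data frame, one row per product
per.product <- data.frame(
  product_id          = names(max.votes),
  max.votes           = as.vector(max.votes),
  Reviews.per.product = as.vector(Reviews.per.product)
)

# Base R row filter: keep products with at least one vote
per.product.subset <- per.product[per.product$max.votes >= 1, ]

# Same rows with subset(), which evaluates the condition inside the data frame
per.product.subset2 <- subset(per.product, max.votes >= 1)
```

Keeping both summaries in one data frame means a single filter drops the same products from both columns.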

I also noticed that your code looks a lot like Python. You tagged this post with both python and r. Which one are you coding in?

Using subset on a data frame