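I am working on a problem that requires me to subset my data. I am currently using subset() to do this, but my syntax is wrong. The problem I am working on and my current code are below.
Problem:
Some product IDs have 0 votes, which causes an error if we try to take the log. Subset the variables max.votes and number.of.reviews (or whatever you called them) to only the values that correspond to products with 1 or more votes.
Code: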
#Use tapply to find max votes received by product reviews
max.votes=tapply(`Number of Votes`, `Product ID`, max,na.rm=TRUE)
#Count number of reviews for each product ID
Reviews.per.product=tapply(`Product ID`,`Product ID`,length)
#1i
#Make a scatter plot of max votes as a function of number of reviews
plot(max.votes~Reviews.per.product)
#There is no apparent trend that I am able to pick out from the scatter plot.
#1J
#Create subsets with 0's removed
foods_max_votes_subset = finefoods.dataframe[finefoods.dataframe$Nu >= 1, ]
subset.max.votes=subset(max.votes,max.votes>=1)
subset.Reviews.per.product=subset(Reviews.per.product,max.votes>=1)
Updated code:
NotZero = which( max.votes >= 1 )
max.votes.subset.test = max.votes[ NotZero ]
Reviews.per.product.subset.test = Reviews.per.product[NotZero]
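As a minimal sketch of why the updated code works: both tapply() results are named by product ID and share the same order, so a single mask computed from max.votes can index both vectors and keep them aligned. The toy data frame and the names `toy` and `keep` below are hypothetical stand-ins, since the real data set is not shown in the post.

```r
# Hypothetical toy data standing in for the real review data set
toy <- data.frame(
  product = c("A", "A", "B", "C", "C", "C"),
  votes   = c(3, 0, 0, 5, 2, 0)
)

# Per-product summaries, built the same way as in the question
max.votes <- tapply(toy$votes, toy$product, max, na.rm = TRUE)
Reviews.per.product <- tapply(toy$product, toy$product, length)

# One logical mask, applied to both vectors so they stay aligned
keep <- max.votes >= 1
max.votes.subset <- max.votes[keep]
Reviews.per.product.subset <- Reviews.per.product[keep]

# With the zero-vote products gone, log() is finite for every element
log.max.votes <- log(max.votes.subset)
```

Wrapping the condition in which(), as in the updated code, selects the same elements here; the difference is that which() also drops any positions where the comparison is NA.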
Answer 0 (score: 0)
You can do this in several ways:
Using dplyr:
library(dplyr)
foods_max_votes_subset <- finefoods.dataframe %>%
  filter(Reviews.per.product >= 1,
         max.votes >= 1)
Base R:
foods_max_votes_subset <- finefoods.dataframe[(finefoods.dataframe$Reviews.per.product >= 1 & finefoods.dataframe$max.votes >= 1),]
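Note that both snippets filter on columns named Reviews.per.product and max.votes, while in the question those are separate tapply() results rather than columns of finefoods.dataframe. One possible way to bridge the two, sketched here under that assumption, is to collect the per-product summaries into a data frame first. The names `toy` and `per.product` are hypothetical and the toy data is made up for illustration.

```r
# Hypothetical toy data; the real finefoods.dataframe is not shown in the post
toy <- data.frame(
  product = c("A", "A", "B", "C", "C", "C"),
  votes   = c(3, 0, 0, 5, 2, 0)
)

# Per-product summaries, as computed in the question
max.votes <- tapply(toy$votes, toy$product, max, na.rm = TRUE)
Reviews.per.product <- tapply(toy$product, toy$product, length)

# Collect the summaries into one data frame, one row per product
per.product <- data.frame(
  product             = names(max.votes),
  max.votes           = as.vector(max.votes),
  Reviews.per.product = as.vector(Reviews.per.product)
)

# Row filter on columns that now exist in the data frame
per.product.subset <- per.product[per.product$max.votes >= 1, ]
```

Because every product has at least one review by construction, the condition on max.votes alone is enough to drop the zero-vote products.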
I also noticed that your code looks a lot like Python, and you tagged this post with both python and r. Which one are you coding in?