Question

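I have a data frame containing many seller IDs and the periods in which they made sales. I want to create a new column called Inactive that is 1 if the seller makes no sale in the following 6 periods.

Here is the dput() output of a sample dataset: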

structure(list(SellerID = c(1, 7, 4, 3, 1, 7, 4, 2, 5, 1, 2, 
5, 7), Period = c(1, 1, 1, 2, 2, 3, 3, 5, 5, 9, 9, 10, 10)), .Names = c("SellerID", 
"Period"), row.names = c(NA, -13L), class = "data.frame")

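Here is the dput() output of my ideal result (Inactive is 1 in row 5 because, for that row, seller ID 1 made a sale in period 2 but his next sale was in period 9 [row 10]. He was therefore inactive for at least 6 periods, and we want to record that so we can predict when a seller becomes inactive):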

structure(list(SellerID = c(1, 7, 4, 3, 1, 7, 4, 2, 5, 1, 2, 
5, 7), Period = c(1, 1, 1, 2, 2, 3, 3, 5, 5, 9, 9, 10, 10), Inactive = c(0, 
0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0)), .Names = c("SellerID", 
"Period", "Inactive"), row.names = c(NA, -13L), class = "data.frame")

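I tried to solve this with a nested for loop, but my dataset is very large (about 200,000 rows) and the loop takes a long time to run. I also tried my approach on the sample dataset, but it does not seem to work. Here is my approach: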

full.df$Inactive <- NA
for (i in 1:nrow(full.df)){
  temp = subset(full.df, SellerID = unique(full.df$SellerID[i]))
  for(j in 1:(nrow(temp) -1)){
    if(temp$Period[j+1] - temp$Period[j] <6)
      temp$Inactive[j] <-0
    else
      temp$Inactive[j] <-1
  }
  full.df[rownames(full.df) %in% rownames(temp), ]$Inactive <- temp$Inactive
}

On the dummy dataset, my approach puts 0 in every row of Inactive except the last one, which is NA. Here is the dput() output of the result I got:

structure(list(SellerID = c(1, 7, 4, 3, 1, 7, 4, 2, 5, 1, 2, 
5, 7), Period = c(1, 1, 1, 2, 2, 3, 3, 5, 5, 9, 9, 10, 10), Inactive = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA)), .Names = c("SellerID", 
"Period", "Inactive"), row.names = c(NA, -13L), class = "data.frame")

Answer 1

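I am assuming one thing here: the maximum value of the Period variable is 12.

Here is the logic: order the data frame by seller and period, then append 12 to the end of each seller's list of periods and take the differences. Note that this also classifies seller 3 as inactive, since that seller makes no further sale within the 7-period range.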

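# Sort by seller, then by period
df_s <- df[with(df, order(SellerID, Period)), ]
# Split the sorted periods by seller so the order matches df_s
g <- split(df_s$Period, df_s$SellerID)
# Append the assumed final period (12) to each seller's periods
l <- lapply(g, function(x) c(x, 12))
# Gap from each sale to the next one (or to period 12 for the last sale)
j <- lapply(l, diff)
u <- unlist(j, use.names = FALSE)
# Flag gaps of 7 or more periods as inactive
df_s$ind <- ifelse(u >= 7, 1, 0)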
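
As a variation on the same idea, here is a minimal sketch that keeps the rows in their original order by using base R's ave() instead of sorting and splitting. It relies on the same assumption as this answer (the observation window ends at period 12) and further assumes the rows are already ordered by Period within each seller, as they are in the sample data. The names last_period and gap_to_next are made up for illustration.

```r
# Sample data from the question.
df <- data.frame(
  SellerID = c(1, 7, 4, 3, 1, 7, 4, 2, 5, 1, 2, 5, 7),
  Period   = c(1, 1, 1, 2, 2, 3, 3, 5, 5, 9, 9, 10, 10)
)

# Assumption: the observation window ends at period 12.
last_period <- 12

# Hypothetical helper: gap from each sale to the seller's next sale.
# For a seller's final sale, the gap to last_period is used instead.
# Assumes the periods passed in are already in increasing order.
gap_to_next <- function(p) diff(c(p, last_period))

# ave() applies the helper per seller and returns the results
# in the original row order.
df$gap <- ave(df$Period, df$SellerID, FUN = gap_to_next)

# Flag gaps of 7 or more periods as inactive.
df$Inactive <- as.integer(df$gap >= 7)
```

On the sample data this gives Inactive = 0 0 0 1 1 1 1 0 0 0 0 0 0. It matches the desired output except for row 4 (seller 3), which is flagged because of the period-12 assumption.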

Answer 2

For a data.frame with 1,000,000 rows and 1 seller, the runtime on an ordinary PC is about 1 second.

Efficiently assign new column values based on multiple column conditions in R