Question

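I currently have a series of IDs stored as a factor. I have a for loop that looks up those IDs in a data frame and returns a particular value. I'm building a data frame that stores the ID currently going through the loop in column 1 and the value of interest in column 2.

The problem I'm running into is that when I assign the ith ID to my data frame, it stores the factor's index number instead of the value. See the code.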

ref <- unique(yearsd[,11]) # yearsd df has customer records; i'm extracting unique IDs
counter <- data.frame(matrix(ncol = 2, nrow = length(ref))) # initialize counter for for loop

for(i in 1:length(ref))
{
  loc <- which(ref[i] == yearsd[,11]) # returns positions of IDs
  yearTF <- unique(yearsd[loc,3])     # gives me a vector of years that ID shows up
  counter[i,1] = print(ref[i])        # store the ID currently in the loop
  counter[i,2] = length(yearTF)       # store the number of years they show up in the records
}

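If the ith element of ref is ABCD and it is the 32nd level of the factor, my counter[i,1] value ends up being 32 instead of ABCD. I also tried print(ref[i]), but no luck there either. I always get the factor's level index number.

Would it be better if I converted it to character? The IDs are alphanumeric strings.

Edit

yearsd is a df with customer records
yearsd[,11] contains the customer IDs
Each record has a transaction date, of which only the year is stored, e.g. 2005, 2006, etc.

I'm trying to end up with a df that has the customer ID in one column and, in the second column, the number of years in which they had transactions.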

Example Output:

CustID   YearsIn
A0001    3
D504     1
RR45Y    2

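This means customer A0001 had transactions in 3 years, D504 in only 1 year, and RR45Y in 2 years. Each customer may have multiple transactions in a single year. I only want to know whether they had at least 1; if so, I count that year for that customer.

Let me know if you have any questions. I appreciate the help.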

Answer 1

How about using aggregate instead (since that is the problem you are really trying to solve)?

#sample data
dd<-data.frame(
    cust=rep(c("A001", "D504","RR457"), c(3,1,2)),
    year = c(2001:2003, 2002, 2003:2004)
)

aggregate(year~cust, dd, function(x) length(unique(x)))

#    cust year
# 1  A001    3
# 2  D504    1
# 3 RR457    2
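If you want the result in the exact two-column layout from the question, a base R alternative is tapply. This is only a sketch on the same sample data; the column names CustID and YearsIn are taken from the example output in the question, and cust is built as a factor explicitly so the result does not depend on the R version's stringsAsFactors default.

```r
# Same sample data as above, with cust created as a factor explicitly
dd <- data.frame(
    cust = factor(rep(c("A001", "D504", "RR457"), c(3, 1, 2))),
    year = c(2001:2003, 2002, 2003:2004)
)

# Count distinct years per customer; tapply returns a named 1-d array
yrs <- tapply(dd$year, dd$cust, function(x) length(unique(x)))

# Reshape into the layout from the question's example output
res <- data.frame(CustID = names(yrs), YearsIn = as.vector(yrs),
                  stringsAsFactors = FALSE)
res
```

Because of unique(), several transactions by the same customer in the same year are counted once.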

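But going back to your question, you can't really initialize a data.frame that way. When you set it up without any values, it picks the simplest data type (a logical vector of NA), and assigning a factor into a non-factor column stores only its integer code. If you want to pre-populate the data.frame, a better strategy would be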

ref <- unique(dd$cust)
counter <- data.frame(id=factor(NA,levels=ref),   # factor column using ref's values as levels
    count=numeric(length(ref)), stringsAsFactors=FALSE)

for(i in seq_along(ref)) {
  loc <- which(ref[i] == dd$cust)
  yearTF <- unique(dd[loc,"year"])
  counter[i,1] <- ref[i]
  counter[i,2] <- length(yearTF)
}
counter
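To see why the original loop ends up with an index number, here is a small sketch of what happens with the question's initialization. The IDs are made up for illustration; the point is only the column types and the coercion.

```r
# A factor of IDs, built explicitly (illustrative values)
ref <- factor(c("A001", "D504", "RR457"))

# The question's initialization: matrix() with no data is filled with NA,
# so both columns start out as logical
counter <- data.frame(matrix(ncol = 2, nrow = length(ref)))
init_classes <- sapply(counter, class)
init_classes

# Assigning a factor element into a non-factor column keeps only the
# underlying integer code, not the label
counter[2, 1] <- ref[2]
counter[2, 1]        # the level index of "D504" (2 here), not the string
as.integer(ref[2])   # the same integer code
```

A column that is already a factor with the right levels, as in the pre-populated counter above, keeps the label instead.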

Or even just doing

counter[i,1] <- as.character(ref[i])

would force the conversion to character (print() won't do that).
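A short sketch of that fix applied to the question's own initialization, again with made-up IDs. It also shows why print() does not help: it displays the value and hands back its argument unchanged, so the result is still a factor.

```r
ref <- factor(c("A001", "D504", "RR457"))
counter <- data.frame(matrix(ncol = 2, nrow = length(ref)))

# print() shows the value but returns the same factor element
still_factor <- is.factor(print(ref[2]))
still_factor

# as.character() returns the label itself, so the column becomes character
for (i in seq_along(ref)) {
  counter[i, 1] <- as.character(ref[i])
}
counter[[1]]
```

Converting the whole vector once, e.g. with as.character(ref) before the loop, should have the same effect.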
