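I'm very new to this, so apologies in advance. I have two vectors: a character vector of account names (30) and a character vector of product names (30). I also have a data frame with three columns (customer name, product name, and revenue), but it lists far more than just those 30 accounts and products.
Ultimately, I need a 30x30 data frame with the products from the product name vector as rows and the accounts from the account name vector as columns, where each value is the revenue associated with the account in that column and the product in that row.
I think I need a nested loop? But I don't know how to use one to populate the data frame properly.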
account <- c("a", "b", etc)
product <- c("prod_a", "prod_b", etc)
for (i in 1:length(account)) {
  for (j in 1:length(product)) {
    .....
  }
}
Honestly, I'm just lost, haha.
Answer 0 (score: 0)
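I think I know what you're trying to do here. I suspect you have a good reason for wanting this 30x30 cross-tab type of structure, but I'd also like to take the opportunity to encourage "tidy" data for analysis. The linked article on tidy data can be summarized by three main criteria for data to be considered "tidy":
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.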
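As a small illustration of those criteria, here is a minimal sketch of revenue data in tidy (long) form. The column names and values are made up for the example and are not taken from the question. One practical benefit is that summaries such as revenue per account become a single call.

```r
# Hypothetical values, for illustration only
tidy_revenue <- data.frame(
  account = c("a", "a", "b"),
  product = c("prod_a", "prod_b", "prod_a"),
  revenue = c(100, 250, 75)
)

# One row per account/product observation, one column per variable,
# so totals per account need no reshaping first
revenue_by_account <- aggregate(revenue ~ account, data = tidy_revenue, FUN = sum)
```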
That said, below is my attempt to interpret and demonstrate what I think you are trying to achieve.
library(tidyr)
# set up some fake data to better explain
account_vec <- paste0(letters, 1:26)
product_vec <- paste0(as.character(101:126), LETTERS)
revenue_vec <- rnorm(26*26)
# build every account/product combination to set up our fake data
df <- expand.grid(account_vec, product_vec)
names(df) <- c("accounts", "products")
df$revenue <- revenue_vec
# if this is what your data looks like currently, I would consider this fairly "tidy"
# now let's pretend there's some data we need to filter out
df <- rbind(
  df,
  data.frame(
    accounts = paste0("bad_account", 1:3),
    products = paste0("bad_product", 1:3),
    revenue = rnorm(3)
  )
)
# filter to just what is included in our "accounts" and "products" vectors
df <- df[df$accounts %in% account_vec, ]
df <- df[df$products %in% product_vec, ]
# spread out the products so they become the columns
# (spread() still works, but since tidyr 1.0.0 it is superseded by pivot_wider())
df2 <- df %>% tidyr::spread(key = "products", value = "revenue")
# if you aren't familiar with the "%>%" pipe operator, the above
# line of code is equivalent to this one below:
# df2 <- tidyr::spread(df, key="products", value="revenue")
# now we have accounts as rows, products as columns, and revenues at the intersection
# we can go one step further by making the accounts our row names if we want
row.names(df2) <- df2$accounts
df2$accounts <- NULL
# now the accounts are in the row name and not in a column on their own
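If you want the layout exactly as described in the question (products as rows, accounts as columns), a base R alternative that avoids both the nested loop and tidyr is xtabs(). The sketch below is only that, a sketch: the data frame name `sales`, its column names, and all values are assumptions made up for illustration, and it assumes that repeated account/product pairs should be summed.

```r
# Hypothetical example data; in the question these would be the real
# 30-element vectors and the longer revenue data frame.
account <- c("a", "b", "c")
product <- c("prod_a", "prod_b")
sales <- data.frame(
  customer = c("a", "a", "b", "c", "c", "zzz"),
  product  = c("prod_a", "prod_b", "prod_a", "prod_a", "prod_a", "prod_x"),
  revenue  = c(10, 20, 30, 40, 5, 99),
  stringsAsFactors = FALSE
)

# Keep only rows whose customer and product are in the vectors of interest
keep <- sales$customer %in% account & sales$product %in% product
sales <- sales[keep, ]

# Fixing the factor levels makes every product and account appear,
# in the order of the original vectors, even when no rows match
sales$product  <- factor(sales$product,  levels = product)
sales$customer <- factor(sales$customer, levels = account)

# Sum revenue for each product (rows) and account (columns);
# combinations with no data come out as 0
rev_mat <- xtabs(revenue ~ product + customer, data = sales)

# Convert the cross-tabulation to a plain data frame
rev_df <- as.data.frame.matrix(rev_mat)
```

Note that empty combinations show up as 0 here rather than NA, which may or may not be what you want for revenue.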