How to dynamically insert values into a data frame using R

Time: 2017-07-26 17:55:30

Tags: r dataframe

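After scraping some review data from a website, I'm having trouble organizing it into a structure that is useful for analysis. The problem is that the data is dynamic: each reviewer rates anywhere from 0 to 3 subcategories (denoted "a", "b", and "c"). I'd like to organize the reviews so that each row is a different reviewer and each column is a rated subcategory. Where a reviewer chose not to rate a subcategory, I'd like the missing value to be NA. Here is a simplified example of the data: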

vec <- c("a","b","c","stop", "a","b","stop", "stop", "c","stop")
ratings <- c(2,5,1, 1,3, 2) 

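vec holds the subcategories that were rated, and "stop" marks the end of each reviewer's ratings. I'd therefore like to organize the result into a data frame with the following structure. Expected output:

[Image: expected output table]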
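As a sketch of the target layout, inferred from the outputs shown in the answers below rather than from the original image, the desired result for the sample data would look roughly like this. The name `expected` is purely illustrative.

```r
# Target structure (an assumption reconstructed from the answers' printed output):
# one row per reviewer, one column per subcategory, NA where nothing was rated.
expected <- data.frame(
  a = c(2, 1, NA, NA),
  b = c(5, 3, NA, NA),
  c = c(1, NA, NA, 2)
)
expected
```

Reviewer 3 rated nothing, so that row is entirely NA.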

I'd really appreciate any help, as I've been working on this problem for far longer than I should have.

4 Answers:

Answer 0 (score: 4)

@alexis_laz provided what I think is the best answer:

vec <- c("a","b","c","stop", "a","b","stop", "stop", "c","stop")
ratings <- c(2,5,1, 1,3, 2) 

stops <- vec == "stop"
i = cumsum(stops)[!stops] + 1L
j = vec[!stops]
tapply(ratings, list(factor(i, 1:max(i)), factor(j)), identity) # although mean/sum work  
#      a  b  c
#[1,]  2  5  1
#[2,]  1  3 NA
#[3,] NA NA NA
#[4,] NA NA  2
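To see why the cumsum trick works, it may help to look at the intermediate index vectors. The sketch below reuses `i` and `j` from the answer and then, as an alternative to tapply (not the answerer's method), fills a pre-allocated NA matrix by (row, column) pairs. It assumes every reviewer's block is closed by a "stop"; `cats` and `res` are illustrative names.

```r
vec <- c("a","b","c","stop", "a","b","stop", "stop", "c","stop")
ratings <- c(2, 5, 1, 1, 3, 2)

stops <- vec == "stop"
# cumsum(stops) counts the "stop" markers seen so far, so adding 1
# gives the reviewer number of every non-stop entry.
i <- cumsum(stops)[!stops] + 1L   # reviewer index: 1 1 1 2 2 4
j <- vec[!stops]                  # subcategory:    a b c a b c

# Alternative to tapply(): index an all-NA matrix with a two-column
# matrix of (row, column) positions.
# Assumption: each reviewer ends with "stop", so sum(stops) is the row count.
cats <- sort(unique(j))
res <- matrix(NA_real_, nrow = sum(stops), ncol = length(cats),
              dimnames = list(NULL, cats))
res[cbind(i, match(j, cats))] <- ratings
res
```

Reviewer 3 never appears in `i`, which is why that row stays NA in both approaches.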

Answer 1 (score: 3)

Base R, but I'm using a for loop...

vec <- c("a","b","c","stop", "a","b","stop", "stop", "c","stop")
ratings <- c(2,5,1, 1,3, 2) 
categories <- unique(vec)[unique(vec)!="stop"]

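row <- 1      # current reviewer (row of df)
df <- data.frame(lapply(categories, function(x){NA_integer_}))
colnames(df) <- categories
rating <- 1   # position of the next unused value in ratings

for(i in vec) {
  if(i == 'stop') {
    row <- row + 1    # "stop" closes the current reviewer's block
  } else {
    df[row, i] <- ratings[[rating]]   # fill this reviewer's cell for subcategory i
    rating <- rating + 1
  }
}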
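One thing to watch with the loop above: a reviewer with no ratings at the very end of vec never triggers an assignment, so no row is created for them. A minimal sketch of a variant that pre-allocates one row per reviewer is shown below. It assumes every reviewer's block ends with "stop", and `n_reviewers` and `entry` are illustrative names, not part of the original answer.

```r
vec <- c("a","b","c","stop", "a","b","stop", "stop", "c","stop")
ratings <- c(2, 5, 1, 1, 3, 2)
categories <- setdiff(unique(vec), "stop")

# Assumption: each reviewer's block is closed by "stop", so the number of
# "stop" markers equals the number of reviewers (rows).
n_reviewers <- sum(vec == "stop")

# Pre-allocate an all-NA data frame with one row per reviewer.
df <- as.data.frame(matrix(NA_real_, nrow = n_reviewers,
                           ncol = length(categories),
                           dimnames = list(NULL, categories)))

row <- 1L
rating <- 1L
for (entry in vec) {
  if (entry == "stop") {
    row <- row + 1L                     # move on to the next reviewer
  } else {
    df[row, entry] <- ratings[rating]   # fill the existing cell
    rating <- rating + 1L
  }
}
df
```

Because every row exists up front, the loop only ever writes into cells that are already there.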

Answer 2 (score: 2)

Here is one option:

[Code for this answer is missing in the source; an unrelated pandas `to_csv` call appeared in its place.]

Answer 3 (score: 2)

Using base R functions, plus either rbind.fill from plyr or rbindlist from data.table to produce the final object, we can

# convert vec into a list, split by "stop", dropping final element
temp <- head(strsplit(readLines(textConnection(paste(gsub("stop", "\n", vec, fixed=TRUE), collapse=" "))), split=" "), -1)
# remove empty strings, but maintain empty list elements
temp <- lapply(temp, function(x) x[nchar(x) > 0])
# match up appropriate names to the individual elements in the list with setNames
# convert vectors to single row data.frames
temp <- Map(function(x, y) setNames(as.data.frame.list(x), y), relist(ratings, skeleton = temp), temp)
# add silly data.frame (single row, single column) for any empty data.frames in list
temp <- lapply(temp, function(x) if(nrow(x) > 0) x else setNames(data.frame(NA), vec[1]))

Now you can produce a single data.frame (or data.table) with either plyr or data.table:

# with plyr, returns data.frame
library(plyr)
do.call(rbind.fill, temp)
   a  b  c
1  2  5  1
2  1  3 NA
3 NA NA NA
4 NA NA  2

# with data.table, returns data.table
library(data.table)
rbindlist(temp, fill=TRUE)
    a  b  c
1:  2  5  1
2:  1  3 NA
3: NA NA NA
4: NA NA  2

Note that the line just before the row-binding step can be replaced with one (not preserved here) that uses subsetting to replace only the list items holding empty data frames, rather than going over the entire list.
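One possible form of a subsetting-based replacement for the lapply() step in this answer is sketched below. It is an assumption, not the answerer's original line. To stay self-contained it uses a hand-built stand-in for `temp` as it would look just before that step; `empty` is an illustrative name.

```r
vec <- c("a","b","c","stop", "a","b","stop", "stop", "c","stop")

# Hypothetical stand-in for `temp` right before the empty-data-frame step:
# one single-row data frame per reviewer, empty where nothing was rated.
temp <- list(
  data.frame(a = 2, b = 5, c = 1),
  data.frame(a = 1, b = 3),
  data.frame(),
  data.frame(c = 2)
)

# Locate the empty elements once, then overwrite only those by subsetting.
empty <- vapply(temp, nrow, integer(1)) == 0L
temp[empty] <- list(setNames(data.frame(NA), vec[1]))

temp[[3]]
```

After this step every element has exactly one row, so the list can be passed to rbind.fill or rbindlist(fill=TRUE) as before.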