For research purposes, I need to process data from a CSV table. The table looks like this:
Frame Nr. 0 frame_type I_frame
Frame Nr. 1 frame_type P_frame
Frame Nr. 2 frame_type P_frame
Frame Nr. 3 frame_type B_frame
Frame Nr. 4 frame_type P_frame
Frame Nr. 5 frame_type P_frame
Frame Nr. 6 frame_type B_frame
Frame Nr. 7 frame_type P_frame
Frame Nr. 8 frame_type P_frame
Frame Nr. 9 frame_type I_frame
Frame Nr. 10 frame_type P_frame
Frame Nr. 11 frame_type P_frame
Frame Nr. 12 frame_type P_frame
Frame Nr. 13 frame_type I_frame
Frame Nr. 14 frame_type P_frame
Frame Nr. 15 frame_type P_frame
Frame Nr. 16 frame_type B_frame
Frame Nr. 17 frame_type P_frame
Frame Nr. 18 frame_type P_frame
Frame Nr. 19 frame_type P_frame
Frame Nr. 20 frame_type P_frame
Frame Nr. 21 frame_type I_frame
Frame Nr. 22 frame_type P_frame
Frame Nr. 23 frame_type P_frame
Frame Nr. 24 frame_type P_frame
Frame Nr. 25 frame_type I_frame
...
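I would like R to first group the frames, starting a new group at each I_frame, and then count the consecutive P and B frames up to the next I_frame. For this example, my R program should produce a result like this: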
I2PB2PB2P I3P I2PB4P I3P ...
Is there a way to do this in R?
Answer 0: (score: 1)
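Edited from a previous wrong answer, and borrowing rle from @akron, you could do this. Assuming your data is in a data frame called "df" and your frame classes are in a column called "frame_class", as in the code below, this should work: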
# Example data; in practice df would be read from the CSV file
df = data.frame(n_frame = 1:13, frame_type = "frame_type",
                frame_class = c("I_frame", "P_frame", "P_frame", "B_frame", "P_frame", "P_frame",
                                "B_frame", "I_frame", "B_frame", "P_frame", "I_frame", "P_frame", "I_frame"))
df$frame_letter = substring(df$frame_class, 1, 1) # keep only the first letter: "I", "P" or "B"
# Find the location of I_frames
where_i = which(df$frame_class == "I_frame")
num_i = length(where_i)
out_codes = list()
for (ind_i in seq_len(max(num_i - 1, 0))) { # cycle on "sandwiches" between consecutive I_frames
  start = where_i[ind_i]
  end = where_i[ind_i + 1]
  # frames strictly between the two I_frames (empty if they are adjacent)
  sub_data = df$frame_letter[start + seq_len(end - start - 1)]
  count_reps = rle(sub_data) # run lengths of consecutive identical letters
  # build the code
  out_code = "I"
  for (ind_letter in seq_along(count_reps$lengths)) {
    out_code = paste0(out_code, ifelse(count_reps$lengths[ind_letter] == 1,
                      count_reps$values[ind_letter], # if only 1 rep, don't add "1" to the string
                      paste0(count_reps$lengths[ind_letter], count_reps$values[ind_letter])))
  }
  out_codes[[ind_i]] = out_code # put in list
}
out_codes
which gives:
> out_codes
[[1]]
[1] "I2PB2PB"
[[2]]
[1] "IBP"
[[3]]
[1] "IP"
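Note that this is very quick and dirty: you should at least implement some checks to make sure that the series always starts and ends with an "I_frame", but this may get you going in the right direction...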
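A minimal sketch of such a check, under the assumption that the frame classes are available as a vector such as df$frame_class. The helper name check_frame_series is hypothetical and not part of the code above; it simply stops with an error when the series is empty, contains missing values, or does not start and end with an "I_frame".

```r
# Hypothetical helper (an illustration, not part of the original answer):
# stop early if the series cannot be split into complete I-to-I groups.
check_frame_series <- function(frame_class) {
  frame_class <- as.character(frame_class)  # also accepts factor columns
  n <- length(frame_class)
  if (n == 0) stop("the series is empty")
  if (anyNA(frame_class)) stop("the series contains missing values")
  if (frame_class[1] != "I_frame") stop("the series does not start with an I_frame")
  if (frame_class[n] != "I_frame") stop("the series does not end with an I_frame")
  invisible(TRUE)
}

# Passes silently for a well-formed series
check_frame_series(c("I_frame", "P_frame", "B_frame", "I_frame"))
```

Calling check_frame_series(df$frame_class) before the loop would turn a malformed input into an explicit error rather than a silently truncated result.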
Also note that this could be slow for large datasets.
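If speed becomes a concern, one possible alternative is to label the groups with cumsum and encode each group in a single vectorised step. This is only a sketch and has not been benchmarked: the function name encode_groups is hypothetical, and it assumes that every frame class starts with "I", "P" or "B". Frames before the first I_frame and after the last one are dropped, since they do not belong to a complete group.

```r
# Hypothetical helper (an illustration, not benchmarked): encode every
# I-to-I group using cumsum() for grouping and rle() for run lengths.
encode_groups <- function(frame_class) {
  letter <- substring(as.character(frame_class), 1, 1)  # "I", "P" or "B"
  is_i <- letter == "I"
  group <- cumsum(is_i)                  # 0 = frames before the first I_frame
  keep <- group > 0 & group < sum(is_i)  # drop leading frames and the last, unclosed group
  encode_one <- function(x) {
    runs <- rle(x[-1])                   # x[1] is the opening "I"
    counts <- ifelse(runs$lengths == 1, "", as.character(runs$lengths))
    paste0("I", paste0(counts, runs$values, collapse = ""))
  }
  unname(vapply(split(letter[keep], group[keep]), encode_one, character(1)))
}

frame_class <- c("I_frame", "P_frame", "P_frame", "B_frame", "P_frame", "P_frame",
                 "B_frame", "I_frame", "B_frame", "P_frame", "I_frame", "P_frame", "I_frame")
encode_groups(frame_class)
# [1] "I2PB2PB" "IBP"     "IP"
```

Unlike the loop, this returns a character vector rather than a list, so paste(encode_groups(df$frame_class), collapse = " ") produces the single space-separated string shown in the question.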
Lorenzo