I have a data.table that was read from a quirky file:
library(data.table)
istub <- setDT(read.fwf( 'http://www.bls.gov/cex/pumd/2016/csxistub.txt',
widths=c(2,3,64,12,2,3,10), skip=1,
stringsAsFactors=FALSE, strip.white=TRUE,
col.names = c( "type", "level", "title", "UCC",
"survey", "factor","group" )
) )
One quirk of the file is that if type==2, the row holds only a continuation of the previous row's title field.
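So I want to append each continuation title to the title of the row above it. I am assuming there is only one continuation line per ordinary line.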
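The one-continuation-per-row assumption can be checked before merging anything. The following is a minimal sketch on made-up rows (toy is hypothetical data that imitates the file's layout, not real BLS content); the names is_cont and assumption_holds are introduced here for illustration only.

```r
library(data.table)

# Hypothetical toy rows imitating the file's layout (not real BLS data)
toy <- data.table(
  type  = c(1L, 1L, 2L, 1L),
  title = c("Food",
            "School books, supplies, equipment for vocational and",
            "technical schools",
            "Housing")
)

# TRUE for rows that only continue the title of the row above
is_cont <- toy$type == 2

# The assumption holds if the first row is not a continuation and
# no continuation row directly follows another continuation row
prev_is_cont <- shift(is_cont, fill = FALSE)  # default type is "lag"
assumption_holds <- !is_cont[1] && !any(is_cont & prev_is_cont)
```

If assumption_holds came back FALSE on the real table, the index-based and shift-based approaches discussed here would need adjusting before use.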
For each example below, start with:
df <- copy(istub) # avoids extra requests of file
I know I can do this:
# if type == 2, "title" field should be appended to the above row's "title" field
continued <- which(df$type==2)
# You can see that these titles are incomplete,
# e.g., "School books, supplies, equipment for vocational and"
tail(df$title[continued-1])
df$title[continued-1] <- paste(df$title[continued-1],df$title[continued])
# Now they're complete
# e.g., "School books, supplies, equipment for vocational and technical schools"
tail(df$title[continued-1])
# And we could get rid of the continuation lines
df <- df[-continued]
However, I would like to practice some data.table-fu.
data.table
First, I tried using shift() to subset in i, but that did not work:
df[shift(type, type='lead')==2,
title := paste(title, shift(title, type='lead') ) ] # doesn't work
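A likely reason the attempt above fails: when i is supplied, data.table evaluates j on the selected rows only, so shift(title, type='lead') looks at the next selected row rather than at the continuation row. The sketch below reproduces this on made-up rows (toy and attempt are hypothetical names, not part of the original data).

```r
library(data.table)

# Hypothetical toy rows: rows 2 and 5 are continuations of rows 1 and 4
toy <- data.table(
  type  = c(1L, 2L, 1L, 1L, 2L),
  title = c("A and", "B", "C", "D and", "E")
)

attempt <- copy(toy)

# i selects rows 1 and 4 (the rows just above a continuation).
# Inside j, `title` is then only c("A and", "D and"), so the lead of
# row 1 is row 4's title, and the lead of row 4 is NA.
attempt[shift(type, type = "lead") == 2,
        title := paste(title, shift(title, type = "lead"))]

attempt$title
```

Rows 1 and 4 end up pasted to each other (and to NA) instead of to "B" and "E", which is why the shift has to be computed over the full column.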
This works:
# fill = FALSE keeps the last row's title from becoming NA
df[, title := ifelse( shift(type == 2, type='lead', fill=FALSE),
                      paste(title, shift(title, type='lead')),
                      title ) ]
Am I stuck using two shift() calls (which seems inefficient), or is there a nicer way?
Answer 0 (score: 2)
I was able to do this with a shift()-ed ifelse():
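# fill='' pads the lead with an empty string, so a final ordinary row
# does not get "NA" pasted onto its title
df[, title := paste0(title, shift( ifelse(type==2, paste0(' ',title), ''),
                                   type='lead', fill='')
                     ) ]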
df <- df[type==1] # can get rid of continuation lines
It seems a bit hacky to paste0 a vector of mostly empty strings, so improvements are welcome.
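One possible alternative, which is not from the answer above and swaps in a different technique, is to give every ordinary row and the continuation rows after it a shared group id via cumsum(), then collapse the titles per group. This is a sketch on made-up rows; toy, grp, and res are hypothetical names, and it assumes the first row is an ordinary (type 1) row.

```r
library(data.table)

# Hypothetical toy rows: rows 2 and 5 are continuations of rows 1 and 4
toy <- data.table(
  type  = c(1L, 2L, 1L, 1L, 2L),
  title = c("A and", "B", "C", "D and", "E"),
  UCC   = c("100", "", "200", "300", "")
)

# grp increases by one at every ordinary row, so each continuation row
# shares the id of the ordinary row above it
toy[, grp := cumsum(type == 1)]

# Collapse each group's titles into one string (recycled to all its rows)
toy[, title := paste(title, collapse = " "), by = grp]

# Keep the ordinary rows and drop the helper column
res <- toy[type == 1][, grp := NULL]
```

This avoids both shift() and ifelse(), and it would also cope with more than one continuation line per ordinary row, at the cost of a grouped operation.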
Answer 1 (score: 1)
ifelse is almost always avoidable, and it is worth avoiding.**
I would probably do something like...
# back up the data before editing values
df0 = copy(df)
# find rows
w = df[type == 2, which = TRUE]
# edit at rows up one
stopifnot(all(w > 1))
df[w-1, title := paste(title, df$title[w])]
# drop rows
res = df[-w]
** Some examples...
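To cross-check the two answers without downloading the file, both can be run on the same made-up rows and compared. In this sketch, toy, a, and b are hypothetical names, and the shift() version is written with fill = "" (an adjustment assumed here) so that a final ordinary row does not get "NA" appended.

```r
library(data.table)

# Hypothetical toy rows: rows 2 and 5 are continuations; row 6 is a
# final ordinary row with nothing after it
toy <- data.table(
  type  = c(1L, 2L, 1L, 1L, 2L, 1L),
  title = c("A and", "B", "C", "D and", "E", "F")
)

# Approach from Answer 0: one shift() over a mostly empty padding vector
a <- copy(toy)
a[, title := paste0(title,
                    shift(ifelse(type == 2, paste0(" ", title), ""),
                          type = "lead", fill = ""))]
a <- a[type == 1]

# Approach from Answer 1: row numbers via which = TRUE, edit one row up
b <- copy(toy)
w <- b[type == 2, which = TRUE]
stopifnot(all(w > 1))
b[w - 1, title := paste(title, b$title[w])]
b <- b[-w]

# Both approaches should produce the same merged titles
same <- identical(a$title, b$title)
```

On these rows the two approaches agree, which suggests the choice between them is mostly a matter of readability.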