Conditional rolling string concat in data.table

Posted: 2018-07-17 20:36:43

Tags: r data.table string-concatenation

I have a data.table that I read from a quirky fixed-width file:

library(data.table)

istub  <- setDT(read.fwf( 'http://www.bls.gov/cex/pumd/2016/csxistub.txt', 
                          widths=c(2,3,64,12,2,3,10), skip=1,
                          stringsAsFactors=FALSE, strip.white=TRUE,
                          col.names = c( "type", "level", "title", "UCC", 
                                         "survey", "factor","group" )
                ) )

One quirk of the file is that when type == 2, the row only holds a continuation of the previous row's title field.

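So I want to append each continuation title to the title on the row above. I'm assuming every ordinary line has at most one continuation line.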
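The one-continuation assumption can be checked before relying on it. A minimal sketch on a small hypothetical table standing in for the downloaded file (`toy`, `is_cont` and `cont_cont` are illustrative names, and the "Food"/"Housing" rows are invented): it flags any continuation line whose previous line is also a continuation, and a continuation on the very first line.

```r
library(data.table)

# Hypothetical stand-in for istub: type 1 = ordinary line, type 2 = continuation
toy <- data.table(
  type  = c(1L, 1L, 2L, 1L),
  title = c("Food",
            "School books, supplies, equipment for vocational and",
            "technical schools",
            "Housing")
)

# Continuation lines, and continuation lines directly below another continuation
is_cont   <- toy$type == 2L
cont_cont <- is_cont & shift(is_cont, fill = FALSE)   # shift() defaults to a lag

# The assumption holds if no continuation follows a continuation
# and the first line is not itself a continuation
stopifnot(!any(cont_cont), !is_cont[1L])
```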

For each example, start with:

df <- copy(istub) # avoids extra requests of file

Base R solution (the desired result):

I know I can do this:

# if type == 2, "title" field should be appended to the above row's "title" field
continued <- which(df$type==2)

# You can see that these titles are incomplete,
#  e.g., "School books, supplies, equipment for vocational and"  
tail(df$title[continued-1])

df$title[continued-1] <- paste(df$title[continued-1],df$title[continued])

# Now they're complete
# e.g., "School books, supplies, equipment for vocational and technical schools"    
tail(df$title[continued-1])

# And we could get rid of the continuation lines
df <- df[-continued]

However, I'd like to practice some data.table fu.

Attempts with data.table

First, I tried using shift() in i to subset the rows, but that didn't work:

df[shift(type, type='lead')==2, 
     title := paste(title, shift(title, type='lead') ) ] # doesn't work
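The reason this fails: i is evaluated against the whole table, so it correctly selects the rows just above a continuation, but j only sees the selected rows. Inside j, `shift(title, type='lead')` is therefore the lead within the subset, not the next line of the file. A minimal reproduction on a hypothetical toy table (`toy` and its "Food"/"Housing" rows are invented for illustration):

```r
library(data.table)

toy <- data.table(
  type  = c(1L, 1L, 2L, 1L),
  title = c("Food",
            "School books, supplies, equipment for vocational and",
            "technical schools",
            "Housing")
)

# i selects only row 2 (the NA lead on the last row counts as FALSE in i).
# j then sees a length-1 title vector, so its lead is NA, and paste()
# turns that NA into the literal text "NA".
toy[shift(type, type = "lead") == 2,
    title := paste(title, shift(title, type = "lead"))]

toy$title[2]   # "School books, supplies, equipment for vocational and NA"
```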

This works:

# fill=0L: the last row has no lead, and an NA test would make ifelse() return NA
df[,title := ifelse( shift(type, type='lead', fill=0L)==2,
                     paste(title, shift(title, type='lead')),
                     title ) ]

Am I stuck with two shift() calls (which seems inefficient), or is there a neater way?
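One possible answer to the "two shifts" part, sketched under assumptions: data.table's `shift()` also accepts a list of columns (such as `.SD`) and returns a list of shifted vectors, so both leads can come from a single call. Computing the leads as columns *before* subsetting also sidesteps the problem with the i-based attempt. The helper columns `next_type` and `next_title`, and the `toy` data, are hypothetical.

```r
library(data.table)

toy <- data.table(
  type  = c(1L, 1L, 2L, 1L),
  title = c("Food",
            "School books, supplies, equipment for vocational and",
            "technical schools",
            "Housing")
)

# One shift() call returns the lead of both columns as a list
toy[, c("next_type", "next_title") := shift(.SD, type = "lead"),
    .SDcols = c("type", "title")]

# %in% treats the trailing NA lead as "no match", so the last row is untouched;
# next_title is already a column here, so subsetting in i no longer breaks the lead
toy[next_type %in% 2L, title := paste(title, next_title)]

# Drop the helper columns and the continuation lines
toy[, c("next_type", "next_title") := NULL]
res <- toy[type != 2L]
res$title
```

This still computes two lead vectors; it only folds them into one `shift()` call, so it is tidier rather than necessarily faster.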

2 Answers:

Answer 0 (score: 2)

I was able to do this by shift()ing the result of an ifelse():

# fill='' keeps paste0() from appending the text "NA" to the last title
df[, title := paste0(title, shift( ifelse(type==2, paste0(' ',title), ''),
                                   type='lead', fill='')
                     ) ]
df <- df[type==1] # can get rid of continuation lines

It seems a bit hacky to paste0() a vector of mostly empty strings, so improvements are welcome.
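One detail worth checking in this approach: `shift()` pads the last position with NA by default, and `paste0()` turns NA into the two-character string "NA". A small sketch on hypothetical toy data (`toy` and `cont_suffix` are illustrative names) comparing the default with `fill = ""`:

```r
library(data.table)

toy <- data.table(
  type  = c(1L, 1L, 2L, 1L),
  title = c("Food",
            "School books, supplies, equipment for vocational and",
            "technical schools",
            "Housing")
)

# " <continuation>" on type-2 rows, "" everywhere else
cont_suffix <- ifelse(toy$type == 2, paste0(" ", toy$title), "")

# Default fill = NA: the last title picks up a literal "NA"
bad  <- paste0(toy$title, shift(cont_suffix, type = "lead"))

# fill = "" pads the last row with an empty string instead
good <- paste0(toy$title, shift(cont_suffix, type = "lead", fill = ""))

bad[4]   # "HousingNA"
good[4]  # "Housing"
```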

Answer 1 (score: 1)

ifelse can almost always be avoided, and it's worth avoiding.**

I would probably do:

# back up the data before editing values
df0 = copy(df)

# find rows
w = df[type == 2, which = TRUE]

# edit at rows up one
stopifnot(all(w > 1))
df[w-1, title := paste(title, df$title[w])]

# drop rows
res = df[-w]
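The `which = TRUE` approach edits exactly one row above each continuation line, so it relies on the question's one-continuation assumption. If that might not hold, a common data.table idiom is to number groups with `cumsum()` and collapse each group's titles. A hedged sketch on invented toy data with two consecutive continuation lines (`toy`, `grp` and `res` are illustrative names):

```r
library(data.table)

# Hypothetical data: row 1 is continued by rows 2 and 3
toy <- data.table(
  type  = c(1L, 2L, 2L, 1L),
  title = c("Alpha", "beta", "gamma", "Delta")
)

# Each ordinary line starts a new group; continuation lines join the group above
toy[, grp := cumsum(type != 2L)]

# Collapse each group's titles into one string (recycled to every row of the group)
toy[, title := paste(title, collapse = " "), by = grp]

# Keep only the ordinary lines and drop the helper column
res <- toy[type != 2L][, grp := NULL]
res$title   # "Alpha beta gamma" "Delta"
```

This trades the vectorized single-row edit for a grouped `paste()`, which keeps the other columns of each ordinary line while no longer caring how many continuation lines follow it.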

**Some examples...
