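I want to split the data frame below into five columns. A new column should start after each " - ". Note that some observations (26 and 28) have an extra field (" uk", " es"), so the last column should contain NA for every observation except 26 and 28.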
26 paid 21.09 - abs - E X1028 - 61,77 - uk.pdf
27 paid 21.09 - corefunction - mah - 125,66.PDF
28 paid 21.09 - mrl - mah - 456,96 - es.PDF
29 paid 21.09 - mollea - inv - 297,50.pdf
30 paid 21.09 - saless - inv - 117,81.pdf
31 paid 23.09 - boc - inv - 59,80.pdf
Answer 0 (score: 5)
Using data.table:
library(data.table) # v 1.9.6+
setDT(df)[, tstrsplit(V1, "-")]
#            V1           V2      V3         V4     V5
# 1: paid 21.09          abs E X1028      61,77 uk.pdf
# 2: paid 21.09 corefunction     mah 125,66.PDF     NA
# 3: paid 21.09          mrl     mah     456,96 es.PDF
# 4: paid 21.09       mollea     inv 297,50.pdf     NA
# 5: paid 21.09       saless     inv 117,81.pdf     NA
# 6: paid 23.09          boc     inv  59,80.pdf     NA
Data
df <- structure(list(V1 = structure(c(1L, 2L, 4L, 3L, 5L, 6L), .Label = c("paid 21.09 - abs - E X1028 - 61,77 - uk.pdf",
"paid 21.09 - corefunction - mah - 125,66.PDF", "paid 21.09 - mollea - inv - 297,50.pdf",
"paid 21.09 - mrl - mah - 456,96 - es.PDF", "paid 21.09 - saless - inv - 117,81.pdf",
"paid 23.09 - boc - inv - 59,80.pdf"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-6L))
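One detail worth checking: splitting on a bare "-" leaves the spaces around each hyphen inside the resulting values. The sketch below is a variation on the answer above, not part of it. It assumes the separator is always the literal three-character string " - " and uses a two-row subset of the question's data stored as character rather than factor.

```r
library(data.table)

# Two rows copied from the question (assumption: V1 is character here)
df <- data.table(V1 = c("paid 21.09 - abs - E X1028 - 61,77 - uk.pdf",
                        "paid 21.09 - corefunction - mah - 125,66.PDF"))

# fixed = TRUE is passed on to strsplit(), so " - " is matched literally
# and the pieces carry no surrounding spaces. tstrsplit() pads shorter
# rows with NA by default (fill = NA).
out <- df[, tstrsplit(V1, " - ", fixed = TRUE)]
print(out)
```

If a value could itself contain " - ", this approach would split it too, so it only suits data where that string is used purely as a delimiter.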
Answer 1 (score: 3)
Or using splitstackshape:
library(splitstackshape)
cSplit(df, 'V1', sep='-')
#         V1_1         V1_2    V1_3       V1_4   V1_5
#1: paid 21.09          abs E X1028      61,77 uk.pdf
#2: paid 21.09 corefunction     mah 125,66.PDF     NA
#3: paid 21.09          mrl     mah     456,96 es.PDF
#4: paid 21.09       mollea     inv 297,50.pdf     NA
#5: paid 21.09       saless     inv 117,81.pdf     NA
#6: paid 23.09          boc     inv  59,80.pdf     NA
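Answer 2 (score: 1)
If you are reading the data from a file, you can split it while reading, like this. It is also faster than calling read.table and then a data.table function.
library(readr)
# The delimiter must be a single character, so fields keep their surrounding spaces
df <- read_delim("test.txt", "-", col_names = FALSE)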
Answer 3 (score: 0)
I would loop over the data and split each row. I'm not sure I understand the NA issue.
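newdata <- vector("list", nrow(data))  # one list element per row
for (i in seq_len(nrow(data))) {
  # as.character() because the column is a factor; [[1]] unwraps strsplit()'s list
  newdata[[i]] <- strsplit(as.character(data[i, 1]), "-")[[1]]
}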
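The NA issue arises because the rows do not all have the same number of fields: a plain split produces vectors of length 4 or 5, which cannot be bound into a rectangular data frame without padding. A minimal base R sketch of one way to pad them follows. The vector `x` and the names `parts`, `padded`, and `result` are illustrative, and the literal separator " - " is an assumption taken from the question.

```r
# Three rows copied from the question
x <- c("paid 21.09 - abs - E X1028 - 61,77 - uk.pdf",
       "paid 21.09 - corefunction - mah - 125,66.PDF",
       "paid 21.09 - mrl - mah - 456,96 - es.PDF")

# Split on the literal separator so no stray spaces remain
parts <- strsplit(x, " - ", fixed = TRUE)

# Pad every row to the longest length; `length<-` fills the gap with NA
n <- max(lengths(parts))
padded <- lapply(parts, `length<-`, n)

# Bind the rows into a character matrix, then convert to a data frame
result <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(result) <- paste0("V", seq_len(n))
print(result)
```

Rows with only four fields end up with NA in the fifth column, which is the behaviour the question asks for.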