My problem seems too obvious to me, yet I can't find a solution.
I have a data frame like this:
<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>
USD Index;D;20150801;000000;97.199;97.336;97.191;97.192
USD Index;D;20150802;000000;97.226;97.294;97.207;97.257
USD Index;D;20150803;000000;97.255;97.582;97.155;97.499
I need to split it into separate columns, like this:
<TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
USD Index D 20150801 0 97.199 97.336 97.191 97.192
USD Index D 20150802 0 97.226 97.294 97.207 97.257
USD Index D 20150803 0 97.255 97.582 97.155 97.499
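This is a basic question that should be at the top of the search results. Thanks in advance for helping me solve this!
Answer 0 (score: 2)
We can use read.table to split the single column, and scan to split the column name into the new headers: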
setNames(read.table(text=dat[,1], sep=";", stringsAsFactors=FALSE),
scan(text=names(dat), sep=";", what = "", quiet = TRUE))
# <TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
# 1 USD Index D 20150801 0 97.199 97.336 97.191 97.192
# 2 USD Index D 20150802 0 97.226 97.294 97.207 97.257
# 3 USD Index D 20150803 0 97.255 97.582 97.155 97.499
# Data used above
dat <- structure(list(`<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>` =
c("USD Index;D;20150801;000000;97.199;97.336;97.191;97.192",
"USD Index;D;20150802;000000;97.226;97.294;97.207;97.257",
"USD Index;D;20150803;000000;97.255;97.582;97.155;97.499"
)), .Names = "<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>",
class = "data.frame", row.names = c(NA, -3L))
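A side note on the output above: read.table() guesses column types, which is why <TIME> prints as 0 rather than 000000. A minimal sketch, assuming you want to keep every field as text, is to pass colClasses = "character". The names raw, header and res are only illustrative and are not part of the original answer:

```r
# Illustrative input with the same shape as the single column of dat
raw <- c("USD Index;D;20150801;000000;97.199;97.336;97.191;97.192",
         "USD Index;D;20150802;000000;97.226;97.294;97.207;97.257")
header <- "<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>"

# colClasses = "character" turns off type guessing, so leading zeros survive
res <- read.table(text = raw, sep = ";", colClasses = "character")
names(res) <- strsplit(header, ";", fixed = TRUE)[[1]]

# Non-syntactic names are easiest to reach with [[ ]]
res[["<TIME>"]]
```

The price-like columns are then character as well, so they would need as.numeric() before any arithmetic.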
Answer 1 (score: 2)
This is very easy with fread(). Using akrun's dat, we have
data.table::fread(paste(c(names(dat), dat[[1]]), collapse = "\n"))
# <TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
# 1: USD Index D 20150801 0 97.199 97.336 97.191 97.192
# 2: USD Index D 20150802 0 97.226 97.294 97.207 97.257
# 3: USD Index D 20150803 0 97.255 97.582 97.155 97.499
For a data frame result, just add data.table = FALSE to the fread() call.
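As a rough sketch of that variant, assuming the data.table package is installed: the names raw, header, txt and res are illustrative, and sep = ";" is passed explicitly here rather than relying on auto-detection.

```r
library(data.table)

# Illustrative input with the same shape as dat: one header string, one row per line
header <- "<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>"
raw <- c("USD Index;D;20150801;000000;97.199;97.336;97.191;97.192",
         "USD Index;D;20150802;000000;97.226;97.294;97.207;97.257")

# A string containing newlines is read by fread() as the data itself
txt <- paste(c(header, raw), collapse = "\n")

# data.table = FALSE returns a plain data.frame instead of a data.table
res <- fread(txt, sep = ";", data.table = FALSE)
class(res)
```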
Answer 2 (score: 0)
Alternatively, tstrsplit() can be used to split into columns, and setnames() can be used to rename the columns:
library(data.table)
setDT(dat)[, tstrsplit(.SD[[1]], ";")][, setnames(.SD, strsplit(names(dat), ";")[[1]])]
# <TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
# 1: USD Index D 20150801 000000 97.199 97.336 97.191 97.192
# 2: USD Index D 20150802 000000 97.226 97.294 97.207 97.257
# 3: USD Index D 20150803 000000 97.255 97.582 97.155 97.499
Note that column names such as <TICKER>, which are not syntactically valid, need to be escaped in many places. I therefore suggest getting rid of the angle brackets like this:
setDT(dat)[, tstrsplit(.SD[[1]], ";")][
, setnames(.SD, gsub("[<>]", "", strsplit(names(dat), ";")[[1]]))]
# TICKER PER DATE TIME OPEN HIGH LOW CLOSE
# 1: USD Index D 20150801 000000 97.199 97.336 97.191 97.192
# 2: USD Index D 20150802 000000 97.226 97.294 97.207 97.257
# 3: USD Index D 20150803 000000 97.255 97.582 97.155 97.499
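To make the escaping point concrete, here is a small base R sketch. The data frame df is a made-up two-column example, not the original data:

```r
# check.names = FALSE keeps the non-syntactic names exactly as given
df <- data.frame(`<TICKER>` = "USD Index", `<CLOSE>` = 97.192,
                 check.names = FALSE)

# With angle brackets, $ access needs backticks; [[ ]] takes a plain string
df$`<CLOSE>`
df[["<CLOSE>"]]

# After stripping the brackets, ordinary $ access works
names(df) <- gsub("[<>]", "", names(df))
df$CLOSE
```

The same gsub("[<>]", "", ...) pattern is what the setnames() call above applies to the split header.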