Question

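I want to build a new data.frame from this dataset, but there are description lines starting with "#" between some of the rows, and some of the data rows themselves contain a "#" character. I can get the result I need with a "for" loop under the condition substr(x, 1, 1) != "#" together with gsub() and a regular expression. My question is: can I get the same result without a "for" loop?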

     1.#Software: Microsoft Internet Information Services 6.0
     2.#Version: 1.0
     3.#Fields: date time ip method stem query s-port username sc-substatus 
     4.2013-08-27 16:00:00 117.79.149.2 GET /images/tr.gif uid=936206260 200 0 0
     6.2013-08-27 16:00:01 117.79.149.2 GET /images/tr.gif referrer=#http://Ftrack 200 0 0     
     7.#Software: Microsoft Internet Information Services 6.0
     8.2013-08-27 16:00:02 117.79.149.2 GET /images/tr.gif uid=936206269 200 0 0
     9.2013-08-27 16:00:03 117.79.149.2 GET /images/tr.gif utm_medium#3Dc02#26utm 200 0 0
     10. ..........
     11. ..........

into this:

      V1                    V2
      2013-08-27 16:00:00   200
      2013-08-27 16:00:01   200
      2013-08-27 16:00:02   200
      2013-08-27 16:00:03   200
      ....................
      ....................

Answer 1

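I assume you want to read the data from an external file (this is not clear from your question), so my answer is to use the option comment.char = "#" in read.table, which ignores lines that begin with #.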

See ?read.table.

So your first line might be:

x <- read.table("comm.txt", comment.char = "#")

where "comm.txt" is the file containing data in the format you gave.
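One caveat worth checking: read.table treats comment.char as the start of a comment anywhere on a line, so rows such as the one containing referrer=#http://Ftrack may be cut off at the "#". A minimal sketch of a loop-free alternative, assuming the lines are already in a character vector (the names log_lines, kept, parsed and result are illustrative, and the sample lines are copied from the question without their "1.", "2." labels):

```r
# Sample lines copied from the question (leading "1.", "2." labels dropped).
log_lines <- c(
  "#Software: Microsoft Internet Information Services 6.0",
  "#Version: 1.0",
  "2013-08-27 16:00:00 117.79.149.2 GET /images/tr.gif uid=936206260 200 0 0",
  "2013-08-27 16:00:01 117.79.149.2 GET /images/tr.gif referrer=#http://Ftrack 200 0 0",
  "#Software: Microsoft Internet Information Services 6.0",
  "2013-08-27 16:00:02 117.79.149.2 GET /images/tr.gif uid=936206269 200 0 0",
  "2013-08-27 16:00:03 117.79.149.2 GET /images/tr.gif utm_medium#3Dc02#26utm 200 0 0"
)

# Keep only the lines that do not start with "#" (vectorised, no loop).
kept <- log_lines[!startsWith(log_lines, "#")]

# Parse the kept lines; comment.char = "" so a "#" inside a field stays as data.
parsed <- read.table(text = kept, comment.char = "", stringsAsFactors = FALSE)

# Columns 1 and 2 are date and time; column 7 is the status code in this sample.
result <- data.frame(V1 = paste(parsed$V1, parsed$V2), V2 = parsed$V7)
print(result)
```

With a real file, readLines("comm.txt") could supply log_lines; the column positions used here are an assumption based on the sample rows.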

Then you can split the columns on the delimiter "-" with the following code:

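library(reshape2)

# Split every column of x on "-" into three new columns (suffixes a, b, c)
LS <- lapply(seq_along(x), function(i) {
    colsplit(x[, i], "-", paste0(colnames(x)[i], letters[1:3]))
})

# Bind the per-column results side by side
do.call('cbind', LS)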
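To make the effect of splitting on "-" concrete, here is a small base R sketch that does not need reshape2. It uses the date field from the first two data rows of the question; the names dates, parts and date_parts, and the column labels year, month and day, are illustrative assumptions.

```r
# Date field of the first two data rows in the question.
dates <- c("2013-08-27", "2013-08-27")

# Split each date on the literal "-" and stack the pieces into a 3-column matrix.
parts <- do.call(rbind, strsplit(dates, "-", fixed = TRUE))

# Name the pieces; they stay character strings unless converted explicitly.
date_parts <- data.frame(year = parts[, 1], month = parts[, 2], day = parts[, 3],
                         stringsAsFactors = FALSE)
print(date_parts)
```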

Hope this helps.

Answer 2

First, remove the lines that start with # from your single-column data frame x:

vec <- grep("^[^#]", x[[1]], value = TRUE)

Then, create a new data frame from the remaining data:

data.frame(V1 = gsub("(.*\\:[0-9]+) .*", "\\1", vec),
           V2 = gsub(".* ([0-9]+) [0-9]+ [0-9]+ *$", "\\1", vec))

#                    V1  V2
# 1 2013-08-27 16:00:00 200
# 2 2013-08-27 16:00:01 200
# 3 2013-08-27 16:00:02 200
# 4 2013-08-27 16:00:03 200
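The steps above take a single-column data frame x as given. A minimal sketch of one way such an x could be built and run through the same steps, assuming the log is read line by line with readLines (the temporary file stands in for the real log file, whose name the question does not give; path and res are illustrative names):

```r
# Write a few sample lines from the question to a temporary file.
path <- tempfile(fileext = ".log")
writeLines(c(
  "#Version: 1.0",
  "2013-08-27 16:00:00 117.79.149.2 GET /images/tr.gif uid=936206260 200 0 0",
  "2013-08-27 16:00:01 117.79.149.2 GET /images/tr.gif referrer=#http://Ftrack 200 0 0     "
), path)

# One row per line of the file, in a single character column.
x <- data.frame(V1 = readLines(path), stringsAsFactors = FALSE)

# Drop the lines that start with "#"; a "#" later in a line is left alone.
vec <- grep("^[^#]", x[[1]], value = TRUE)

# Extract the timestamp and the third-from-last number, converted to integer.
res <- data.frame(
  V1 = sub("(.*:[0-9]+) .*", "\\1", vec),
  V2 = as.integer(sub(".* ([0-9]+) [0-9]+ [0-9]+ *$", "\\1", vec))
)
print(res)
```

Both grep and sub are vectorised over the whole character vector, which is what removes the need for an explicit "for" loop.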

How can I skip the "#" symbol without a loop when extracting elements?
