How do I read a file like this into R?

Date: 2015-09-08 20:30:44

Tags: r

I have a file with data in the following format:

           48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
A_row  17 16 10 12  9 15 10 19  9 15  7  3  5 12  6  4  6  8  1  7  6  5  4
B_row  3  5  1  5  2  0  3  1  2  2  3  1  3  2  1  2  1  1  1  0  0  1  1
           71 72 73 74 75 76 77 78 80 81 83 84 85 86 87 88 89 90 94 97 103 104
A_row  1 6 0 2  9 5 1 19 9 15 7  3  5 12  6  4  6  8  1  7  6  5  4
B_row  2 5 1 5  2 0 3 1  2 2  3  1  3  2  1  2  1  1  1  0  0  1  1

Is there any way to read this format into R? Thanks!

2 answers:

Answer 0 (score: 1)

library(stringi)
library(dplyr)
library(magrittr)
library(tidyr)

text = 
  "48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
A_row  17 16 10 12  9 15 10 19  9 15  7  3  5 12  6  4  6  8  1  7  6  5  4
B_row  3  5  1  5  2  0  3  1  2  2  3  1  3  2  1  2  1  1  1  0  0  1  1
71 72 73 74 75 76 77 78 80 81 83 84 85 86 87 88 89 90 94 97 103 104
A_row  1 6 0 2  9 5 1 19 9 15 7  3  5 12  6  4  6  8  1  7  6  5  4
B_row  2 5 1 5  2 0 3 1  2 2  3  1  3  2  1  2  1  1  1  0  0  1  1"

df =
  text %>%
  # split over newlines (could also be accomplished by readLines)
  stri_split_fixed(pattern = "\n") %>%
  # take the first list element, which holds the lines of text
  extract2(1) %>%
  # make the text a column in the data frame
  {tibble(values = .)} %>%
  # identify rows based on what type of data they contain;
  # assumes a repeating pattern every 3 lines
  mutate(variable = c("id", "A_row", "B_row") %>% rep(length.out = n())) %>%
  # for each type of data
  group_by(variable) %>%
  summarize(value =
              values %>%
              # concatenate all values
              paste(collapse = " ") %>%
              # remove row labels such as "A_row" (might need to modify regex)
              stri_replace_all_regex("[A-Z]_row\\s+", "") %>%
              # drop leading/trailing whitespace (the header lines in the
              # question's file are indented)
              stri_trim_both() %>%
              # split as whitespace-separated data
              stri_split_regex(pattern = "\\s+")) %>%
  # unnest the lists
  unnest(value) %>%
  # make values numeric
  mutate(value = as.numeric(value)) %>%
  # for each variable, number 1 through n() to guess new row IDs
  group_by(variable) %>%
  mutate(n = seq_len(n())) %>%
  # drop the grouping before reshaping
  ungroup() %>%
  # reshape data: one column per variable
  pivot_wider(names_from = variable, values_from = value)

Answer 1 (score: 0)

As mentioned above, one approach is to use read.delim (possibly reading the file in chunks with skip & nrows) and then cbind the pieces back together.
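A minimal sketch of the chunked approach, under two assumptions: every block is exactly three lines (one header line plus the A_row and B_row lines), and the fields are whitespace-separated. Because read.delim defaults to sep = "\t", the sketch calls read.table instead, whose default sep = "" splits on any run of whitespace. When a header line has one field fewer than the data lines, read.table uses the first column (here the A_row/B_row labels) as row names. The sample lines are a truncated excerpt (first three columns of each block) of the question's data, and the temporary file stands in for the real one.

```r
# Truncated excerpt of the question's data: first three columns of each block
lines <- c(
  "           48 49 50",
  "A_row  17 16 10",
  "B_row  3  5  1",
  "           71 72 73",
  "A_row  1 6 0",
  "B_row  2 5 1"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

block_size <- 3  # assumption: one header line + two data rows per block
n_blocks <- length(readLines(tmp)) %/% block_size

# Read each block separately: skip the earlier blocks, then read the header
# line plus (block_size - 1) data rows
blocks <- lapply(seq_len(n_blocks), function(i) {
  read.table(tmp,
             header = TRUE,
             skip = (i - 1) * block_size,
             nrows = block_size - 1,
             check.names = FALSE)  # keep "48", "49", ... as column names
})

# Glue the blocks side by side; row names (A_row, B_row) come from the first
wide <- do.call(cbind, blocks)
print(wide)
```

In the question's full data the second block has 22 column headers but 23 values per row, so read.table would likely stop with a "more columns than column names" error on that block until the mismatch is fixed.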

Depending on the file (as pasted, it looks like it may need additional preprocessing to work with read.delim), another approach is to use readLines and strsplit.
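A minimal sketch of the readLines/strsplit approach, assuming that header lines are the ones whose first field is a number and that every data line starts with a label such as A_row. The vector lines stands in for readLines("your_file.txt") (a hypothetical file name) and is a truncated excerpt of the question's data.

```r
# Stand-in for readLines("your_file.txt"): truncated excerpt of the question's data
lines <- c(
  "           48 49 50",
  "A_row  17 16 10",
  "B_row  3  5  1",
  "           71 72 73",
  "A_row  1 6 0",
  "B_row  2 5 1"
)

# Trim the indentation, then split every line on runs of whitespace
fields <- strsplit(trimws(lines), "\\s+")

# Header lines start with a number; data lines start with a label like "A_row"
first_field <- vapply(fields, `[`, character(1), 1)
is_header <- grepl("^[0-9]+$", first_field)

ids <- as.numeric(unlist(fields[is_header]))

# Concatenate the values of every line carrying a given label, dropping the label
values_for <- function(label) {
  rows <- fields[first_field == label]
  as.numeric(unlist(lapply(rows, function(f) f[-1])))
}

result <- data.frame(id = ids,
                     A_row = values_for("A_row"),
                     B_row = values_for("B_row"))
print(result)
```

With the full question data, id would have 45 entries but A_row and B_row 46 each, so data.frame() would refuse to combine them; the mismatch in the second block has to be resolved first.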