I have a file containing data in the following format:
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
A_row 17 16 10 12 9 15 10 19 9 15 7 3 5 12 6 4 6 8 1 7 6 5 4
B_row 3 5 1 5 2 0 3 1 2 2 3 1 3 2 1 2 1 1 1 0 0 1 1
71 72 73 74 75 76 77 78 80 81 83 84 85 86 87 88 89 90 94 97 103 104
A_row 1 6 0 2 9 5 1 19 9 15 7 3 5 12 6 4 6 8 1 7 6 5 4
B_row 2 5 1 5 2 0 3 1 2 2 3 1 3 2 1 2 1 1 1 0 0 1 1
Is there any way to read this format into R? Thanks!
Answer 0 (score: 1)
library(stringi)
library(dplyr)
library(magrittr)
library(tidyr)
text =
"48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
A_row 17 16 10 12 9 15 10 19 9 15 7 3 5 12 6 4 6 8 1 7 6 5 4
B_row 3 5 1 5 2 0 3 1 2 2 3 1 3 2 1 2 1 1 1 0 0 1 1
71 72 73 74 75 76 77 78 80 81 83 84 85 86 87 88 89 90 94 97 103 104
A_row 1 6 0 2 9 5 1 19 9 15 7 3 5 12 6 4 6 8 1 7 6 5 4
B_row 2 5 1 5 2 0 3 1 2 2 3 1 3 2 1 2 1 1 1 0 0 1 1"
df =
  text %>%
  # split over newlines (could also be accomplished by readLines)
  stri_split_fixed(pattern = "\n") %>%
  # take the first list element, which corresponds to text
  extract2(1) %>%
  # make the text a column in the data frame
  {tibble(values = .)} %>%
  # identify rows based on what type of data they contain
  # assume a repeating pattern every 3 lines
  mutate(variable = c("id", "A_row", "B_row") %>% rep(length.out = n())) %>%
  # for each type of data
  group_by(variable) %>%
  summarize(value =
              values %>%
              # concatenate all values
              paste(collapse = " ") %>%
              # remove headers (might need to modify regex)
              stri_replace_all_regex("[A-Z]_row ", "") %>%
              # split as space separated data
              stri_split_regex(pattern = " +")) %>%
  # unnest the lists
  unnest(value) %>%
  # make values numeric
  mutate(value = as.numeric(value)) %>%
  # for each variable, number 1 through n() to guess new row IDs
  group_by(variable) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  # reshape data: one column per variable, one row per position
  spread(variable, value)
Answer 1 (score: 0)
As mentioned above, one approach is to use read.delim
(possibly in chunks, using skip and nrows) and then cbind
to recombine them.
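The chunked approach could look roughly like the sketch below. It uses read.table (read.delim is a wrapper around it with tab-separated defaults) because the pasted data looks space-separated. The sample lines, the temporary file, and the names block_size and wide are made up for illustration. The sketch assumes every block is exactly three lines and that each header line has exactly one field fewer than the data lines, so that read.table uses the first column as row names.

```r
# Hypothetical sample in the same layout: 3 lines per block (ids, A_row, B_row)
sample_lines <- c(
  "48 49 50",
  "A_row 17 16 10",
  "B_row 3 5 1",
  "71 72 73",
  "A_row 1 6 0",
  "B_row 2 5 1"
)
path <- tempfile(fileext = ".txt")  # stands in for the real file
writeLines(sample_lines, path)

block_size <- 3L
n_blocks <- length(readLines(path)) %/% block_size

# Read each block separately: skip the preceding blocks, then read the
# header line plus the two data lines that follow it
blocks <- lapply(seq_len(n_blocks) - 1L, function(i) {
  read.table(path, header = TRUE, skip = i * block_size,
             nrows = block_size - 1L, check.names = FALSE)
})

# Recombine the blocks side by side; rows are A_row and B_row
wide <- do.call(cbind, blocks)
print(wide)
```

If a header line is shorter than its data lines by more than one field, read.table stops with an error for that block, so the real file may need cleaning first.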
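Depending on the file (as pasted, it looks like it may need some other preprocessing before it will work with read.delim), another approach is to use readLines
and strsplit.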
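A minimal base R sketch of the readLines and strsplit route follows. The sample lines and the names fields, pos, and result are hypothetical. In practice the lines would come from readLines on the real file. It assumes a strict three-line repeating pattern and that every header line has as many ids as its data lines have values. The second block pasted in the question appears to have 22 ids but 23 values, so the real data may need checking before the final data.frame call succeeds.

```r
# Hypothetical sample in the same layout; in practice something like
# sample_lines <- readLines("data.txt")   # hypothetical file name
sample_lines <- c(
  "48 49 50",
  "A_row 17 16 10",
  "B_row 3 5 1",
  "71 72 73",
  "A_row 1 6 0",
  "B_row 2 5 1"
)

# Split every line on runs of spaces
fields <- strsplit(trimws(sample_lines), " +")

# Position of each line within its 3-line block: 1 = ids, 2 = A_row, 3 = B_row
pos <- rep_len(1:3, length(fields))

# Drop the leading label ("A_row" / "B_row") from the data lines
drop_label <- function(x) x[-1]

result <- data.frame(
  id    = as.numeric(unlist(fields[pos == 1])),
  A_row = as.numeric(unlist(lapply(fields[pos == 2], drop_label))),
  B_row = as.numeric(unlist(lapply(fields[pos == 3], drop_label)))
)
print(result)
```

This gives one row per id, which is the same long layout the first answer produces with spread.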