$
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999.0 -9 -9 -9 -9 4868.8 2017 0 7114
2.0 6.0297 35.0199 34.4101 2.0 11111
3.0 6.0279 35.0201 34.4091 3.0 11111
4.0 6.0272 35.0203 34.4091 4.0 11111
5.0 6.0273 35.0204 34.4097 4.9 11111
6.0 6.0274 35.0205 34.4104 5.9 11111
$
2008 1 777 8 17 12 7 25 78.4738 8.3510 27 6 4.1 -999.0 3 7 2 0 4903.8 1570 0 7114
3.0 6.4129 34.5637 34.3541 3.0 11111
4.0 6.4349 34.5748 34.3844 4.0 11111
5.0 6.4803 34.5932 34.4426 4.9 11111
6.0 6.4139 34.5624 34.3552 5.9 11111
7.0 6.5079 34.6097 34.4834 6.9 11111
I would like to add the station data to each observation, so that every variable is a column and every observation is a row, like this:
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 2 6.0297 35.0199 34.4101 2 11111
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 3 6.0279 35.0201 34.4091 3 11111
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 4 6.0272 35.0203 34.4091 4 11111
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 5 6.0273 35.0204 34.4097 4.9 11111
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 6 6.0274 35.0205 34.4104 5.9 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 3 6.4129 34.5637 34.3541 3 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 4 6.4349 34.5748 34.3844 4 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 5 6.4803 34.5932 34.4426 4.9 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 6 6.4139 34.5624 34.3552 5.9 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 7 6.5079 34.6097 34.4834 6.9 11111
Answer 0 (score: 2)
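The first part involves finding the "$" markers, grouping the lines that follow each one, and then "nesting" those blocks of data. That leaves a data frame with only a few rows, one per section.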
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
library(tibble)
txt_lns <- readLines("ocean-sample.txt")
txt <- tibble(txt = txt_lns)
# Start by finding new sections, and nesting the data
nested_txt <- txt %>%
  mutate(row_number = row_number()) %>%
  mutate(new_section = str_detect(txt, "\\$")) %>%            # Mark new sections
  mutate(starting = ifelse(new_section, row_number, NA)) %>% # Index with row num
  tidyr::fill(starting) %>%                                  # Fill index down where missing
  select(-new_section) %>%                                   # Clean up
  filter(!str_detect(txt, "\\$")) %>%
  nest(data = c(txt, row_number))                            # "Nest" the data
# Take a quick look
nested_txt
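The marker-and-fill indexing above amounts to numbering each line by how many "$" lines have been seen so far. A minimal base-R sketch of that grouping idea follows; the toy vector `lines_demo` is hypothetical, and the sketch assumes each section is introduced by a line that is exactly "$" (the tidyverse code matches any line containing "$").

```r
# Hypothetical toy input: two sections, each introduced by a "$" line
lines_demo <- c("$", "H1", "r1", "r2", "$", "H2", "r3")

# Running count of "$" markers = section number of every line,
# playing the role of the `starting` index that fill() carries down
section <- cumsum(lines_demo == "$")

# Drop the marker lines, then group the remaining lines by section
keep <- lines_demo != "$"
blocks <- split(lines_demo[keep], section[keep])
print(blocks)
```

Each element of `blocks` holds one section with its header line first, the same grouping that the nested `data` column captures.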
# Deal with the records within a section
parse_inner_block <- function(x, header_ind) {
  if (header_ind) {
    df <- x %>%
      mutate(txt = str_trim(txt)) %>%
      # Separate the header row into 22 variables
      separate(txt, into = LETTERS[1:22], sep = "\\s+")
  } else {
    df <- x %>%
      mutate(txt = str_trim(txt)) %>%
      # Separate the lesser rows into 6 variables
      separate(txt, into = letters[1:6], sep = "\\s+")
  }
  df
}
parse_outer_block <- function(x) {
  df <- x %>%
    # Determine if it's a header row with 22 variables or a lesser row with 6
    mutate(leading_row = (row_number == min(row_number))) %>%
    # Fold by header row vs. not
    nest(data = c(txt, row_number)) %>%
    # Create data frames for both header and lesser rows
    mutate(processed = purrr::map2(data, leading_row, parse_inner_block)) %>%
    unnest(processed) %>%
    # Copy header row values to lesser rows
    tidyr::fill(A:V) %>%
    # Drop header row
    filter(!leading_row)
  df
}
# Actually put all this together and generate an output dataframe
output <- nested_txt %>%
  mutate(proc_out = purrr::map(data, parse_outer_block)) %>%
  select(-data) %>%
  unnest(proc_out) %>%
  select(-starting, -leading_row, -data, -row_number)
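Because `separate()` splits strings, every column in `output` is still character. A minimal base-R sketch of restoring numeric types uses `type.convert()` on a small hypothetical stand-in `demo` (readr's `type_convert()` is the tidyverse counterpart). Note also that the layout in the question keeps only the first 14 header fields, which here would be columns `A` through `N`, e.g. `select(output, A:N, a:f)`.

```r
# Hypothetical stand-in for `output`: after separate(), every column is character
demo <- data.frame(
  A = c("2008", "2008"),
  N = c("-999.0", "-999.0"),
  a = c("2.0", "3.0"),
  f = c("11111", "11111"),
  stringsAsFactors = FALSE
)

# Convert each column to the type its values support:
# whole numbers without a decimal point become integer, the rest double
demo <- type.convert(demo, as.is = TRUE)
str(demo)
```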
Answer 1 (score: 1)
This is simpler and depends only on base R. I assume you have already read the file in with x <- readLines(....)
start <- which(x == "$") + 1               # Find header indices
rows <- diff(c(start, length(x) + 2)) - 2  # Find number of data lines per group
# Function to read header and rows and cbind
getdata <- function(begin, end) {
  cbind(read.table(text = x[begin]), read.table(text = x[(begin + 1):(begin + end)]))
}
dta.list <- lapply(1:(length(start)), function(i) getdata(start[i], rows[i]))
dta.df <- do.call(rbind, dta.list)
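To see the index arithmetic at work, here is a toy-scale run of the same approach. The input vector is hypothetical, and two small changes are assumptions rather than part of the answer above: `Map()` replaces the `lapply()` over indices, and the header and data columns get distinct hypothetical names (`h1`, `h2`, ... and `d1`, `d2`, ...) so the combined frame does not carry duplicate `V1` to `V6` names.

```r
# Hypothetical toy input with the same layout: "$", header line, data lines
x <- c("$",
       "2008 1 774 1.2",
       " 2.0 6.0297",
       " 3.0 6.0279",
       "$",
       "2008 1 777 4.1",
       " 3.0 6.4129")

start <- which(x == "$") + 1                # header lines: 2 and 6
rows  <- diff(c(start, length(x) + 2)) - 2  # data lines per section: 2 and 1

# Read one header line plus its n data lines and bind them side by side;
# the single header row is recycled down to match the n data rows
getdata <- function(begin, n) {
  header <- read.table(text = x[begin])
  names(header) <- paste0("h", seq_along(header))  # hypothetical names
  obs <- read.table(text = x[begin + seq_len(n)])
  names(obs) <- paste0("d", seq_along(obs))        # hypothetical names
  cbind(header, obs)
}

dta.df <- do.call(rbind, Map(getdata, start, rows))
print(dta.df)
```

With the real file, the same calls apply to `x` from `readLines()`. Using `begin + seq_len(n)` instead of `(begin + 1):(begin + end)` means a count of zero selects nothing rather than counting backwards.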