I have a dataset of 15 million rows with a single column. It looks like this:
x_raw
A1
A2
A3
A4
B1
B2
B3
B4
C1
C2
I want to convert it to:
A1 A2 A3 A4
B1 B2 B3 B4
C1 C2 C3 C4
I tried a 'for' loop that transposes every 4 rows and appends them to a 'final' data frame, but because the dataset is so large it iterates almost 2.7 million times, which is far too slow. Is there a more efficient way to do this?
Answer 0 (score: 2)
Here is one option with tidyverse, where we separate the 'x_raw' column into two columns and then spread to 'wide' format:
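library(dplyr)
library(tidyr)
# Split e.g. "A1" into x = "A" and rn = "1" (the lookahead splits before the digits),
# keep x_raw, then spread rn into columns and drop the group letter.
separate(df1, x_raw, into = c('x', 'rn'), sep = "(?=\\d+)", remove = FALSE) %>%
  spread(rn, x_raw) %>%
  select(-x)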
# 1 2 3 4
#1 A1 A2 A3 A4
#2 B1 B2 B3 B4
#3 C1 C2 <NA> <NA>
Or, if every group always has exactly 4 elements, we can also do:
as.data.frame(matrix(df1$x_raw, ncol =4, byrow = TRUE), stringsAsFactors=FALSE)
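The matrix approach can be tried on a small example first. This is a minimal sketch: the 12-value `df1` below is made-up toy data standing in for the real 15-million-row frame, and it assumes the length is an exact multiple of 4 (otherwise `matrix()` warns and recycles values).

```r
# Toy stand-in for the real data; 'df1' and 'x_raw' follow the names used above.
df1 <- data.frame(
  x_raw = c("A1", "A2", "A3", "A4",
            "B1", "B2", "B3", "B4",
            "C1", "C2", "C3", "C4"),
  stringsAsFactors = FALSE
)

# byrow = TRUE fills the matrix row by row, so each run of 4 values becomes one row.
wide <- as.data.frame(matrix(df1$x_raw, ncol = 4, byrow = TRUE),
                      stringsAsFactors = FALSE)
print(wide)
```

Because this is a single vectorized reshape rather than a loop over row blocks, it avoids growing a data frame one piece at a time.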
Answer 1 (score: 2)
If you just want to convert it to a four-column data frame:
as.data.frame(matrix(df$x_raw, ncol = 4, byrow = TRUE))
Answer 2 (score: 2)
See this:
x_raw <- c("A1","A2","A3","A4","B1","B2","B3","B4","C1","C2","C3","C4","D1","D2","D3","D4")
x <- as.table(matrix(x_raw,ncol=4,byrow = T))
rownames(x) <- NULL
colnames(x) <- NULL
print(x)
It returns:
[,1] [,2] [,3] [,4]
[1,] A1 A2 A3 A4
[2,] B1 B2 B3 B4
[3,] C1 C2 C3 C4
[4,] D1 D2 D3 D4
Answer 3 (score: 1)
Expand the length to the next block of 4 values, and put it in a matrix:
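# ceiling() rounds up to the next multiple of 4 without adding an extra
# all-NA row when nrow(dat) is already a multiple of 4
matrix(`length<-`(dat$x_raw, ceiling(nrow(dat) / 4) * 4), ncol=4, byrow=TRUE)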
# [,1] [,2] [,3] [,4]
#[1,] "A1" "A2" "A3" "A4"
#[2,] "B1" "B2" "B3" "B4"
#[3,] "C1" "C2" NA NA
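The padding idea can also be wrapped in a small function. This is only a sketch: `to_wide` and its `k` parameter are hypothetical names introduced here, not part of any package. It rounds the length up with `ceiling()`, so a vector whose length is already a multiple of `k` gets no extra row.

```r
# Hypothetical helper: pad a vector with NA up to a multiple of k, then reshape.
to_wide <- function(x, k = 4) {
  padded_len <- ceiling(length(x) / k) * k
  length(x) <- padded_len  # extending a vector's length pads it with NA
  matrix(x, ncol = k, byrow = TRUE)
}

# Ten values, as in the question: the last row is padded with NA.
m <- to_wide(c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "C1", "C2"))
print(m)
```

If a data frame is needed, the result can be passed to `as.data.frame()` as in the earlier answers.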