Convert one variable with multiple rows into multiple columns

Posted: 2018-07-01 12:30:02

Tags: r gsub read.csv

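A data frame (df) has many rows and a single column. I need to split that column into multiple columns and remove the index prefixes 1: through N:.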

head(df[1:3,])

[1] Q1        1     1: 0.009110   2:-0.002122   3:-0.005770   4:-0.016751   5: 0.003284   6:-0.082381              
[2] Q2        1     1: 0.018065   2:-0.033954   3:-0.033954   4: 0.005826   5:-0.033918   6:-0.034069   7:-0.030281   
[3] Q3        1     1: 0.058728   2: 0.003693   3:-0.008006   4: 0.035635   5: 0.039816   6: 0.040578              
20 Levels: Q1        1     1: 0.009110   2:-0.002122   3:-0.005770   4:-0.016751   5: 0.003284   6:-0.082381 ...

df<-read.csv("effect.txt",header = F,skip = 1)
df2 <- lapply(df, gsub, pattern="1:", replacement= "")
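For reference, a quick check of why the attempted call does not remove every prefix: `gsub()` treats the pattern "1:" as a literal match, so only the "1:" prefix disappears. A pattern such as `\\d+:` (one or more digits followed by a colon) covers 1: through N:, including multi-digit N. This is a minimal sketch on a shortened, illustrative string, not the full data:

```r
# Shortened sample of the values part of one row (illustrative only)
s <- "1: 0.009110   2:-0.002122   3:-0.005770"

# The attempted pattern is matched literally, so only "1:" is removed
only_first <- gsub("1:", "", s)
only_first

# \\d+ matches one or more digits, so every "N:" prefix is removed
all_prefixes <- gsub("\\d+:", "", s)

# Split on runs of whitespace and convert the pieces to numbers
vals <- as.numeric(strsplit(trimws(all_prefixes), "\\s+")[[1]])
vals
```

On a full row the leading fields (e.g. Q1 and 1) would still need to be dropped separately.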

1 answer:

Answer 0 (score: 0)

This is the long way round, but it works.

#Read the data set
df <- read.table(text = "
                 'Q1        1     1: 0.009110   2:-0.002122   3:-0.005770   4:-0.016751   5: 0.003284   6:-0.082381'              
                 'Q2        1     1: 0.018065   2:-0.033954   3:-0.033954   4: 0.005826   5:-0.033918   6:-0.034069   7:-0.030281'   
                 'Q3        1     1: 0.058728   2: 0.003693   3:-0.008006   4: 0.035635   5: 0.039816   6: 0.040578 '             
                 ",header=F)

library(tidyr)   # for separate()
library(dplyr)   # for %>% and mutate_if()
df[,1] <- gsub("[1-9]:",";",df[,1])  # replace each single-digit index (1-9) followed by ':' with ';'
df[,1] <- gsub("Q[1-9]        1     ;","",df[,1])   # remove the row prefix, e.g. "Q1        1     ;", "Q2        1     ;", "Q3        1     ;", ...
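To see what the two `gsub()` calls do, here they are traced on a shortened copy of the first sample row. This assumes the row uses the same spacing as the fixed pattern (8 spaces after Q1, 5 after the 1); if the spacing differs, the second call leaves the prefix in place.

```r
# Shortened first sample row (illustrative copy of the data above)
row1 <- "Q1        1     1: 0.009110   2:-0.002122   3:-0.005770"

# Step 1: each single-digit index followed by ':' becomes the separator ';'
step1 <- gsub("[1-9]:", ";", row1)
step1

# Step 2: the fixed-width prefix "Q1        1     ;" is deleted
step2 <- gsub("Q[1-9]        1     ;", "", step1)
step2

# What remains is a ';'-separated list of numbers
as.numeric(strsplit(step2, ";")[[1]])
```

Note that `[1-9]:` only matches single-digit indices; an index such as 10: would be left untouched.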

max.length <- max(sapply(strsplit(df[,1],";"),length))   # the longest row determines how many columns `separate` must create
df_clean <- separate(df,1, paste0("a",1:max.length),sep = ";",fill = "right")

df_clean %>% mutate_if(is.character,as.numeric) #change all character columns to numeric

        a1        a2        a3        a4        a5        a6        a7
1 0.009110 -0.002122 -0.005770 -0.016751  0.003284 -0.082381        NA
2 0.018065 -0.033954 -0.033954  0.005826 -0.033918 -0.034069 -0.030281
3 0.058728  0.003693 -0.008006  0.035635  0.039816  0.040578        NA
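If you would rather avoid the tidyr/dplyr dependency, the split-and-pad step can also be sketched in base R. This is an alternative technique, not the approach used above: each row is split on ";", padded with NA up to the longest row, and converted to numeric. The input vector `x` is a hypothetical stand-in for `df[,1]` after the two `gsub()` calls.

```r
# Hypothetical rows after prefix removal: values separated by ';'
x <- c(" 0.009110   ;-0.002122   ;-0.005770",
       " 0.018065   ;-0.033954   ;-0.033954   ; 0.005826")

parts <- strsplit(x, ";")
max.length <- max(lengths(parts))   # number of columns needed

padded <- lapply(parts, function(p) {
  length(p) <- max.length   # extending a vector's length pads it with NA
  as.numeric(p)             # as.numeric() ignores surrounding whitespace
})

df_clean <- as.data.frame(do.call(rbind, padded))
names(df_clean) <- paste0("a", seq_len(max.length))
df_clean
```

Shorter rows end up with NA in their trailing columns, matching what `fill = "right"` does in `separate()`.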

Update

gsub("Q\\d{1,3}\\s+\\d{1,2}\\s+;","","Q300      29       ;")
[1] ""

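  • Q\\d{1,3} matches Q followed by a number of 1 to 3 digits, i.e. Q1, Q12 or Q123
  • \\s+ matches one or more whitespace characters
  • \\d{1,2} matches the second field, a number of 1 or 2 digits such as 1 or 29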

    Now you can replace

    df[,1] <- gsub("Q[1-9]        1     ;","",df[,1])
    

    with

    df[,1] <- gsub("Q\\d{1,3}\\s+\\d{1,2}\\s+;","",df[,1])
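A small comparison of the original and updated patterns on two illustrative rows (the second one is hypothetical, modeled on the "Q300      29       ;" example): the fixed-width pattern only works when the spacing and the literal 1 line up exactly, while the updated pattern tolerates varying digits and spacing.

```r
# Illustrative rows after the first gsub() step
rows <- c("Q1        1     ; 0.009110   ;-0.002122",
          "Q300      29       ; 0.018065   ;-0.033954")

# Original pattern: single-digit Q, literal 1, fixed spacing
old <- gsub("Q[1-9]        1     ;", "", rows)

# Updated pattern: Q + 1 to 3 digits, whitespace, 1 or 2 digits, whitespace, ';'
new <- gsub("Q\\d{1,3}\\s+\\d{1,2}\\s+;", "", rows)

old   # second row still starts with "Q300"
new   # both prefixes removed
```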