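A data frame (df) has multiple rows and a single column. I need to split that column into multiple columns and remove the labels "1:" through "N:".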
head(df[1:3,])
[1] Q1 1 1: 0.009110 2:-0.002122 3:-0.005770 4:-0.016751 5: 0.003284 6:-0.082381
[2] Q2 1 1: 0.018065 2:-0.033954 3:-0.033954 4: 0.005826 5:-0.033918 6:-0.034069 7:-0.030281
[3] Q3 1 1: 0.058728 2: 0.003693 3:-0.008006 4: 0.035635 5: 0.039816 6: 0.040578
20 Levels: Q1 1 1: 0.009110 2:-0.002122 3:-0.005770 4:-0.016751 5: 0.003284 6:-0.082381 ...
df<-read.csv("effect.txt",header = F,skip = 1)
df2 <- lapply(df, gsub, pattern="1:", replacement= "")
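The attempt above only strips the literal text `1:`, so the `2:`, `3:`, ... labels survive (and `lapply()` returns a list, not a data frame). A regex digit class matches every label. A minimal comparison on one sample row taken from the data above:

```r
# One sample row in the format shown in the question
x <- "Q1 1 1: 0.009110 2:-0.002122 3:-0.005770"

# Literal pattern: only the text "1:" is removed; "2:" and "3:" remain
gsub("1:", "", x)
# "Q1 1  0.009110 2:-0.002122 3:-0.005770"

# Regex: one or more digits followed by ':' matches every label
gsub("[0-9]+:", "", x)
# "Q1 1  0.009110 -0.002122 -0.005770"
```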
Answer (score: 0)
This is a roundabout approach, but it works.
#Read the data set
df <- read.table(text = "
'Q1 1 1: 0.009110 2:-0.002122 3:-0.005770 4:-0.016751 5: 0.003284 6:-0.082381'
'Q2 1 1: 0.018065 2:-0.033954 3:-0.033954 4: 0.005826 5:-0.033918 6:-0.034069 7:-0.030281'
'Q3 1 1: 0.058728 2: 0.003693 3:-0.008006 4: 0.035635 5: 0.039816 6: 0.040578 '
",header=F)
library(tidyr)
library(dplyr) # provides mutate_if(); tidyr alone does not
df[,1] <- gsub("[0-9]+:", ";", df[,1]) # replace every label, i.e. one or more digits followed by ':' (1:, 2:, 10:, etc.), with ';'
df[,1] <- gsub("Q[1-9] 1 ;", "", df[,1]) # remove the row prefix: "Q", one digit, a space, "1", a space, then ';', e.g. "Q1 1 ;", "Q2 1 ;", "Q3 1 ;"
max.length <- max(sapply(strsplit(df[,1], ";"), length)) # number of fields in the longest row, used to name the columns that separate() creates
df_clean <- separate(df,1, paste0("a",1:max.length),sep = ";",fill = "right")
df_clean %>% mutate_if(is.character,as.numeric) #change all character columns to numeric
a1 a2 a3 a4 a5 a6 a7
1 0.009110 -0.002122 -0.005770 -0.016751 0.003284 -0.082381 NA
2 0.018065 -0.033954 -0.033954 0.005826 -0.033918 -0.034069 -0.030281
3 0.058728 0.003693 -0.008006 0.035635 0.039816 0.040578 NA
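If you would rather avoid tidyr, here is a base-R sketch. It is a different technique from the steps above, shown only as an alternative: it pulls out the value after each `N:` label with `gregexpr()`/`regmatches()` and pads shorter rows with `NA`. It assumes every value is a plain decimal number with an optional leading minus sign, as in the sample rows.

```r
# Sample rows copied from the question
rows <- c(
  "Q1 1 1: 0.009110 2:-0.002122 3:-0.005770 4:-0.016751 5: 0.003284 6:-0.082381",
  "Q2 1 1: 0.018065 2:-0.033954 3:-0.033954 4: 0.005826 5:-0.033918 6:-0.034069 7:-0.030281",
  "Q3 1 1: 0.058728 2: 0.003693 3:-0.008006 4: 0.035635 5: 0.039816 6: 0.040578 "
)

# Match the number that follows each ':' (perl = TRUE enables the lookbehind).
# Assumption: values are plain decimals with an optional leading minus sign.
vals <- regmatches(rows, gregexpr("(?<=:)\\s*-?[0-9.]+", rows, perl = TRUE))

# The longest row decides the number of columns
n <- max(lengths(vals))

# as.numeric() ignores surrounding spaces; indexing past the end pads with NA
mat <- t(vapply(vals, function(v) as.numeric(v)[seq_len(n)], numeric(n)))
colnames(mat) <- paste0("a", seq_len(n))

out <- as.data.frame(mat)
out
```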
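A more general pattern also handles multi-digit Q numbers and extra spaces, for example the prefix "Q300 29 ;":
gsub("Q\\d{1,3}\\s+\\d{1,2}\\s+;","","Q300 29 ;")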
[1] ""
Q\\d{1,3}
matches Q followed by a number with 1 to 3 digits, e.g. Q1, Q12, or Q123
\\s+
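matches one or more whitespace characters, so the number of spaces between the parts does not matter
\\d{1,2}
matches the second number, with 1 or 2 digits, e.g. 1 or 29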
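To see what the generalized prefix pattern does and does not match, this sketch runs the old and new patterns over a few prefixes of the kind left after the first `gsub()` step (the strings are made up for illustration). The last one, `Q1234`, is rejected because `\\d{1,3}` allows at most three digits; `\\d+` would lift that limit.

```r
old_pattern <- "Q[1-9] 1 ;"
new_pattern <- "Q\\d{1,3}\\s+\\d{1,2}\\s+;"

# Hypothetical prefixes, not taken from the real data
prefixes <- c("Q1 1 ;", "Q12 3 ;", "Q300 29 ;", "Q1  1   ;", "Q1234 1 ;")

grepl(old_pattern, prefixes)  # TRUE FALSE FALSE FALSE FALSE
grepl(new_pattern, prefixes)  # TRUE TRUE TRUE TRUE FALSE
```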
You can now replace
df[,1] <- gsub("Q[1-9] 1 ;","",df[,1])
with
df[,1] <- gsub("Q\\d{1,3}\\s+\\d{1,2}\\s+;","",df[,1])
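An end-to-end check of both substitution steps on a single made-up row (hypothetical data) with a two-digit Q number and ten labels. The first step uses `[0-9]+:` so that `10:` is replaced too; a single-digit class such as `[1-9]:` leaves `10:` in place, and the last value would then become `NA`. The split uses base `strsplit()` instead of `separate()` only to keep the sketch free of package dependencies.

```r
# Hypothetical row: two-digit Q number and a tenth label "10:"
x <- "Q12 1 1: 0.1 2:-0.2 3: 0.3 4: 0.4 5: 0.5 6: 0.6 7: 0.7 8: 0.8 9: 0.9 10:-1.0"

# Step 1: turn every "N:" label (any number of digits) into ';'
step1 <- gsub("[0-9]+:", ";", x)

# Step 2: drop the "Q<number> <number> ;" prefix with the generalized pattern
step2 <- gsub("Q\\d{1,3}\\s+\\d{1,2}\\s+;", "", step1)

# Split on ';' and convert; as.numeric() ignores the surrounding spaces
res <- as.numeric(strsplit(step2, ";")[[1]])
res

# With a single-digit class the tenth label is not replaced
grepl("10:", gsub("[1-9]:", ";", x))  # TRUE
```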