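Separating data frame columns into more columns by various delimiters

Date: 2018-01-29 14:12:09

Tags: r data-manipulation separator

I have a dataset, and I've tried to provide a sample of it using the dput command below. The problem I'm running into is splitting the data on delimiters.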

    > dput(head(team_data))
    structure(list(X1 = 2:6,
        names2 = c("Andre Callender  Seton Hall Preparatory School (West Orange, NJ)", "Gosder Cherilus  Somerville (Somerville, MA)", "Justin Bell  Mount Vernon (Alexandria, VA)", "Tom Anevski  Elder (Cincinnati, OH)", "Brad Mueller  Mars Area (Mars, PA)"),
        pos2 = c("RB 5-10 185", "OT 6-7 270", "TE 6-3 250", "OT 6-5 265", "CB 6-0 170"), rating2 = c("0.8667 194 18 8", "0.8667 262 20 1", "0.8333 306 14 7", "0.8333 377 25 13", "0.8333 496 36 16"),
        status2 = c("Enrolled   6/30/2003", "Enrolled   6/30/2003", "Enrolled   6/30/2003", "Enrolled   6/30/2003", "Enrolled   6/30/2003"), team = c("Boston-College", "Boston-College", "Boston-College", "Boston-College", "Boston-College"), year = c(2003L, 2003L, 2003L, 2003L, 2003L)),
        .Names = c("X1", "names2", "pos2", "rating2", "status2", "team", "year"), row.names = c(NA, -5L),
        class = c("tbl_df", "tbl", "data.frame"))

Below is the code I'm trying to run on the dataset above. As far as I can tell, the following two calls work as intended.

    library(rvest)
    library(stringr)
    library(tidyr)
    library(readxl)
    df2 <- separate(data = team_data, col = pos2, into = c("Position", "Height", "Weight"), sep = " ")
    df3 <- separate(data = df2, col = rating2, into = c("Rating", "National", "Position", "State Rank"), sep = " ")

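However, I'm having a lot of trouble splitting the data frame's columns any further. I've tried various approaches (examples below), but every one of the calls below produces the same error: "Error: Data source must be a dictionary".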

    df4 <- separate(data = df3, col = names2, into = c("Name", "Geo"), sep = "(")
    df4 <- separate(data = df3, col = names2, into = c("Name", "Geo"), sep = '\\(|\\)')
    df4 <- separate(data = df3, col = status2, into = c("Date_Enrollment", "Enroll_Status"), sep = " ")
    df4 <- separate(data = df3, col = status2, into = c("Date_Enrollment", "Enroll_Status"), sep = "   ")

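The end goal is to split the "names2" column on "(" and "," and drop the ")", so that I end up with 3 columns of data. For the other column ("status2"), the goal is to separate "Enrolled" from the enrollment date.

From what I've read about this error, it suggests that I'm duplicating column names, but I can't figure out where that is happening.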
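
The duplicated-names suspicion can be checked directly in base R. The following is a minimal sketch, assuming a small stand-in data frame: `df_demo` and its values are hypothetical and only the column names mirror the situation described above.

```r
# Hypothetical stand-in for df3; only the column names matter here.
df_demo <- data.frame(a = 1, b = 2, c = 3)
names(df_demo) <- c("Position", "Rating", "Position")

# duplicated() flags every repeat of an earlier name.
dup_names <- names(df_demo)[duplicated(names(df_demo))]
dup_names

# anyDuplicated() returns 0 when all names are unique,
# otherwise the index of the first repeated name.
has_dups <- anyDuplicated(names(df_demo)) > 0
has_dups
```

Running the same two checks on `names(df3)` would show whether any column name appears more than once.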

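1 Answer:

Answer 0 (score: 0)

You use the column name "Position" twice: once when creating df2 and again when creating df3, so df3 ends up with two columns of the same name. Renaming the second one works for me: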

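    library(tidyr)  # separate(); tidyr also re-exports the %>% pipe

    team_data %>%
      separate(col = pos2, into = c("Position", "Height", "Weight"), sep = " ") %>%
      # "Position2" avoids creating a second column named "Position"
      separate(col = rating2, into = c("Rating", "National", "Position2", "State Rank"), sep = " ") %>%
      # "(" is a regex metacharacter, so it has to be escaped
      separate(col = names2, into = c("Name", "Geo"), sep = "\\(") %>%
      # status2 reads "Enrolled   6/30/2003": the status comes first, then the date
      separate(col = status2, into = c("Enroll_Status", "Date_Enrollment"), sep = "   ")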
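
The pipeline above still leaves the ")" in "Geo" and yields two columns rather than the three the question asks for. One possible way to get there is a single regex that matches any of the delimiters. This is only a sketch: the `demo` data frame holds two rows copied from the question's dput() output, and the column names `Name_School`, `City` and `State` are my own assumptions, not names from the original post.

```r
library(tidyr)

# Two rows copied from the dput() output in the question.
demo <- data.frame(
  names2 = c("Andre Callender  Seton Hall Preparatory School (West Orange, NJ)",
             "Brad Mueller  Mars Area (Mars, PA)"),
  status2 = c("Enrolled   6/30/2003", "Enrolled   6/30/2003"),
  stringsAsFactors = FALSE
)

# Split on " (", ", " or ")" in one pass. The closing ")" can leave an
# empty trailing piece; extra = "drop" discards it without a warning.
# Name_School, City and State are hypothetical column names.
out <- separate(demo, col = names2,
                into = c("Name_School", "City", "State"),
                sep = "\\s*\\(|,\\s*|\\)", extra = "drop")

# Split status2 on any run of whitespace: status first, then the date.
out <- separate(out, col = status2,
                into = c("Enroll_Status", "Date_Enrollment"),
                sep = "\\s+")
out
```

Note that the player name and the school stay together in `Name_School`. In the sample rows they are separated by two spaces, so a further split on `"\\s{2,}"` might work, but that depends on the full data following the same pattern.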