Split a data frame column into multiple columns

Time: 2021-02-22 22:33:10

Tags: r

I have a data frame whose single column looks like this:

df = read.table(file="sprint.m.df.txt", sep="\t", quote="", header=TRUE)

    X.Rank...Time...Wind...Name...Country...Birthdate...City...Date.
1     1 9.58 0.9 "Usain Bolt" "JAM" "21.08.86" "Berlin" "16.08.2009"
2     2 9.63 1.5 "Usain Bolt" "JAM" "21.08.86" "London" "05.08.2012"
3      3 9.69 0 "Usain Bolt" "JAM" "21.08.86" "Beijing" "16.08.2008"
4      3 9.69 2 "Tyson Gay" "USA" "09.08.82" "Shanghai" "20.09.2009"
5 3 9.69 -0.1 "Yohan Blake" "JAM" "26.12.89" "Lausanne" "23.08.2012"
6      6 9.71 0.9 "Tyson Gay" "USA" "09.08.82" "Berlin" "16.08.2009"

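I have been trying to split the column into multiple columns with string splitting and other approaches, but nothing has worked.

How can I split the data frame so that I end up with a data frame like this?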

 X.rank | Time | wind | name       | country | birthdate| city    | date
    1      9.58  0.9    Usain Bolt    JAM       21.08.86  Berlin    16.08.2009

2 answers:

Answer 0 (score: 2)

You can use the tibble package to create a tribble:

library(tibble)

df <- tribble(
~X, ~RankTime, ~Wind, ~Name, ~Country, ~Birthdate, ~City, ~Date, 
1, 9.58, 0.9, "Usain Bolt", "JAM", "21.08.86", "Berlin", "16.08.2009",
2, 9.63, 1.5, "Usain Bolt", "JAM", "21.08.86", "London", "05.08.2012",
3, 9.69, 0, "Usain Bolt", "JAM", "21.08.86", "Beijing", "16.08.2008",
3, 9.69, 2, "Tyson Gay", "USA", "09.08.82", "Shanghai", "20.09.2009",
3, 9.69, -0.1, "Yohan Blake", "JAM", "26.12.89", "Lausanne", "23.08.2012",
6, 9.71, 0.9, "Tyson Gay", "USA", "09.08.82", "Berlin", "16.08.2009")

df

# output
# A tibble: 6 x 8
      X RankTime  Wind Name        Country Birthdate City     Date      
  <dbl>    <dbl> <dbl> <chr>       <chr>   <chr>     <chr>    <chr>     
1     1     9.58   0.9 Usain Bolt  JAM     21.08.86  Berlin   16.08.2009
2     2     9.63   1.5 Usain Bolt  JAM     21.08.86  London   05.08.2012
3     3     9.69   0   Usain Bolt  JAM     21.08.86  Beijing  16.08.2008
4     3     9.69   2   Tyson Gay   USA     09.08.82  Shanghai 20.09.2009
5     3     9.69  -0.1 Yohan Blake JAM     26.12.89  Lausanne 23.08.2012
6     6     9.71   0.9 Tyson Gay   USA     09.08.82  Berlin   16.08.2009

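Answer 1 (score: 0)

The spaces inside the quotes make the column hard to split, but easy to read back in. See my comment above and use read.table(file="sprint.m.df.txt", sep=" "), or, if you really must work with your df, try read_delim or scan: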

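# I() marks the string as literal data rather than a file path (needed by readr >= 2.0)
df8 <- readr::read_delim(I(paste(df[, 1], collapse = "\n")), delim = " ", col_names = FALSE)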
# OR
df8 <- data.frame(matrix(scan(text=df[,1], what=" "), ncol=8, byrow=TRUE))
colnames(df8) <- c("rank", "Time", "wind", "name", "country", "birthdate", "city", "date")
df8
  rank Time wind        name country birthdate     city       date
1    1 9.58  0.9  Usain Bolt     JAM  21.08.86   Berlin 16.08.2009
2    2 9.63  1.5  Usain Bolt     JAM  21.08.86   London 05.08.2012
3    3 9.69    0  Usain Bolt     JAM  21.08.86  Beijing 16.08.2008
4    3 9.69    2   Tyson Gay     USA  09.08.82 Shanghai 20.09.2009
5    3 9.69 -0.1 Yohan Blake     JAM  26.12.89 Lausanne 23.08.2012
6    6 9.71  0.9   Tyson Gay     USA  09.08.82   Berlin 16.08.2009
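
The same idea also works in base R without readr, because read.table accepts a text argument. The following is only a minimal sketch: raw and split_df are made-up names, and raw recreates three of the rows from the question in memory, since sprint.m.df.txt itself is not available here.

```r
# Hypothetical stand-in for the one-column data frame in the question
# (three rows only; the real data comes from sprint.m.df.txt).
raw <- data.frame(
  V1 = c(
    '1 9.58 0.9 "Usain Bolt" "JAM" "21.08.86" "Berlin" "16.08.2009"',
    '2 9.63 1.5 "Usain Bolt" "JAM" "21.08.86" "London" "05.08.2012"',
    '3 9.69 -0.1 "Yohan Blake" "JAM" "26.12.89" "Lausanne" "23.08.2012"'
  ),
  stringsAsFactors = FALSE
)

# read.table() treats double quotes as quoting characters by default,
# so "Usain Bolt" stays one field even though it contains a space.
split_df <- read.table(
  text = raw[[1]],
  sep = " ",
  col.names = c("rank", "time", "wind", "name",
                "country", "birthdate", "city", "date"),
  stringsAsFactors = FALSE
)

str(split_df)
```

Because the default quote setting is kept here (unlike quote="" in the question's read.table call), the quoted names are read as single fields and the surrounding quotes are dropped.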