How to split a column into multiple columns on |

Asked: 2013-12-05 02:10:22

Tags: regex r dataframe strsplit

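In R: I have a data frame with many rows but only one column. Each row holds a long character string punctuated periodically by |. I want to split the string at every | so that I end up with many columns.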

1995-01-01|33.399999999999999|40.299999999999997|35.399999999999999|35.0|37.200000000000003|23.399999999999999|23.199999999999999|47.399999999999999|49.200000000000003|49.200000000000003|48.100000000000001|42.299999999999997|58.200000000000003|17.399999999999999|50.700000000000003|5.2999999999999998|20.600000000000001|38.5|43.299999999999997 etc.

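Each string starts with a date, followed by numbers that correspond to cities. The variable names are also given as a single string, and they need to be split on ".":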

date.abilene_tx.akron_oh.albany_ny.albuquerque_nm.allentown_pa.amarillo_tx.anchorage_ak.asheville_nc.atlanta_ga etc.

Any help is greatly appreciated!

2 answers:

Answer 0 (score: 1)

Here is a data.frame with one column and 10 rows, which is probably similar to yours:

dat <- "1995-01-01|33.399999999999999|40.299999999999997|35.399999999999999|35.0|37.200000000000003|23.399999999999999|23.199999999999999|47.399999999999999|49.200000000000003|49.200000000000003|48.100000000000001|42.299999999999997|58.200000000000003|17.399999999999999|50.700000000000003|5.2999999999999998|20.600000000000001|38.5|43.299999999999997 "

df <- data.frame(col1 = rep(dat, 10))

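And here is a data.frame with new columns created by splitting col1 on the literal "|" (fixed=TRUE is needed because "|" is otherwise a regex metacharacter):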

foo <- data.frame(do.call('rbind', strsplit(as.character(df$col1),'|',fixed=TRUE)))
foo

           X1                 X2                 X3                 X4   X5                 X6
1  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
2  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
3  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
4  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
5  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
6  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
7  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
8  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
9  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
10 1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
                   X7                 X8                 X9                X10                X11
1  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
2  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
3  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
4  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
5  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
6  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
7  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
8  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
9  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
10 23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
                  X12                X13                X14                X15                X16
1  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
2  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
3  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
4  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
5  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
6  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
7  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
8  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
9  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
10 48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
                  X17                X18  X19                 X20
1  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
2  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
3  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
4  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
5  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
6  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
7  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
8  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
9  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
10 5.2999999999999998 20.600000000000001 38.5 43.299999999999997
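The split above leaves every column as character data with default names (X1, X2, ...). A minimal sketch of the remaining steps, assuming the "."-separated name string from the question is available as a single string; `raw`, `header`, and `out` are hypothetical names, and the sample values are shortened for illustration:

```r
# Hypothetical two-row sample in the same format as the question
raw <- data.frame(
  col1 = c("1995-01-01|33.4|40.3|35.4", "1995-02-01|30.1|41.0|36.2"),
  stringsAsFactors = FALSE
)
# Hypothetical header string, shortened from the one in the question
header <- "date.abilene_tx.akron_oh.albany_ny"

# Split each row on the literal "|" and bind the pieces into a matrix
parts <- do.call(rbind, strsplit(raw$col1, "|", fixed = TRUE))
out <- data.frame(parts, stringsAsFactors = FALSE)

# Split the header on the literal "." (without fixed = TRUE, "." matches any character)
names(out) <- strsplit(header, ".", fixed = TRUE)[[1]]

# Convert the first column to Date and the remaining columns to numeric
out$date <- as.Date(out$date)
out[-1] <- lapply(out[-1], as.numeric)
str(out)
```

This assumes the number of names equals the number of "|"-separated fields in every row; if rows differ in length, rbind will recycle values silently, so it may be worth checking with lengths() first.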

Answer 1 (score: 1)

You should load the data from the file with:

 dat <- read.table(filename, sep="|")

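This handles lines delimited by "|". But you say the "strings" are separated by ".", so if the two are somehow mixed together in that text file, you may need to read the input with readLines() first and do some preprocessing.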
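A rough sketch of that preprocessing, under the assumption (not confirmed by the question) that the first line of the file holds the "."-separated names and every later line holds "|"-separated values. The temporary file and the names `path`, `lines`, and `col_names` are made up for illustration:

```r
# Hypothetical file: line 1 has "."-separated names, the rest "|"-separated values
path <- tempfile(fileext = ".txt")
writeLines(c(
  "date.abilene_tx.akron_oh",
  "1995-01-01|33.4|40.3",
  "1995-02-01|30.1|41.0"
), path)

# Read everything as raw lines so the header can be handled separately
lines <- readLines(path)

# Split the header on the literal "." to get the column names
col_names <- strsplit(lines[1], ".", fixed = TRUE)[[1]]

# Parse the remaining lines as "|"-delimited data
dat <- read.table(text = lines[-1], sep = "|", col.names = col_names,
                  stringsAsFactors = FALSE)

# read.table leaves the date column as character; convert it explicitly
dat$date <- as.Date(dat$date)
str(dat)
```

If the names live in a separate file or object instead, only the `col_names` line changes; the read.table() call stays the same.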