How to split elements by year in R

Time: 2016-07-29 05:00:24

Tags: r

I have some code like the example below. If you run it:

library(hurricaneexposure)
library(hurricaneexposuredata)
data("hurr_tracks")
storms <- unique(hurr_tracks$storm_id)
storms

you will see that storms is a long character vector whose elements have the form "stormname-year".

[1] "Alberto-1988"   "Beryl-1988"     "Chris-1988"     "Florence-1988"  "Gilbert-1988"   "Keith-1988"     "Allison-1989"   "Chantal-1989"  
[9] "Hugo-1989"      "Jerry-1989"     "Bertha-1990"    "Marco-1990"     "Ana-1991"       "Bob-1991"       "Fabian-1991"    "Notnamed-1991" 
[17] "Andrew-1992"    "Danielle-1992"  "Earl-1992"      "Arlene-1993"    "Emily-1993"     "Alberto-1994"   "Beryl-1994"     "Gordon-1994"   
[25] "Allison-1995"   "Dean-1995"      "Erin-1995"      "Gabrielle-1995" "Jerry-1995"     "Opal-1995"      "Arthur-1996"    "Bertha-1996"   
[33] "Edouard-1996"   "Fran-1996"      "Josephine-1996" "Subtrop-1997"   "Ana-1997"       "Danny-1997"     "Bonnie-1998"    "Charley-1998"  
[41] "Earl-1998"      "Frances-1998"   "Georges-1998"   "Hermine-1998"   "Mitch-1998"     "Bret-1999"      "Dennis-1999"    "Floyd-1999"    
[49] "Harvey-1999"    "Irene-1999"     "Beryl-2000"     "Gordon-2000"    "Helene-2000"    "Leslie-2000"    "Allison-2001"   "Barry-2001"     

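My question is how to split these elements by year. For example, I want to create a new variable y1988 that holds all the storms from 1988. If I run y1988, it should output: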

y1988
[1] "Alberto-1988"   "Beryl-1988"     "Chris-1988"     "Florence-1988"  "Gilbert-1988"   "Keith-1988"

The same goes for y1989 and so on up to 2001. I guess this might use gsub() and a for loop, but I am new to R, so I would really appreciate some advice.

3 Answers:

Answer 0: (score: 1)

We can use split with a grouping variable created by using sub to remove the prefix substring, up to and including the "-".

lst <- split(storms, sub(".*-", "", storms))
lst$`1988`
#[1] "Alberto-1988"  "Beryl-1988"    "Chris-1988"    "Florence-1988"
#[5] "Gilbert-1988"  "Keith-1988"   

Data

storms <- c("Alberto-1988", "Beryl-1988", "Chris-1988", "Florence-1988", 
 "Gilbert-1988", "Keith-1988", "Allison-1989", "Chantal-1989", 
 "Hugo-1989", "Jerry-1989", "Bertha-1990", "Marco-1990", "Ana-1991", 
 "Bob-1991", "Fabian-1991", "Notnamed-1991", "Andrew-1992", "Danielle-1992", 
 "Earl-1992", "Arlene-1993", "Emily-1993", "Alberto-1994", "Beryl-1994", 
 "Gordon-1994", "Allison-1995", "Dean-1995", "Erin-1995", "Gabrielle-1995", 
 "Jerry-1995", "Opal-1995", "Arthur-1996", "Bertha-1996", "Edouard-1996", 
 "Fran-1996", "Josephine-1996", "Subtrop-1997", "Ana-1997", "Danny-1997", 
 "Bonnie-1998", "Charley-1998", "Earl-1998", "Frances-1998", "Georges-1998", 
 "Hermine-1998", "Mitch-1998", "Bret-1999", "Dennis-1999", "Floyd-1999", 
 "Harvey-1999", "Irene-1999", "Beryl-2000", "Gordon-2000", "Helene-2000", 
 "Leslie-2000", "Allison-2001", "Barry-2001")
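
If separate variables such as y1988 are really needed, rather than one list indexed by year, the list elements can be copied into the workspace. This is only a sketch: it uses a shortened storms vector for illustration, and the names by_year and the "y" prefix are assumptions chosen to match the variable names in the question.

```r
# Shortened stand-in for the full storms vector
storms <- c("Alberto-1988", "Beryl-1988", "Allison-1989", "Hugo-1989")

# Group by the year that follows the last "-"
by_year <- split(storms, sub(".*-", "", storms))

# Prefix each name with "y" so it becomes a syntactic variable name
names(by_year) <- paste0("y", names(by_year))

# Copy each list element into the global environment as its own variable
invisible(list2env(by_year, envir = globalenv()))

y1988
```

Keeping everything in one list is usually easier to loop over with lapply; creating one variable per year is mainly useful for interactive work.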

Answer 1: (score: 0)

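Why not extract the year directly in the original data frame? The dplyr and tidyr libraries are well suited to problems like this. I suggest the following: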

library(dplyr)
library(tidyr)
hurr_tracks %>%
    extract(storm_id, c("storm", "year"),"(.+)-(.+)")
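
To see what this produces and how to get the storms of one year afterwards, here is a small sketch. The tracks data frame is a toy stand-in that contains only a storm_id column (the real hurr_tracks has more columns), and the name tracks_by_year is made up for the example. Passing remove = FALSE keeps the original storm_id column alongside the new ones.

```r
library(tidyr)

# Toy stand-in for hurr_tracks: only storm_id is taken from the question
tracks <- data.frame(
  storm_id = c("Alberto-1988", "Beryl-1988", "Hugo-1989"),
  stringsAsFactors = FALSE
)

# Split storm_id into "storm" and "year"; remove = FALSE keeps storm_id
tracks_by_year <- extract(tracks, storm_id, c("storm", "year"),
                          "(.+)-(.+)", remove = FALSE)

# Storm IDs from a single year; year is a character column here
unique(tracks_by_year$storm_id[tracks_by_year$year == "1988"])
```

Adding convert = TRUE to the extract call would turn year into an integer column, in which case the comparison would be against 1988 rather than "1988".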

Answer 2: (score: 0)

An alternative using stringr

  

library(stringr)
split(storms, str_extract(storms, "[0-9]+"))
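
The same digit-extraction idea can be sketched in base R without stringr. This version uses a shortened storms vector for illustration and anchors the pattern to the end of the string with "$", which is an added assumption that the year is always the trailing part of the ID. The names m, years and by_year are made up for the example.

```r
# Shortened stand-in for the full storms vector
storms <- c("Alberto-1988", "Beryl-1988", "Hugo-1989")

# regexpr() finds the first match per element; "$" anchors it to the end
m <- regexpr("[0-9]+$", storms)
years <- regmatches(storms, m)

# regmatches() drops elements with no match, so check the lengths agree
stopifnot(length(years) == length(storms))

by_year <- split(storms, years)
by_year[["1988"]]
```

The length check matters because split needs one group label per element; an ID without trailing digits would otherwise shift the labels out of alignment.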