I have a data frame with the following column names:
R > colnames(crime)
[1] "http...purl.org.linked.data.sdmx.2009.dimension.refArea"
[2] "Reference.Area"
[3] "X1996.1997"
[4] "X1997.1998"
[5] "X1998.1999"
[6] "X1999.2000"
[7] "X2000.2001"
[8] "X2001.2002"
[9] "X2002.2003"
[10] "X2003.2004"
[11] "X2004.2005"
[12] "X2005.2006"
[13] "X2006.2007"
[14] "X2007.2008"
[15] "X2008.2009"
[16] "X2009.2010"
[17] "X2010.2011"
[18] "X2011.2012"
[19] "X2012.2013"
[20] "X2013.2014"
[21] "X2014.2015"
[22] "X2015.2016"
[23] "X2016.2017"
[24] "X2017.2018"
and its first column looks like this:
R > crime[,1]
[1] http://statistics.gov.scot/id/statistical-geography/S12000033
[2] http://statistics.gov.scot/id/statistical-geography/S12000034
[3] http://statistics.gov.scot/id/statistical-geography/S12000041
[4] http://statistics.gov.scot/id/statistical-geography/S12000035
[5] http://statistics.gov.scot/id/statistical-geography/S12000036
[6] http://statistics.gov.scot/id/statistical-geography/S12000005
[7] http://statistics.gov.scot/id/statistical-geography/S12000006
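What I want is for the first column to be named "refArea", and for each year column to be named by its last four digits, i.e. "X1996.1997" becomes "1997". I also want the first column to contain only the last 9 characters of each value (e.g. S12000006; the codes vary, some are S02... or S01...).
My current code is as follows: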
R > colnames(crime)[colnames(crime) == "http...purl.org.linked.data.sdmx.2009.dimension.refArea"] <- "refArea" #replace url with "refArea"
crime$refArea <- substr(crime$refArea, 53, 61) #substring only characters 53-61 from column refArea
colnames(crime) <- c("refArea", "Reference.Area", "1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018") #Manually change column names
But this feels clumsy and poorly coded (I have to repeat this process for 8 or 9 more datasets). How would you improve it?
Answer 0 (score: 0)
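One option is to use sub to capture 'refArea' and drop every character before it. To strip the leading "X1996." from "X1996.1997", we can use substr to keep only the last four characters of each year column name: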
colnames(crime)[1] <- sub(".*\\.(refArea)", "\\1", colnames(crime)[1])
v1 <- colnames(crime)[3:ncol(crime)]
colnames(crime)[3:ncol(crime)] <- substr(v1, nchar(v1)-3, nchar(v1))
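
The code above only renames columns; it does not shorten the values in the first column, and the question mentions repeating the cleanup for several datasets. As a minimal sketch, all three steps could be wrapped in one function. The name clean_crime and the small demo data frame are hypothetical, and the sketch assumes the URL column always comes first and the year columns always look like "X1996.1997".

```r
# Hypothetical helper (not part of the original answer): clean one data frame.
clean_crime <- function(df) {
  # Rename the first column; assumes the URL column is always first.
  colnames(df)[1] <- "refArea"

  # Keep only the last 9 characters of each value.
  # as.character() is needed because nchar() fails on factors.
  codes <- as.character(df[[1]])
  df[[1]] <- substr(codes, nchar(codes) - 8, nchar(codes))

  # Rename only columns shaped like "X1996.1997", keeping the second year.
  yr <- grepl("^X[0-9]{4}\\.[0-9]{4}$", colnames(df))
  colnames(df)[yr] <- substr(colnames(df)[yr], 7, 10)

  df
}

# Made-up demo data in the same shape as the question's data frame.
demo <- data.frame(
  http...purl.org.linked.data.sdmx.2009.dimension.refArea = c(
    "http://statistics.gov.scot/id/statistical-geography/S12000033",
    "http://statistics.gov.scot/id/statistical-geography/S12000006"
  ),
  Reference.Area = c("Area A", "Area B"),
  X1996.1997 = c(10, 20),
  X1997.1998 = c(11, 21),
  stringsAsFactors = TRUE
)

out <- clean_crime(demo)
colnames(out)  # "refArea" "Reference.Area" "1997" "1998"
out$refArea    # "S12000033" "S12000006"
```

If the datasets are held in a list, something like lapply(list_of_data_frames, clean_crime) would apply the same cleanup to each of them. Matching the year columns by pattern rather than by position (3:ncol) is a design choice that avoids renaming a column that is not a year range.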