Sorting a character vector by multiple numbers

Time: 2018-04-05 10:04:44

Tags: r sorting date vector

I have a sample character vector of file names, as follows:

> vector
[1] "1 Janu 1998.txt"        "2 Feb. 1999.txt"   "3 Marc 1999.txt" 
[4] "2 February 1998.txt"    "3 March. 1998.txt" "1 Jan 1999.txt" 

I want to sort the elements by year and then by month (the first number in each element). So I tried this:

> library(gtools)
> mixedsort(vector)
[1] "1 Janu 1998.txt"    "1 Jan 1999.txt"    "2 February 1998.txt"   
[4] "2 Feb. 1999.txt"    "3 Marc 1999.txt"   "3 March. 1998.txt"

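Using sort(vector) gives the same output. I have read through several related questions but have not found an answer that fits this case. I would appreciate any help. Thanks in advance.

I would like to get the following output: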

> output
[1] "1 Janu 1998.txt"    "2 February 1998.txt"    "3 March. 1998.txt"       
[4] "1 Jan 1999.txt"     "2 Feb. 1999.txt"        "3 Marc 1999.txt"  

1 answer:

Answer 0 (score: 2)

We can do it like this:

v <- c("1 Jan 1998.txt", "2 Feb. 1999.txt", "3 March 1999.txt", "2 Feb 1998.txt", "3 March. 1998.txt","1 Jan 1999.txt")

v[order(as.Date(gsub("\\.", "", v), "%d %b %Ytxt"))];
#[1] "1 Jan 1998.txt"    "2 Feb 1998.txt"    "3 March. 1998.txt"
#[4] "1 Jan 1999.txt"    "2 Feb. 1999.txt"   "3 March 1999.txt"

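Explanation: gsub first strips the periods, so each entry looks like "1 Jan 1998txt", which matches the format "%d %b %Ytxt". as.Date then converts the entries of v to dates, and order returns the indices that sort those dates chronologically, that is by year, then month, then day. Note that %b is matched against the month names of the current locale (it accepts both abbreviated and full names on input), so this assumes an English locale.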

Note that some entries in v have a period after the month; I am not sure whether that is intentional, but the gsub call takes care of them.
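To make the explanation above concrete, here is a minimal sketch that splits the one-liner into its intermediate steps. The variable names cleaned, dates, idx and sorted are introduced here for illustration only, and the sketch assumes an English (or C) locale so that %b recognises "Jan", "Feb" and "March".

```r
# Same data as in the answer above
v <- c("1 Jan 1998.txt", "2 Feb. 1999.txt", "3 March 1999.txt",
       "2 Feb 1998.txt", "3 March. 1998.txt", "1 Jan 1999.txt")

# Step 1: strip every period, so ".txt" becomes "txt"
cleaned <- gsub("\\.", "", v)

# Step 2: parse the cleaned strings; %b uses the current locale's month names
dates <- as.Date(cleaned, format = "%d %b %Ytxt")

# Step 3: order() returns the permutation that sorts the dates
idx <- order(dates)

# An NA in dates means an entry did not match the format
# (for example a non-English locale or an unexpected month spelling)
has_unparsed <- anyNA(dates)

# Step 4: apply the permutation to the original file names
sorted <- v[idx]
```

Checking has_unparsed before trusting the result is worthwhile, because order places NA values last without warning.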

An alternative that achieves the same result:

v[order(as.Date(gsub("(\\.|\\.txt)", "", v), "%d %b %Y"))];

Update

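To handle non-standard abbreviations of month names, I would define a custom map that links each non-standard name or abbreviation to a standard one. You can then do something like this: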

v <- c("1 Janu 1998.txt", "2 Feb. 1999.txt", "3 Marc 1999.txt",
    "2 February 1998.txt", "3 March. 1998.txt", "1 Jan 1999.txt")

# Define a map to map non-standard to standard month abbrev
map <- c(
    Janu = "Jan",
    Marc = "March")

# Separate dmy from filename and store in matrix
mat <- sapply(gsub("(\\.|\\.txt)", "", v), function(x)
    unlist(strsplit(x, " ")))

# Replace non-standard month names
mat[2, ] <- ifelse(
    !is.na(match(mat[2, ], names(map))),
    map[match(mat[2, ], names(map))],
    mat[2, ])

# Convert to Date then to numeric
dmy <- as.numeric(apply(mat, 2, function(x)
    as.Date(paste0(x, collapse = "-"), format = "%d-%b-%Y")));

# Order according to dmy
v[order(dmy)]
#[1] "1 Janu 1998.txt"     "2 February 1998.txt" "3 March. 1998.txt"
#[4] "1 Jan 1999.txt"      "2 Feb. 1999.txt"     "3 Marc 1999.txt"
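If the leading number really is the month index, as the question states, the month names may not need to be parsed at all. The following is a rough sketch of that idea and is not part of the original answer. It assumes every file name starts with the month number and ends with a four-digit year followed by ".txt"; the names month, year and sorted_numeric are illustrative.

```r
v <- c("1 Janu 1998.txt", "2 Feb. 1999.txt", "3 Marc 1999.txt",
       "2 February 1998.txt", "3 March. 1998.txt", "1 Jan 1999.txt")

# Leading run of digits: taken to be the month number (assumption)
month <- as.integer(sub("^([0-9]+).*$", "\\1", v))

# Four digits immediately before ".txt": taken to be the year (assumption)
year <- as.integer(sub("^.*([0-9]{4})\\.txt$", "\\1", v))

# order() with several keys sorts by the first key and breaks ties with the next
sorted_numeric <- v[order(year, month)]
```

Because only digits are extracted, this variant does not depend on the locale or on how the month name is spelled, but it would give wrong results if the first number were a day of the month rather than the month itself.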