I have a set of file names, such as:
filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")
I want to sort them by the number after the "-".
For example, in Python I can use the key parameter of the sorted function:
filelist = ["filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt"]
sorted(filelist, key=lambda x: int(x.split("-")[1].split(".")[0]))
# ["filec-1.txt", "fileb-2.txt", "filef-4.txt", "filed-5.txt", "filea-10.txt"]
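In R, I have been playing with strsplit and lapply so far, with no luck.
What is the way to do this in R?
Edit: The file names can be many things and may contain more numbers. The only fixed pattern is that the number I want to sort by comes after the "-". Another (real) example: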
x <- c("boards10017-51.mp4", "boards10065-66.mp4", "boards10071-81.mp4",
"boards10185-91.mp4", "boards10212-63.mp4", "boards1025-51.mp4",
"boards1026-71.mp4", "boards10309-89.mp4", "boards10310-68.mp4",
"boards10384-50.mp4", "boards10398-77.mp4", "boards10419-119.mp4",
"boards10421-85.mp4", "boards10444-87.mp4", "boards10451-60.mp4",
"boards10461-81.mp4", "boards10463-52.mp4", "boards10538-83.mp4",
"boards10575-62.mp4", "boards10577-249.mp4")
Answer 0 (score: 8)
I'm not sure how complex your actual list of file names is, but something like the following might be sufficient:
filelist[order(as.numeric(gsub("[^0-9]+", "", filelist)))]
# [1] "filec-1.txt" "fileb-2.txt" "filef-4.txt" "filed-5.txt" "filea-10.txt"
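Broken into steps, the one-liner above can be read roughly as follows. This is only an illustrative sketch; the intermediate names digits, keys and sorted_files are introduced here and are not part of the original answer.

```r
filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

# Step 1: delete every run of non-digit characters, leaving only the digits.
digits <- gsub("[^0-9]+", "", filelist)

# Step 2: convert to numeric so that 10 sorts after 5
# (compared as strings, "10" would sort before "2").
keys <- as.numeric(digits)

# Step 3: order() returns the permutation that sorts the keys;
# indexing with it rearranges the file names accordingly.
sorted_files <- filelist[order(keys)]
print(sorted_files)
```

Note that step 1 keeps every digit in the name, so it only works when the number to sort by is the only number present.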
Given your edit, you may want to change the gsub call to:
gsub(".*-|\\..*", "", filelist)
Again, without a few more test cases, it's hard to say whether this is sufficient for your needs.
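To see why the pattern needs to change, it may help to compare what each one extracts from names that contain several numbers. The sketch below uses three of the file names from the question. The third variant, using sub with a capture group, is an assumed alternative that is not part of the original answer; it relies on the number being directly after the last hyphen.

```r
x <- c("boards10017-51.mp4", "boards1025-51.mp4", "boards10419-119.mp4")

# First pattern: keeps every digit in the name, including the "4" of "mp4".
all_digits <- gsub("[^0-9]+", "", x)

# Revised pattern: removes everything up to the last "-" and
# everything from the following "." onward.
after_hyphen <- gsub(".*-|\\..*", "", x)

# Assumed alternative: capture only the digits that follow the last "-".
captured <- sub(".*-([0-9]+).*", "\\1", x)

print(all_digits)
print(after_hyphen)
print(captured)
```

With the first pattern the digits from different parts of the name run together, so the resulting sort key no longer reflects the number after the hyphen.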
Example:
x <- c("boards10017-51.mp4", "boards10065-66.mp4", "boards10071-81.mp4",
"boards10185-91.mp4", "boards10212-63.mp4", "boards1025-51.mp4",
"boards1026-71.mp4", "boards10309-89.mp4", "boards10310-68.mp4",
"boards10384-50.mp4", "boards10398-77.mp4", "boards10419-119.mp4",
"boards10421-85.mp4", "boards10444-87.mp4", "boards10451-60.mp4",
"boards10461-81.mp4", "boards10463-52.mp4", "boards10538-83.mp4",
"boards10575-62.mp4", "boards10577-249.mp4")
x[order(as.numeric(gsub(".*-|\\..*", "", x)))]
## [1] "boards10384-50.mp4" "boards10017-51.mp4" "boards1025-51.mp4"
## [4] "boards10463-52.mp4" "boards10451-60.mp4" "boards10575-62.mp4"
## [7] "boards10212-63.mp4" "boards10065-66.mp4" "boards10310-68.mp4"
## [10] "boards1026-71.mp4" "boards10398-77.mp4" "boards10071-81.mp4"
## [13] "boards10461-81.mp4" "boards10538-83.mp4" "boards10421-85.mp4"
## [16] "boards10444-87.mp4" "boards10309-89.mp4" "boards10185-91.mp4"
## [19] "boards10419-119.mp4" "boards10577-249.mp4"
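For readers who prefer something closer to the Python key function from the question, a strsplit-based version is also possible. This is a minimal sketch, assuming every name contains exactly one "-" followed by the number and then a "."; number_after_hyphen is a hypothetical helper name, not something from the original answer.

```r
filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

# Hypothetical helper mirroring int(x.split("-")[1].split(".")[0]) in Python.
number_after_hyphen <- function(name) {
  # Take the part after the "-" (second piece of the split).
  after <- strsplit(name, "-", fixed = TRUE)[[1]][2]
  # Take the part before the first "." and convert it to a number.
  as.numeric(strsplit(after, ".", fixed = TRUE)[[1]][1])
}

# vapply applies the helper to each name and guarantees a numeric vector.
keys <- vapply(filelist, number_after_hyphen, numeric(1), USE.NAMES = FALSE)
sorted_files <- filelist[order(keys)]
print(sorted_files)
```

Because strsplit returns a list, the [[1]] is needed to get at the pieces for a single name, which is a common stumbling block when combining strsplit with lapply.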
Answer 1 (score: 0)
I wrote a regex sorting function:
Function:
Data:
filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")
Calling the function:
reg_sort(filelist,"\\d+")
#[1] "filec-1.txt" "fileb-2.txt" "filef-4.txt" "filed-5.txt" "filea-10.txt"
Other features:
Descending sort:
reg_sort(filelist,-"\\d+")
#[1] "filea-10.txt" "filed-5.txt" "filef-4.txt" "fileb-2.txt" "filec-1.txt"
Multi-level sort:
reg_sort(filelist,-"\\d+","\\w")
(does not make sense for this sample data)
Verbose mode:
reg_sort(filelist,"\\d+",verbose=T)
(lets you see/check what the regex pattern extracted for sorting)
$\\d+
[1] 1 2 4 5 10
[1] "filec-1.txt" "fileb-2.txt" "filef-4.txt" "filed-5.txt" "filea-10.txt"
$matches[0]
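The body of reg_sort does not appear in this answer. As a rough stand-in, and explicitly not the author's implementation, a much simpler function in the same spirit might look like the sketch below. The name sort_by_regex and its decreasing argument are assumptions made for illustration; it does not support the -"\\d+" syntax, multi-level sorting, or the verbose mode.

```r
# Hypothetical helper: sort x by the first match of `pattern` in each element.
sort_by_regex <- function(x, pattern, decreasing = FALSE) {
  # Locate the first match in each element (-1 where there is none).
  hit <- regexpr(pattern, x, perl = TRUE)
  key <- rep(NA_character_, length(x))
  key[hit > 0] <- regmatches(x, hit)

  # Sort numerically when every extracted key is a number, otherwise as text.
  num <- suppressWarnings(as.numeric(key))
  if (!anyNA(num[hit > 0])) key <- num

  # Elements without a match get an NA key and are placed last.
  x[order(key, decreasing = decreasing, na.last = TRUE)]
}

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")
print(sort_by_regex(filelist, "\\d+"))
print(sort_by_regex(filelist, "\\d+", decreasing = TRUE))
```

Like the first pattern in the other answer, "\\d+" picks up the first number in the name, so for names with several numbers a more specific pattern would be needed.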