How do I sort strings that end in integers by numeric order?

Time: 2017-04-11 15:47:06

Tags: r regex sorting

I have a column in a data.table whose values have the form string + integer, e.g.

string1, string2, string3, string4, string5,

When I use sort(), the strings come out in the wrong order:

string1, string10, string11, string12, string13, ..., string2, string20, 
string21, string22, string23, ....

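How can I sort them so that they come out in this order?

string01, string02, string03, string04, string05, ... , string10, string11, 
string12, etc.   

One approach might be to prepend a 0 to each integer below 10 (1-9)? I suspect you would extract the string part with str_extract(dt$string_column, "[a-z]+") and then add a 0 to each single-digit integer, somehow with sprintf().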

4 Answers:

Answer 0 (score: 6)

We can strip the non-digit characters and order by the remaining integer:

dt[order(as.integer(gsub("\\D+", "", col1)))]
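To make the one-liner above easier to follow, here is a minimal base R sketch of the same idea on a plain character vector. The sample vector x is made up for illustration; with a data.table, the same key expression goes inside dt[order(...)] as shown above.

```r
# Hypothetical sample data (not from the question)
x <- c("string1", "string10", "string11", "string2", "string20", "string3")

# Drop every run of non-digit characters, then convert the rest to integer
num <- as.integer(gsub("\\D+", "", x))

# order() returns the permutation that sorts the numeric keys
sorted <- x[order(num)]
print(sorted)
```

Two things to keep in mind: a value with no digits becomes NA and order() places it last by default, and a value with digits in several places (say "a1b2") has them concatenated into a single number (12), which may not be what you want.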

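Answer 1 (score: 1)

You can use str_extract from the stringr package to pull out the numbers and order by them. Values without a number are sorted alphabetically and appended at the end: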

x = c("string1","string3","stringZ","string2","stringX","string10")
library(stringr)
c(x[grepl("\\d+",x)][order(as.integer(str_extract(x[grepl("\\d+",x)],"\\d+")))], 
   sort(x[!grepl("\\d+",x)]))
#[1] "string1"  "string2"  "string3"  "string10" "stringX"  "stringZ" 

Answer 2 (score: 1)

You can use mixedsort from the gtools package:

vec <- c("string1", "string10", "string11", "string12", "string13","string2", 
         "string20", "string21", "string22", "string23")

library(gtools)
mixedsort(vec)

#[1] "string1"  "string2"  "string10" "string11" "string12" "string13"
# "string20" "string21" "string22" "string23"
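
Since the question is about a column rather than a standalone vector, gtools also provides mixedorder, which returns the ordering index instead of the sorted values. A small sketch, assuming a made-up data frame df with columns col1 and value:

```r
library(gtools)

# Hypothetical table; col1 and value are illustrative names
df <- data.frame(col1  = c("string10", "string2", "string1"),
                 value = c(3, 2, 1),
                 stringsAsFactors = FALSE)

# mixedorder() gives the row permutation for a natural (mixed) sort
df_sorted <- df[mixedorder(df$col1), ]
print(df_sorted)
```

The same index should work for a data.table, e.g. dt[mixedorder(col1)].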

Answer 3 (score: 1)

Assuming the strings look like the ones below:

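library(data.table)
library(stringr)

# Sample data; "stringx" has no numeric suffix
xstring <- data.table(x = c("string1", "string11", "string2", "string10", "stringx"))
# Digits that directly follow "string" ("" when there are none)
extracts <- str_extract(xstring$x, "(?<=string)(\\d*)")
# Left-pad single digits with a 0; two-digit and empty matches stay unchanged
# (assumes the numbers have at most two digits)
y_string <- ifelse(nchar(extracts) == 2 | extracts == "", extracts, paste0("0", extracts))
# Put the padded number back into the original string
fin_string <- str_replace(xstring$x, "(?<=string)(\\d*)", y_string)
sort(fin_string)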
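
The padding above is hard-coded for two digits. If the numbers can be longer, a base R variant along the lines of the sprintf() idea from the question could look like the sketch below. It assumes every value ends in a run of digits (so no "stringx"), and the sample vector is made up for illustration.

```r
# Hypothetical sample data; every value is assumed to end in digits
x <- c("string1", "string11", "string2", "string10", "string100")

# Split each value into its non-digit prefix and its trailing integer
prefix <- sub("\\d+$", "", x)
num    <- as.integer(sub("^\\D*", "", x))

# Pad every number to the width of the largest one
width  <- max(nchar(num))
fmt    <- paste0("%s%0", width, "d")   # e.g. "%s%03d"
padded <- sprintf(fmt, prefix, num)

result <- sort(padded)
print(result)
```

This changes the strings themselves (string1 becomes string001). If you only need the order, x[order(prefix, num)] keeps the original values.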