Using R

Posted: 2019-11-28 10:14:21

Tags: r sorting

I want to sort a character vector called c:

c <- c("AD 2017", "AD 2018 ", "RT 2017", "BL 2017", "BL 2018", "CT 2018")

If I use R's built-in sort function, this is what I get:

> sort(c)
[1] "AD 2017"  "AD 2018 " "BL 2017"  "BL 2018"  "CT 2018"  "RT 2017"

However, suppose I have a different ordering for the values, stored in a matrix like this:

  ORDER VALUE
1     1    RT
2     2    BL
3     3    AD
4     4    CT

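How can I sort my c vector so that it follows the order given in the matrix, while also taking the years into account? My custom-sorted vector should look like this: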

> special_sort(c)
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018"

Since the database is large, I really need an automated approach.

Thanks in advance for your help.

4 answers:

Answer 0 (score: 2)

You could try the following:

# order by the first two characters, using the chosen factor levels
v[order(factor(substr(v,1,2),levels = c("RT","BL","AD","CT")))]
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018"

And with the matrix:

# use the second column of the matrix in unique(), to order
v[order(factor(substr(v,1,2),levels = unique(mat[,2])))]
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018" 

Using this vector and matrix:

# your vector
v<-c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")

# your matrix
mat <- structure(c("1", "2", "3", "4", "RT", "BL", "AD", "CT"), .Dim = c(4L, 
2L), .Dimnames = list(c("1", "2", "3", "4"), c("ORDER", "VALUE"
)))
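This answer sorts on the prefix only. The years come out right here because `order()` is stable and the input already lists "AD 2017" before "AD 2018 ". Below is a minimal sketch of the `special_sort()` the question asks for, assuming each element is a letter code, whitespace and a four-digit year (trailing spaces allowed), and that the lookup matrix has `ORDER` and `VALUE` columns as shown. The `lookup` argument and the test vector `v2` are hypothetical. It reads the order from the `ORDER` column rather than relying on row position, and uses the year as a tie-breaker:

```r
# Hypothetical helper: sort x by the prefix order in `lookup`, then by year.
# Assumes elements look like "<letters> <year>" (trailing spaces allowed) and
# that `lookup` is a character matrix with "ORDER" and "VALUE" columns.
special_sort <- function(x, lookup) {
  # Prefix levels in the order given by the ORDER column, not by row position
  lev <- unname(lookup[order(as.integer(lookup[, "ORDER"])), "VALUE"])
  prefix <- sub("^([[:alpha:]]+).*$", "\\1", x)
  year   <- as.integer(sub("^[[:alpha:]]+[[:space:]]+([[:digit:]]+).*$", "\\1", x))
  # Primary key: position of the prefix in lev; secondary key: year
  x[order(factor(prefix, levels = lev), year)]
}

mat <- structure(c("1", "2", "3", "4", "RT", "BL", "AD", "CT"), .Dim = c(4L, 2L),
                 .Dimnames = list(c("1", "2", "3", "4"), c("ORDER", "VALUE")))

# Years deliberately out of order within the AD and BL groups
v2 <- c("AD 2018 ", "BL 2018", "AD 2017", "RT 2017", "BL 2017", "CT 2018")
special_sort(v2, mat)
# [1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018"
```

With `v2`, the prefix-only call from this answer would return "BL 2018" before "BL 2017" and "AD 2018 " before "AD 2017", because tied prefixes keep their input order.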

Answer 1 (score: 2)

Another option could be:


Here x is the character vector and df is the lookup table stored as a data frame with a VALUE column:

x[order(match(substr(x, 1, 2), df$VALUE))]

[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018" 
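`match()` returns each prefix's position in `df$VALUE`, and `order()` then sorts on those integer positions. The sketch below assumes `df` is a data frame like the one in the data section of the last answer, and adds a made-up element "ZZ 2017" whose prefix is not in the table. It shows the intermediate ranks and what happens to unknown prefixes: `match()` gives `NA`, and `order()` places `NA` keys last by default (`na.last = TRUE`).

```r
# Lookup table as a data frame (same content as the matrix in the question)
df <- data.frame(ORDER = 1:4, VALUE = c("RT", "BL", "AD", "CT"),
                 stringsAsFactors = FALSE)

# "ZZ 2017" is a made-up element whose prefix is missing from df$VALUE
x <- c("AD 2017", "AD 2018 ", "ZZ 2017", "RT 2017", "BL 2017", "BL 2018", "CT 2018")

# Position of each two-letter prefix in df$VALUE (NA when not found)
ranks <- match(substr(x, 1, 2), df$VALUE)
ranks
# [1]  3  3 NA  1  2  2  4

# Elements with an NA rank ("ZZ 2017") end up at the end
x[order(ranks)]
```

Note that this ordering ignores the year: ties keep their input order.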

Answer 2 (score: 1)

Data

vector <- c("AD 2017", "AD 2018 ","RT 2017","BL 2017","BL 2018","CT 2018")

Ordering function

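order_fun <- function(vector) {
  # split "AD 2017" into two columns: X1 = "AD", X2 = "2017"
  df <- data.frame(do.call(rbind, strsplit(vector, " ")))
  # make the prefix a factor whose level order is the custom order
  df$X1 <- factor(df$X1, levels = c("RT", "BL", "AD", "CT"))
  # sort by prefix first, then by year
  df <- df[order(df$X1, df$X2), ]
  # the row names still hold the original positions in `vector`
  vector_ordered <- vector[as.numeric(row.names(df))]
  return(vector_ordered)
}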

Ordering one-liner

vector[order(factor(substr(vector,1,2), levels = c("RT", "BL", "AD", "CT")), substr(vector,4,7))]
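The one-liner uses `substr(vector, 1, 2)` and `substr(vector, 4, 7)`, so it assumes every code is exactly two letters followed by a single space. If codes can differ in length, a regex-based variant such as the sketch below splits on the leading run of letters instead. The three-letter code "ABC", the vector `w` and the levels `lev` are made-up examples, not part of the question's data.

```r
# Made-up input: a three-letter code, a double space and a trailing space
w   <- c("AD 2018 ", "ABC 2017", "RT 2017", "AD 2017", "BL  2018")
lev <- c("RT", "ABC", "BL", "AD")

# Leading letters, whatever their length
prefix <- sub("^([[:alpha:]]+).*$", "\\1", w)
# Digits after the first run of whitespace
year   <- as.integer(sub("^[[:alpha:]]+[[:space:]]+([[:digit:]]+).*$", "\\1", w))

w[order(factor(prefix, levels = lev), year)]
```

With `substr(w, 1, 2)`, "ABC 2017" would be read as prefix "AB", which is not a level, and would be sorted last.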

Result

order_fun(vector)
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017"  "AD 2018 " "CT 2018" 

Answer 3 (score: 1)

I'm not sure whether you also want the year taken into account when elements share the same letter prefix. If so, the following may help:

res <- v[order(
  match(gsub("([[:alpha:]]+).*", "\\1", v), df$VALUE),
  as.numeric(gsub(".*?([[:digit:]]+)", "\\1", v)))]

which gives

> res
[1] "RT 2017"  "BL 2017"  "BL 2018"  "AD 2017" 
[5] "AD 2018 " "CT 2018" 
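The tie-breaker passed to `order()` must be the unsorted year vector. Wrapping it in `sort()` breaks the link between each element and its own year, so the result is only correct when the input happens to line up. A quick check with a made-up two-element vector `v2` (the year is extracted here with `gsub("[^[:digit:]]", "", ...)`, a simpler substitute for the lazy regex above):

```r
v2 <- c("BL 2018", "BL 2017")
df <- data.frame(ORDER = 1:4, VALUE = c("RT", "BL", "AD", "CT"),
                 stringsAsFactors = FALSE)

# Position of the letter prefix in df$VALUE
prefix_rank <- match(gsub("([[:alpha:]]+).*", "\\1", v2), df$VALUE)
# Keep only the digits: "BL 2018" -> 2018
years <- as.numeric(gsub("[^[:digit:]]", "", v2))

v2[order(prefix_rank, sort(years))]  # "BL 2018" "BL 2017": wrong
v2[order(prefix_rank, years)]        # "BL 2017" "BL 2018": correct
```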

Otherwise, if only the order given by df$VALUE matters, v[order(match(gsub("([[:alpha:]]+).*","\\1",v),df$VALUE))] is enough.

Data

v <- c("AD 2017", "AD 2018 ", "RT 2017", "BL 2017", "BL 2018", "CT 2018")

df <- structure(list(ORDER = 1:4, VALUE = c("RT", "BL", "AD", "CT")), class = "data.frame", row.names = c("1", 
"2", "3", "4"))