I have a df with a column of intervals:
1-25
26-50
51-100
100-200
More than 200
When I try to sort it in ascending order in R, it comes out as
1-25
100-200
26-50
51-100
More than 200
It is sorting on the first digit, because the values are compared as character strings. How do I fix this?
Answer 0: (score: 0)
Suppose we have a data frame with these levels, but in scrambled order:
df <- data.frame(stringsAsFactors = FALSE,
                 intervals = c("26-50", "More than 200", "51-100",
                               "1-25", "100-200"))
df
# intervals
#1 26-50
#2 More than 200
#3 51-100
#4 1-25
#5 100-200
We could add a helper column to sort on (readr::parse_number extracts the first number in each string):
df$num <- readr::parse_number(df$intervals)
df[order(df$num), ]
# intervals num
#4 1-25 1
#1 26-50 26
#3 51-100 51
#5 100-200 100
#2 More than 200 200
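If readr is not installed, the same helper column can be sketched in base R. This is only an illustration, and it assumes every label contains at least one run of digits and that the first run is the lower bound; a label with no digits would come back as NA with a warning.

```r
df <- data.frame(intervals = c("26-50", "More than 200", "51-100",
                               "1-25", "100-200"),
                 stringsAsFactors = FALSE)

# Keep only the first run of digits in each label, then convert to numeric
df$num <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", df$intervals))

# Reorder the rows by the numeric helper column
sorted <- df[order(df$num), ]
print(sorted)
```

Once the rows are in order, the helper column can be dropped with sorted$num <- NULL if it is no longer needed.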
Or we could make the intervals a factor, so that the column carries a built-in order of its own instead of the alphabetical one:
df$intervals_f <- factor(df$intervals,
                         levels = c("1-25", "26-50", "51-100",
                                    "100-200", "More than 200"))
df[order(df$intervals_f), ]
# intervals num intervals_f
#4 1-25 1 1-25
#1 26-50 26 26-50
#3 51-100 51 51-100
#5 100-200 100 100-200
#2 More than 200 200 More than 200
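To illustrate what the factor buys beyond a single call to order(), here is a small sketch showing that other base R functions follow the level order too. The vector x and its duplicated "26-50" entry are made up for the example.

```r
lvls <- c("1-25", "26-50", "51-100", "100-200", "More than 200")

# Hypothetical data: the same labels as above plus one repeated value
x <- factor(c("26-50", "More than 200", "51-100", "1-25", "100-200", "26-50"),
            levels = lvls)

# sort() orders a factor by its levels, not alphabetically
sorted_x <- sort(x)
print(as.character(sorted_x))

# table() reports counts in level order as well
counts <- table(x)
print(counts)
```

The trade-off is that the levels have to be written out (or derived) once, but after that no helper column is required.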
Answer 1: (score: 0)
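Strip the words, the upper end of each range, and the whitespace; coerce what is left to numeric; reorder by it; store the result as a new vector named "intervals"; and put that back into a data frame: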
df <- data.frame(intervals = df[order(as.numeric(trimws(gsub("[-].*|[a-zA-Z]+", "", df$intervals)))), ],
                 stringsAsFactors = FALSE)
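The one-liner is dense, so here is the same idea unpacked into steps. It is a sketch rather than a drop-in replacement: the names stripped, key, and df_sorted are introduced only for illustration, and drop = FALSE is a swapped-in detail so that the result stays a data frame even if df has more than one column (the one-liner relies on df having only the intervals column).

```r
df <- data.frame(stringsAsFactors = FALSE,
                 intervals = c("26-50", "More than 200", "51-100",
                               "1-25", "100-200"))

# 1. Remove everything from the first hyphen onward, and any letters
stripped <- gsub("[-].*|[a-zA-Z]+", "", df$intervals)

# 2. Trim leftover whitespace and coerce the remaining digits to numeric
key <- as.numeric(trimws(stripped))

# 3. Reorder the rows by that key; drop = FALSE keeps a data frame
df_sorted <- df[order(key), , drop = FALSE]
print(df_sorted)
```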
Data:
df <- data.frame(stringsAsFactors = FALSE,
                 intervals = c("26-50", "More than 200", "51-100",
                               "1-25", "100-200"))