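Treating one-digit and two-digit numbers in strings uniformly

Time: 2018-06-29 17:25:35

Tags: r string data.table data-management

I have a very large data.table in which a large number of items are identified by strings that combine text and numbers.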

library(data.table)    
dd <- data.table(x = c("A4","A4","A4","A14","A14","A14","B4","B4","B4"),y = c("A4","A14","B4","A4","A14","B4","A4","A14","B4"), z = c(1,2,3,4,5,6,7,8,9))

x   y   z
A4  A4  1
A4  A14 2
A4  B4  3
A14 A4  4
A14 A14 5
A14 B4  6
B4  A4  7
B4  A14 8
B4  B4  9

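The numbers can have one or two digits, so R always sorts the strings by the first digit (A14 before A4). mixedsort can solve this. However, when I reshape the long data to wide with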

wide <- dcast(dd, x ~ y, value.var = "z")

R applies its default (character-by-character) sort order again:

x    A14  A4  B4
A14  5    4   6
A4   2    1   3
B4   8    7   9

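However, I need the original order for the matrix calculations that follow. Is there an efficient way to rename string + one digit to string + two digits (A4 -> A04), or is there another approach I have missed?

5 Answers:

Answer 0: (score: 5)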

Another, and perhaps the simplest, option is to use mixedorder from the gtools package:

wide <- dcast(dd, x ~ y, value.var = "z")[gtools::mixedorder(x)]

which gives:

> wide
     x A14 A4 B4
1:  A4   2  1  3
2: A14   5  4  6
3:  B4   8  7  9

If you also want the columns ordered in the same way, you can additionally use setcolorder:

setcolorder(wide, c(1, gtools::mixedorder(names(wide)[-1]) + 1))

which then gives:

> wide
     x A4 A14 B4
1:  A4  1   2  3
2: A14  4   5  6
3:  B4  7   8  9
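If adding a dependency on gtools is not desirable, a similar ordering can be approximated in base R. The sketch below is not part of the original answer; it assumes every label is a non-digit prefix followed by digits, as in the question's data, and the variable names are illustrative only.

```r
# Illustrative labels in the default (character) sort order
ids <- c("A14", "A4", "B4")

# Split each label into its text prefix and its numeric suffix
prefix <- sub("[0-9]+$", "", ids)                # e.g. "A14" -> "A"
number <- as.integer(sub("^[^0-9]*", "", ids))   # e.g. "A14" -> 14

# Order by prefix first, then numerically by the suffix
ord <- order(prefix, number)
ids[ord]
```

The resulting index vector can be used in the same places as the result of gtools::mixedorder, for example to subset the rows of the reshaped table.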

Answer 1: (score: 2)

You can use sprintf() to left-pad the numbers with 0:

sprintf("%s%02d", "A",  1:20)
# [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11" "A12" "A13" "A14" "A15" "A16" "A17" "A18" "A19" "A20"
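The call above builds padded labels from scratch. To pad labels that already exist, such as the x and y columns in the question, one possible approach is to split off the numeric suffix first. This is a minimal sketch, not the answerer's code: pad_label is a hypothetical helper, and it assumes each label is a non-digit prefix followed by digits.

```r
# Hypothetical helper: zero-pad the numeric suffix of labels like "A4"
pad_label <- function(x, width = 2) {
  prefix <- sub("[0-9]+$", "", x)                # text part, e.g. "A"
  number <- as.integer(sub("^[^0-9]*", "", x))   # numeric part, e.g. 4
  fmt <- paste0("%s%0", width, "d")              # e.g. "%s%02d"
  sprintf(fmt, prefix, number)
}

padded <- pad_label(c("A4", "A14", "B4"))
padded
```

With data.table this could be applied by reference, for example dd[, x := pad_label(x)], after which the default sort order matches the natural order.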

Answer 2: (score: 2)

You can add the 0 to your data in the following way, shown here applied to both columns:

dd[nchar(x) == 2, x := paste0(substr(x, 1, 1), 0, substr(x, 2, 2))]
dd[nchar(y) == 2, y := paste0(substr(y, 1, 1), 0, substr(y, 2, 2))]

#      x   y z
# 1: A04 A04 1
# 2: A04 A14 2
# 3: A04 B04 3
# 4: A14 A04 4
# 5: A14 A14 5
# 6: A14 B04 6
# 7: B04 A04 7
# 8: B04 A14 8
# 9: B04 B04 9

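Answer 3: (score: 2)

This solution does not require adding extra zeros.

# Packages (assumed; the original snippet does not show which ones are loaded)
library(reshape2)  # dcast() for data frames
library(dplyr)     # %>%, select(), slice(), all_of()
library(gtools)    # mixedsort()

# Data frame
df <- data.frame(x = c("A4","A4","A4","A14","A14","A14","B4","B4","B4"),
                 y = c("A4","A14","B4","A4","A14","B4","A4","A14","B4"), 
                 z = c(1,2,3,4,5,6,7,8,9),
                 stringsAsFactors = FALSE)

# Reorder columns and rows using `mixedsort`.
# Columns follow the sorted y values; rows are picked in the sorted x order.
wide <- dcast(df, x ~ y, value.var = "z") %>% 
  select(x, all_of(mixedsort(unique(df$y)))) %>% 
  slice(match(mixedsort(unique(df$x)), x))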

which gives

#     x A4 A14 B4
# 1  A4  1   2  3
# 2 A14  4   5  6
# 3  B4  7   8  9

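Answer 4: (score: 1)

You may want to consider building this order directly into the data via factors, so that you don't have to do extra data wrangling later to fix it.

If you already have these unique values stored somewhere in the right order, you need neither mixedorder nor mixedsort; just convert the columns to factors with those values as levels.

Otherwise, you can recover the order: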

library(gtools)
dd[,1:2] <- lapply(dd[,1:2],function(x) factor(x, mixedsort(unique(x))))

Then proceed as usual:

dcast(dd, x ~ y, value.var = "z")
#      x A4 A14 B4
# 1:  A4  1   2  3
# 2: A14  4   5  6
# 3:  B4  7   8  9
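For the case where the ordered values are already known, the factor conversion and the later matrix step could look roughly like the sketch below. It is an illustration rather than part of the original answer: lvls and cols are assumed names, and the conversion to a matrix is only one possible way to prepare the result for the matrix calculations mentioned in the question.

```r
library(data.table)

dd <- data.table(
  x = c("A4", "A4", "A4", "A14", "A14", "A14", "B4", "B4", "B4"),
  y = c("A4", "A14", "B4", "A4", "A14", "B4", "A4", "A14", "B4"),
  z = 1:9
)

# Assumption: the desired order is already known and stored in `lvls`
lvls <- c("A4", "A14", "B4")
cols <- c("x", "y")

# Convert both key columns to factors by reference, using the known order
dd[, (cols) := lapply(.SD, factor, levels = lvls), .SDcols = cols]

# dcast follows the factor levels for both rows and columns
wide <- dcast(dd, x ~ y, value.var = "z")

# Drop the key column and keep its values as row names
m <- as.matrix(wide[, -1])
rownames(m) <- as.character(wide$x)
m
```

Because the order lives in the factor levels, no reordering is needed after reshaping.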