How can I use a Hive function to split this data, T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0, into two columns? For example:
T 32
P 1
A 420
H 60
R 0.30841494477846165
S 0
Answer 0 (score: 2)
You can do this with a regular expression:
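import java.util.regex.Pattern

object SplitPairs {
  def main(args: Array[String]): Unit = {
    val s = "T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0"
    // One uppercase letter, an underscore, then an integer or decimal number
    val pattern = "[A-Z]_\\d+\\.?\\d*"
    val buff = new StringBuilder
    val m = Pattern.compile(pattern).matcher(s)
    // Append each key_value match on its own line
    while (m.find()) {
      buff.append(m.group(0)).append("\n")
    }
    // Replace the underscore inside each pair with a space
    println("output:\n" + buff.toString.replaceAll("_", " "))
  }
}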
Output:
output:
T 32
P 1
A 420
H 60
R 0.30841494477846165
S 0
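As a variation on the regex approach above, here is a minimal sketch that uses capture groups so the key and the value come out separately and no `replaceAll` pass is needed. The names `PairRegex` and `splitPairs` are made up for illustration, and the pattern assumes every key is a single uppercase letter followed by a non-negative number.

```scala
object PairRegex {
  // Group 1 is the key letter, group 2 is the integer or decimal value.
  private val pairPattern = "([A-Z])_(\\d+(?:\\.\\d+)?)".r

  // Hypothetical helper: returns the (key, value) pairs found in the input.
  def splitPairs(s: String): List[(String, String)] =
    pairPattern.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList

  def main(args: Array[String]): Unit = {
    val pairs = splitPairs("T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0")
    // Print one "key value" pair per line
    pairs.foreach { case (k, v) => println(s"$k $v") }
  }
}
```

Returning a list of tuples rather than a formatted string keeps the two columns available for whatever comes next.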
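Answer 1 (score: 2)
If you need to collect the data for further processing, and you can guarantee that it is always correctly paired, you could do something like this.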
scala> val str = "T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0"
str: String = T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0
scala> val data = str.split("_").sliding(2,2)
data: Iterator[Array[String]] = non-empty iterator
scala> data.toList // just to see it
res29: List[Array[String]] = List(Array(T, 32), Array(P, 1), Array(A, 420), Array(H, 60), Array(R, 0.30841494477846165), Array(S, 0))
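For the "further processing" part, one possible next step is to turn each two-element array into a tuple and build a lookup map. This is only a sketch: `PairsToMap` and `toPairs` are hypothetical names, and it uses `grouped(2)`, which yields the same chunks as `sliding(2,2)`. A trailing element with no partner is silently dropped by `collect`, so it still relies on the input being correctly paired.

```scala
object PairsToMap {
  // Hypothetical helper: assumes the input alternates key, value, key, value, ...
  def toPairs(s: String): List[(String, String)] =
    s.split("_").grouped(2).collect { case Array(k, v) => (k, v) }.toList

  def main(args: Array[String]): Unit = {
    val pairs = toPairs("T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0")
    // Keys are unique in this sample, so a Map loses nothing here
    val lookup = pairs.toMap
    println(lookup("A"))          // the value stored under key A
    println(lookup("R").toDouble) // values are strings until converted
  }
}
```

If keys can repeat, keep the list of tuples instead of the map, since `toMap` retains only the last value for each key.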
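Answer 2 (score: 1)
You can split the string to get an array, apply zipWithIndex, and filter on the index to get two arrays, col1 and col2, which you can then use for printing: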
val str = "T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0"
val tmp = str.split('_').zipWithIndex
val col1 = tmp.filter( p => p._2 % 2 == 0 ).map( p => p._1)
val col2 = tmp.filter( p => p._2 % 2 != 0 ).map( p => p._1)
//col1: Array[String] = Array(T, P, A, H, R, S)
//col2: Array[String] = Array(32, 1, 420, 60, ...
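The printing step is not shown above. One way it might look, as a sketch, is to zip the two arrays back together and print one pair per line. `ZipColumns` and `columns` are hypothetical names; the split and the index filter follow the same idea as the snippet above, written with `collect` instead of `filter` plus `map`.

```scala
object ZipColumns {
  // Hypothetical helper: even-indexed tokens go to col1, odd-indexed to col2.
  def columns(s: String): (Array[String], Array[String]) = {
    val tmp = s.split('_').zipWithIndex
    val col1 = tmp.collect { case (v, i) if i % 2 == 0 => v }
    val col2 = tmp.collect { case (v, i) if i % 2 != 0 => v }
    (col1, col2)
  }

  def main(args: Array[String]): Unit = {
    val (col1, col2) = columns("T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0")
    // zip stops at the shorter array, so an unpaired trailing key is not printed
    col1.zip(col2).foreach { case (k, v) => println(s"$k $v") }
  }
}
```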