Splitting an array of strings in Scala

Time: 2015-03-04 06:04:59

Tags: regex scala

I have an array of strings:

Array[String] = Array(Monthend_Date Stf_Ttl Staffno Name    Surname Full_Name   MGr_Staffno Manager_Name    Cluster Consolidate Level3  Division    Region  Area    Branch  BranchID    COGNOS_unit Job_Family  Staff_Category  PosID   Position    PattersonGrade  Age Gender  Race    Disabled___Not_Disabled DTI_Race    DTI_EE_level    Staff_count FTE_HeadcountPrfl_Hay   Prfl_Hay_Ptrsn_Grd  Office  Hrc_Stf_No  Stf_Grd Stf_Term_Dt Term_Rsn_Desc   Prfl_Job_Desc   Br_Brnd Br_Cl_Id    Hr_Br_Ind   Txn_Tp_Desc To_Txn_Tp_Desc  Stf_Bnd_ID  Pos_Lvl Pos_Vrtl_Ind    Job_Fnctn_Desc  Job_Fnctn_ID    Grd_Code    Grd_Desc    Grd_Pay_Grp Qlfn_Desc   Qlfn_Desc_Other Prfl_Crr_Desc   Prfl_Crr_ID Core / Support  Rem_Srvy_ID Rem_Srvy_Desc   Gwhc_Mo_End_Dt  LastName    Level1Stf_No    Level1Name  Level2Stf_No    Level2Name  Level3Stf_No    Level3Name  Level4Stf_No    Level4Name  Level5Stf_No    Level5...

The delimiter here is a tab. When I split it using "\t" as the delimiter, it works as expected.

Array[String] = Array(Monthend_Date, Stf_Ttl, Staffno, Name, Surname, Full_Name, MGr_Staffno, Manager_Name, Cluster, Consolidate, Level3, Division, Region, Area, Branch, BranchID, COGNOS_unit, Job_Family, Staff_Category, PosID, Position, PattersonGrade, Age, Gender, Race, Disabled___Not_Disabled, DTI_Race, DTI_EE_level, Staff_count, FTE_Headcount, Prfl_Hay, Prfl_Hay_Ptrsn_Grd, Office, Hrc_Stf_No, Stf_Grd, Stf_Term_Dt, Term_Rsn_Desc, Prfl_Job_Desc, Br_Brnd, Br_Cl_Id, Hr_Br_Ind, Txn_Tp_Desc, To_Txn_Tp_Desc, Stf_Bnd_ID, Pos_Lvl, Pos_Vrtl_Ind, Job_Fnctn_Desc, Job_Fnctn_ID, Grd_Code, Grd_Desc, Grd_Pay_Grp, Qlfn_Desc, Qlfn_Desc_Other, Prfl_Crr_Desc, Prfl_Crr_ID, Core / Support, Rem_Srvy_ID, Rem_Srvy_Desc, Gwhc_Mo_End_Dt, LastName, Level1Stf_No, Level1Name, Level2Stf_No, Level2Na...

But when I split it using "|" (pipe) as the delimiter, I get the following result.

Array("", M, o, n, t, h, e, n, d, _, D, a, t, e, "  ", S, t, f, _, T, t, l, "   ", S, t, a, f, f, n, o, "   ", N, a, m, e, "    ", S, u, r, n, a, m, e, "   ", F, u, l, l, _, N, a, m, e, " ", M, G, r, _, S, t, a, f, f, n, o, "   ", M, a, n, a, g, e, r, _, N, a, m, e, "    ", C, l, u, s, t, e, r, "   ", C, o, n, s, o, l, i, d, a, t, e, "   ", L, e, v, e, l, 3, "  ", D, i, v, i, s, i, o, n, "    ", R, e, g, i, o, n, "  ", A, r, e, a, "    ", B, r, a, n, c, h, "  ", B, r, a, n, c, h, I, D, "    ", C, O, G, N, O, S, _, u, n, i, t, "   ", J, o, b, _, F, a, m, i, l, y, "  ", S, t, a, f, f, _, C, a, t, e, g, o, r, y, "  ", P, o, s, I, D, " ", P, o, s, i, t, i, o, n, "    ", P, a, t, t, e, r, s, o, n, G, r, a, d, e, "  ", A, g, e, "   ", G, e, n, d, e, r, "  ", R, a, c, e, "    ", D, i, s, a, b, l, e, d, _, _,...

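Why is the string split at every character? There is no pipe delimiter anywhere in the string array.

What is the correct way to do this?

Splitting on "," (comma) gives the following output.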

Array(Monthend_Date Stf_Ttl Staffno Name    Surname Full_Name   MGr_Staffno Manager_Name    Cluster Consolidate Level3  Division    Region  Area    Branch  BranchID    COGNOS_unit Job_Family  Staff_Category  PosID   Position    PattersonGrade  Age Gender  Race    Disabled___Not_Disabled DTI_Race    DTI_EE_level    Staff_count FTE_Headcount   Prfl_Hay    Prfl_Hay_Ptrsn_Grd  Office  Hrc_Stf_No  Stf_Grd Stf_Term_Dt Term_Rsn_Desc   Prfl_Job_Desc   Br_Brnd Br_Cl_Id    Hr_Br_Ind   Txn_Tp_Desc To_Txn_Tp_Desc  Stf_Bnd_ID  Pos_Lvl Pos_Vrtl_Ind    Job_Fnctn_Desc  Job_Fnctn_ID    Grd_Code    Grd_Desc    Grd_Pay_Grp Qlfn_Desc   Qlfn_Desc_Other Prfl_Crr_Desc   Prfl_Crr_ID Core / Support  Rem_Srvy_ID Rem_Srvy_Desc   Gwhc_Mo_End_Dt  LastName    Level1Stf_No    Level1Name  Level2Stf_No    Level2Name  Level3Stf_No    Level3Name  Level4Stf_No    Level4Name  Level5Stf_No...

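1 Answer:

Answer 0: (score: 6)

If you pass the delimiter as a string (in double quotes), it is treated as a regular expression. The pipe | is the regex alternation ("or") operator, so "|" means "empty string or empty string". The empty string matches between every pair of characters, so the input is split at every character...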

scala> val m = Array[String]("foo bar", "bar foo")
m: Array[String] = Array(foo bar, bar foo)

scala> m.flatMap(_.split("|"))
res1: Array[String] = Array("", f, o, o, " ", b, a, r, "", b, a, r, " ", f, o, o)

Either of these should work:

scala> m.flatMap(_.split("""\|"""))
res2: Array[String] = Array(foo bar, bar foo)

scala> m.flatMap(_.split('|'))
res3: Array[String] = Array(foo bar, bar foo)
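
When the delimiter arrives as a variable rather than a literal, escaping it by hand is easy to get wrong. The sketch below uses `java.util.regex.Pattern.quote` to make any delimiter string match literally. The `SplitDemo` object, the `splitLiteral` helper, and the sample rows are hypothetical names and data made up for illustration; they are not part of the question.

```scala
import java.util.regex.Pattern

object SplitDemo {
  // Hypothetical sample data: two pipe-delimited rows.
  val rows: Array[String] = Array("foo|bar", "baz|qux")

  // Pattern.quote wraps the delimiter so that regex metacharacters
  // such as | or . are matched literally instead of being interpreted.
  def splitLiteral(lines: Array[String], delimiter: String): Array[String] =
    lines.flatMap(_.split(Pattern.quote(delimiter)))

  def main(args: Array[String]): Unit = {
    // Quoted string delimiter: splits only on real pipe characters.
    println(splitLiteral(rows, "|").mkString(", "))
    // Char delimiter: Scala's split(Char) never treats it as a regex.
    println(rows.flatMap(_.split('|')).mkString(", "))
  }
}
```

Both calls print `foo, bar, baz, qux`. The same helper should also work for the tab-delimited header in the question by passing "\t" as the delimiter.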