Split a long string at the nearest comma after N characters and loop over each resulting string

Time: 2016-02-21 06:51:52

Tags: regex linux string shell unix

Input

    stringType1= ('34343,43434, 34343, 434343,
234243,343433,53434,4343,4343
434344,434343,434343,43434,4343
etc...till approx 50K character length')

stringType2= ('34343','43434', '34343', '434343',
'234243','343433','53434','4343','4343'
)
etc...till approx 50K character length'
# Note the new line \n after group of 4-5 strings         
  and some have space before  them and some don't

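Split the string into smaller strings. Each string should be no longer than about 10K characters, should be cut at the nearest comma (just before or just after the limit), and the pieces should be numbered in order.

Output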

From 
stringType1 
StTy1_1=('34343,43434,34343,434343'...) #stop at 10k characters
StTy1_2=('234243,343433,53434,4343,4343'...) #stop at 10k characters


 # keep making bundles of 10K character strings 
 # Splitting stringType1  at the closest comma after 10k characters.
 # Remove all space character and single quotes 
 # except the ones at the ends of  each string 
    StTy2_1=('34343,43434,34343,434343'...) #stop at 10k characters
    StTy2_2=('234243,343433,53434,4343,4343'...) #stop at 10k characters
    Then each 10K string gets an echo statement wrapped around it 
   echo "sel function ('34343,43434,34343,434343'...) as D1" >> file 
   echo "sel function ('234243,343433,53434,4343,4343'...) as D2" >> file 

What I have done
Removed all newlines, spaces, and single quotes

 stringType1_op= ('34343,43434, 34343,434343,234243,343433,53434,4343,4343,434344,434343,434343,43434,4343
etc...till approx 50K character length')
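
The cleanup step described above could be sketched as follows. The function name `normalize` is hypothetical, and this version strips every single quote, on the assumption that the quotes at the two ends are added back when the final statement is built.

```shell
# Hypothetical helper: delete spaces, single quotes and newlines from stdin.
normalize() {
    tr -d " '\n"
}

# Example with a short stand-in for the real 50K-character input.
printf "('34343','43434', '34343',\n'234243')\n" | normalize
```

Because the newline is deleted too, the result is a single line, which makes the later split at a comma much simpler.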

What I need

  • Split stringType1 at the nearest comma after 10K characters
  • Counter: a counter mechanism that, after each 10K string is created, does this:

     echo "sel function ('34343,43434,34343,434343'...) as D1" >> file 
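
One way to read this requirement, as a minimal sketch rather than a definitive solution: take the first N characters, extend the chunk to the next comma, and number each chunk. The name `split_at_comma` is hypothetical, and the input is assumed to be already cleaned into one comma-separated line without quotes or parentheses.

```shell
# Hypothetical helper: cut a comma-separated string ($1) at the first comma
# found at or after $2 characters, and print one numbered statement per chunk.
# The body runs in a subshell, so its variables do not leak to the caller.
split_at_comma() (
    rest=$1
    limit=$2
    n=0
    while [ -n "$rest" ]; do
        if [ "${#rest}" -le "$limit" ]; then
            chunk=$rest                            # last piece, no cut needed
            rest=
        else
            head=$(printf "%.${limit}s" "$rest")   # first $limit characters
            tail=${rest#"$head"}                   # everything after them
            chunk=$head${tail%%,*}                 # extend to the next comma
            case $tail in
                *,*) rest=${tail#*,} ;;            # drop that comma, keep the rest
                *)   rest= ;;
            esac
        fi
        n=$((n + 1))
        printf "sel function ('%s') as D%d\n" "$chunk" "$n"
    done
)

# Small demonstration with a limit of 10 instead of 10000.
split_at_comma "34343,43434,34343,434343,234243" 10
```

With the real data the second argument would be 10000 and the output would be appended to the target file with `>> file`. Each chunk can exceed the limit by at most one number, since the cut always moves forward to the next comma.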

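1 Answer:

Answer 0: (score: 1)

The best solution is probably an awk script, since awk lets you avoid the while loop. A while loop gets slow when it has to call external utilities, which I avoid here. First I call a few utilities to put each number on its own line: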

sed -n '/stringType/ p' input | tr " ,'" "\n" | tr -s "\n"
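
To illustrate what the two `tr` calls produce, here is a small sketch on a one-line sample. The wrapper name `tokenize` is hypothetical, and the second set is written out as three newlines because some `tr` implementations handle a shorter second set differently.

```shell
# Hypothetical wrapper around the two tr calls: turn spaces, commas and single
# quotes into newlines, then squeeze runs of newlines into one.
tokenize() {
    tr " ,'" "\n\n\n" | tr -s "\n"
}

# Each token ends up on its own line: the variable name, "(", the numbers, ")".
printf "%s\n" "stringType1= ('34343,43434, 34343')" | tokenize
```

The variable name and the parentheses survive as separate tokens, so whatever consumes this stream has to tell them apart from the numbers.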

I then process the result with a loop:

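startline=1
endline=0
dimnumber=0
sed -n '/stringType/ p' input | tr " ,'" "\n" | tr -s "\n"  | while read -r line; do
   if [[ "${startline}" = "1" ]]; then
      # Open a new statement and reset the length counter.
      totallen=0
      printf "%s" "echo sel function ('"
      startline=0
      (( dimnumber++ ))
      # Skip a non-numeric first token such as "stringType1=", but keep a
      # number that directly follows a length-based split.
      [[ "${line}" =~ ^[0-9] ]] || continue
   fi
   if [[ "${line}" =~ ^[0-9] ]]; then
       # Append the number, comma-separated after the first one.
       if [[ "$totallen" = "0" ]]; then
          printf "%s" "${line}"
       else
          printf "%s" ",${line}"
       fi
       (( totallen += ${#line} ))
   fi
   # Close the statement once the digit count passes the limit. The value 10
   # is presumably for testing; the question asks for about 10000.
   if [[ ${totallen} -gt 10 ]]; then
      endline=1
   fi
   # Also close it when the closing parenthesis of the input string is reached.
   if [[ "${line}" = *\)* ]]; then
      endline=1
   fi
   if [[ "${endline}" = "1" ]]; then
      printf "%s%s\n" "') as Dim" "${dimnumber}"
      endline=0
      startline=1
   fi
done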
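
The awk variant mentioned at the start of this answer is not spelled out. A minimal sketch of what it might look like follows, assuming the input has already been reduced to one comma-separated line without quotes or parentheses. The name `chunk_with_awk` and the `D<n>` suffix are illustrative choices, not part of the original answer.

```shell
# Hypothetical helper: read comma-separated lines from stdin and print a
# numbered statement each time the current chunk reaches $1 characters,
# or when the line ends.
chunk_with_awk() {
    awk -F, -v limit="$1" -v q="'" '
    {
        for (i = 1; i <= NF; i++) {
            # Append the field, comma-separated after the first one.
            chunk = (chunk == "" ? $i : chunk "," $i)
            if (length(chunk) >= limit || i == NF) {
                printf "sel function (%s%s%s) as D%d\n", q, chunk, q, ++n
                chunk = ""
            }
        }
    }'
}

# Small demonstration with a limit of 10 instead of 10000.
printf "%s\n" "34343,43434,34343,434343,234243" | chunk_with_awk 10
```

Here the whole string is handled inside a single awk process, so no shell loop runs per token. Unlike the loop above, the length check counts the commas as well as the digits.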