Splitting/slicing three different string arrays in Python 3

Time: 2019-01-28 09:24:29

Tags: python-3.x split slice

Code

def clouds_function():
    """
    Extracts Cloud Height and Type from the data 
    Returns: Cloud Height and Type CCCXXX
    """ 
    clouds1 = content[1]
    clouds1 = clouds1[15:len(clouds1)]
    clouds1 = clouds1.split()

    clouds2 = content[2]
    clouds2 = clouds2 + "  "
    clouds2=[clouds2[y-8:y] for y in range(8, len(clouds2)+8,8)]

    clouds3 = content[3]
    clouds3 = clouds3 + "  "
    print(clouds3)
    clouds3=[clouds3[y-8:y] for y in range(8, len(clouds3)+8,8)]

    return(clouds3)

print(clouds_function())

Sample data

content[1] = 'OVC018  BKN006  OVC006  OVC006  OVC017  OVC005  OVC005 OVC016  OVC029  OVC003  OVC002  OVC001  OVC100'
content[2] ='         OVC025                          OVC010  OVC009                                         OVC200'
content[3] ='         OVC100                                       '

I tried:

def split(s, n):
    if len(s) < n:
        return []
    else:
        return [s[:n]] + split(s[n:], n)

It returns ['OVC100 '] for content[3].

For content[3] I need this result:

['','OVC100','','','','','','','','','','','']

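I need homogeneous arrays, i.e. lists that all have the same number of elements.

The strings have different lengths to begin with, which may be part of the problem.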

1 answer:

Answer 0 (score: 1)

Your data has length problems, and the gaps between values differ in size (2 characters or 1):

c[1] = 'OVC018  BKN006  OVC006  OVC006  OVC017  OVC005  OVC005 OVC016  OVC029  OVC003  OVC002  OVC001  OVC100'
c[2] ='         OVC025                          OVC010  OVC009                                         OVC200'
c[3] ='         OVC100                                       '
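  • In c[2] and c[3] the second value starts after 9 characters; in c[1] it starts after only 8
  • There is only 1 space in 'OVC005 OVC016'; elsewhere the gap is 2
  • c[3] is much shorter than the others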
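One way to confirm observations like these is to list where each token starts and how wide the gaps between tokens are. The helper below is an illustrative sketch, not part of the original answer; `token_offsets` and `gap_widths` are hypothetical names, and `c3` is rebuilt as 9 spaces plus `OVC100` with its trailing spaces left out.

```python
import re

def token_offsets(row):
    """Return (start index, text) for every run of non-space characters."""
    return [(m.start(), m.group()) for m in re.finditer(r'\S+', row)]

def gap_widths(row):
    """Return the widths of the space runs that sit between two tokens."""
    return [len(gap) for gap in re.findall(r'(?<=\S) +(?=\S)', row)]

c1 = 'OVC018  BKN006  OVC006  OVC006  OVC017  OVC005  OVC005 OVC016  OVC029  OVC003  OVC002  OVC001  OVC100'
c3 = ' ' * 9 + 'OVC100'  # question's content[3] without its trailing spaces

print(token_offsets(c1)[:2])        # [(0, 'OVC018'), (8, 'BKN006')]
print(token_offsets(c3))            # [(9, 'OVC100')]
print(sorted(set(gap_widths(c1))))  # [1, 2]
```

The second value of c1 starts at index 8 while the only value of c3 starts at index 9, and c1 mixes 1- and 2-space gaps, which is why fixed 8-character slices drift out of alignment.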

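Slicing is a good approach if the lengths are fixed or predictable, which they are not here. This problem is easier to solve with simple string concatenation and by replacing runs of spaces with an artificial separator character: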

  1. Make all strings the same length by padding them with spaces
  2. Replace every run of 8, 7, 6, 2 or 1 spaces with '-', a new artificial separator
  3. Split at '-'

content= ['OVC018  BKN006  OVC006  OVC006  OVC017  OVC005  OVC005 OVC016  OVC029  OVC003  OVC002  OVC001  OVC100',
          '        OVC025                          OVC010  OVC009                                         OVC200',
          '        OVC100                                       ']

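# note: rows 2 and 3 of content start with 8 spaces here, not 9 as in the question
# length of the longest row
max_len = max(len(data) for data in content)

for i,c in enumerate(content):
    # pad every row with spaces to the common length
    content[i] = c + " " * (max_len-len(c))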
    # replace stretches of spaces by a splitter character
    content[i] = content[i].replace(" "*8,"-").replace(" "*7,"-").replace(" "*6,"-").replace("  ","-").replace(" ","-")


hom = [c.split("-") for c in content]
for c in hom:
    print(c,"\n") 

Output:

['OVC018', 'BKN006', 'OVC006', 'OVC006', 'OVC017', 'OVC005', 'OVC005', 'OVC016', 'OVC029', 'OVC003', 'OVC002', 'OVC001', 'OVC100']

['', 'OVC025', '', '', '', 'OVC010', 'OVC009', '', '', '', '', '', 'OVC200']

['', 'OVC100', '', '', '', '', '', '', '', '', '', '', '']
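A different technique, not used in the answer, avoids counting spaces altogether: take the start position of every token in the first (complete) row as the column positions, then place each token of the other rows in the column whose start is nearest. This tolerates the 9-versus-8 character offset and the single-space gap without editing the data. It assumes the reference row has a value in every column and that no two tokens in one row are nearest to the same column (a later token would overwrite an earlier one). `align_to_columns` is a hypothetical name, and the rows are rebuilt with explicit space counts assumed to match the question.

```python
import re

def align_to_columns(rows, ref=0):
    """Put each token under the nearest column of the reference row."""
    # Column start positions, taken from the (assumed complete) reference row
    starts = [m.start() for m in re.finditer(r'\S+', rows[ref])]
    table = []
    for row in rows:
        cells = [''] * len(starts)
        for m in re.finditer(r'\S+', row):
            pos = m.start()
            # index of the column whose start is closest to this token's start
            col = min(range(len(starts)), key=lambda i: abs(starts[i] - pos))
            cells[col] = m.group()
        table.append(cells)
    return table

# The question's rows, with 9 leading spaces in rows 2 and 3
# (trailing spaces of row 3 omitted; they do not affect the result)
content = [
    'OVC018  BKN006  OVC006  OVC006  OVC017  OVC005  OVC005 OVC016  OVC029  OVC003  OVC002  OVC001  OVC100',
    ' ' * 9 + 'OVC025' + ' ' * 26 + 'OVC010  OVC009' + ' ' * 41 + 'OVC200',
    ' ' * 9 + 'OVC100',
]

for cells in align_to_columns(content):
    print(cells)
```

Unlike the separator approach, this depends only on each value starting close to its column, not on the exact number of spaces between values.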