以指定格式填充缺失值-Python

时间:2019-07-24 09:28:52

标签: python python-3.x

我遇到一个问题,明确要求我不要使用numpy或pandas。

问题:

给出一个包含数字和'_'(缺失值)符号的字符串,您必须按照说明替换'_'符号

Ex 1: _, _, _, 24 ==> 24/4, 24/4, 24/4, 24/4 i.e we. have distributed the 24 equally to all 4 places 

Ex 2: 40, _, _, _, 60 ==> (60+40)/5,(60+40)/5,(60+40)/5,(60+40)/5,(60+40)/5 ==> 20, 20, 20, 20, 20 i.e. the sum of (60+40) is distributed qually to all 5 places

Ex 3: 80, _, _, _, _  ==> 80/5,80/5,80/5,80/5,80/5 ==> 16, 16, 16, 16, 16 i.e. the 80 is distributed qually to all 5 missing values that are right to it

Ex 4: _, _, 30, _, _, _, 50, _, _  
==> we will fill the missing values from left to right 
    a. first we will distribute the 30 to left two missing values (10, 10, 10, _, _, _, 50, _, _)
    b. now distribute the sum (10+50) missing values in between (10, 10, 12, 12, 12, 12, 12, _, _) 
    c. now we will distribute 12 to right side missing values (10, 10, 12, 12, 12, 12, 4, 4, 4)
对于具有逗号分隔值的给定字符串,

将同时具有两个缺失值,例如:“ _,_,x,_,_,”,您需要填写缺失值Q:您的程序读取字符串例如ex:“ ,_,x,_,_,_”,并返回填充的序列Ex:

Input1: "_,_,_,24"
Output1: 6,6,6,6

Input2: "40,_,_,_,60"
Output2: 20,20,20,20,20

Input3: "80,_,_,_,_"
Output3: 16,16,16,16,16

Input4: "_,_,30,_,_,_,50,_,_"
Output4: 10,10,12,12,12,12,4,4,4

我正在尝试使用split函数将字符串拆分为列表。然后,我尝试检查左侧的空白,并计算此类空白的数量,然后一旦遇到非空白,就用该数字除以总计数,即(在数字之前遇到的空白和数字本身),然后散布值并替换空格,剩下数字

然后我要检查两个数字之间的空格,然后应用相同的逻辑,然后对右侧的空格进行相同的操作。

但是,我在下面共享的代码引发了各种各样的错误,并且我相信我在上面共享的逻辑上存在空白,因此希望能对解决此问题有深刻的见识

def blanks(S):

  a= S.split()
  count = 0
  middle_store = 0
  #left
  for i in range(len(a)):
    if(a[i]=='_'):
      count = count+1  #find number of blanks to the left of a number
    else:
      for j in range(0,i+1):
        #if there are n blanks to the left of the number speard the number equal over n+1 spaces
        a[j] = str((int(a[i])/(count+1)))
        middle_store= i
    break  

  #blanks in the middle
  denominator =0
  flag = 0
  for k in len(middle_store+1,len(a)):
    if(a[k] !='_'):
      denominator = (k+1-middle_store)
      flag=k
    break

  for p in len(middle_store,flag+1):
    a[p] = str((int(a[p])/denominator))

  #blanks at the right 
  for q in len(flag,len(a)):
    a[q] = str((int(a[q])/(len(a)-flag+1)))

S=  "_,_,30,_,_,_,50,_,_"
print(blanks(S))

6 个答案:

答案 0 :(得分:2)

首先,您应该在split方法中指定一个定界符作为参数,默认情况下,它将按空格分隔。

所以 "_,_,x,_,_,y,_".split()给您['_,_,x,_,_,y,_']

"_,_,x,_,_,y,_".split(',')会给您['_', '_', 'x', '_', '_', 'y', '_']

第二,对于“中间”和“右”循环(对于右),您需要将len替换为range

由于要进行除法,因此最好使用float而不是int

由于将其用于除法,因此最好将分母初始化为1。

在最后一个循环中,a[q] = str((int(a[q])/(len(a)-flag+1)))(与a[p]相同)应返回错误,因为a [q]为“ _”。您需要使用变量来保存a[flag]值。

每个中断都应在else或if语句中,否则,您将只通过一次循环。

最后,为了提高复杂度,您可以从j循环中退出middle_store分配,以避免每次都对其进行分配。

TL; DR:试试这个:

def blanks(S):
    a = S.split(',')
    count = 0
    middle_store = 0
    # left
    for i in range(len(a)):
        if a[i] == '_':
            count = count + 1  # find number of blanks to the left of a number
        else:
            for j in range(i + 1):
                # if there are n blanks to the left of the number speard the number equal over n+1 spaces
                a[j] = str((float(a[i]) / (count + 1)))
            middle_store = i
            middle_store_value = float(a[i])
            break

        # blanks in the middle
    denominator = 1
    flag = 0
    for k in range(middle_store + 1, len(a)):
        if a[k] != '_':
            denominator = (k + 1 - middle_store)
            flag = k
            break
    flag_value = float(a[flag])
    for p in range(middle_store, flag + 1):
        a[p] = str((middle_store_value+flag_value) / denominator)

    # blanks at the right
    last_value = float(a[flag])
    for q in range(flag, len(a)):
        a[q] = str(last_value / (len(a) - flag))

    return a

S=  "_,_,30,_,_,_,50,_,_"
print(blanks(S))

PS:您甚至尝试过解决错误吗?还是只是等待某人解决您的数学问题?

答案 1 :(得分:1)

模块化解决方案

# takes an array x and two indices a,b. 
# Replaces all the _'s with (x[a]+x[b])/(b-a+1)
def fun(x, a, b):
    if a == -1:
        v = float(x[b])/(b+1)
        for i in range(a+1,b+1):
            x[i] = v
    elif b == -1:
        v = float(x[a])/(len(x)-a)
        for i in range(a, len(x)):
            x[i] = v
    else:
        v = (float(x[a])+float(x[b]))/(b-a+1)
        for i in range(a,b+1):
            x[i] = v
    return x

def replace(text):
    # Create array from the string
    x = text.replace(" ","").split(",")
    # Get all the pairs of indices having number
    y = [i for i, v in enumerate(x) if v != '_']
    # Starting with _ ?
    if y[0] != 0:
        y = [-1] + y
    # Ending with _ ?
    if y[-1] != len(x)-1:
        y = y + [-1]    
    # run over all the pairs
    for (a, b) in zip(y[:-1], y[1:]): 
        fun(x,a,b)          
    return x

# Test cases
tests = [
    "_,_,_,24",
    "40,_,_,_,60",
    "80,_,_,_,_",
     "_,_,30,_,_,_,50,_,_"]

for i in tests:
    print (replace(i))

答案 2 :(得分:0)

针对所引发问题的代码也可以通过以下方式来完成,尽管该代码并未经过优化和简化,但它是从不同的角度编写的:

import re 
def curve_smoothing(S): 

    pattern = '\d+'
    ls_num=re.findall(pattern, S)   # list of numeral present in string
    pattern = '\d+'
    spaces = re.split(pattern, S)  # split string to seperate '_' spaces

    if len(spaces[0])==0 and len(ls_num)==1:
        Space_num=len(re.findall('_',  S))
        sums=int(ls_num[0])
        repl_int=round(sums/(Space_num+1))
        S=re.sub(r'(\d{2})(,_)', r'{}\2'.format(str(repl_int)) , S, 1)
        S=re.sub('_', str(repl_int),S, Space_num)
        return S

    elif len(spaces[0])==0 and len(ls_num)>1:
        for i in range(1,len(spaces)):
            if i==1:
                Space_num=len(re.findall('_',  spaces[i]))
                sums=int(ls_num[i-1])+int(ls_num[(i)])
                repl_int=round(sums/(Space_num+2))
                S=re.sub(str(ls_num[i-1]), str(repl_int),S)
                S=re.sub('_', str(repl_int),S, Space_num)
                S=re.sub(str(ls_num[i]), str(repl_int),S,1)
                ls_num[i]=repl_int
            elif i<len(ls_num):
                Space_num=len(re.findall('_',  spaces[i]))
                sums=int(ls_num[i-1])+int(ls_num[(i)])
                repl_int=round(sums/(Space_num+2))
                S=re.sub(r'(\d{2})(,_)', r'{}\2'.format(str(repl_int)) , S, 1)
                S=re.sub('_', str(repl_int),S, Space_num)
                S=re.sub(str(ls_num[i]), str(repl_int),S,1)
                ls_num[i]=repl_int
            elif len(spaces[-1])!=0:
                Space_num=len(re.findall('_',  spaces[i]))
                repl_int=round(ls_num[(i-1)]/(Space_num+1))
                S=re.sub(r'(\d{2})(,_)', r'{}\2'.format(str(repl_int)) , S, 1)
                S=re.sub('_', str(repl_int),S, Space_num)
        return S


    else:
        for i in range(len(spaces)):
            if i==0:
                Space_num=len(re.findall('_',  spaces[i]))
                sums=int(ls_num[(i)])
                repl_int=round(sums/(Space_num+1))
                S=re.sub(r'(\d{2})(,_)', r'{}\2'.format(str(repl_int)) , S, 1)
                S=re.sub('_', str(repl_int),S, Space_num)
                S=re.sub(str(ls_num[i]), str(repl_int),S, 1)
                ls_num[i]=repl_int
            elif i>=1 and i<len(ls_num):
                Space_num=len(re.findall('_',  spaces[i]))
                sums=int(ls_num[i-1])+int(ls_num[(i)])
                repl_int=round(sums/(Space_num+2))
                S=re.sub(r'(\d{2})(,_)', r'{}\2'.format(str(repl_int)) , S, 1)
                S=re.sub('_', str(repl_int),S, Space_num)
                S=re.sub(str(ls_num[i]), str(repl_int),S,1)
                ls_num[i]=repl_int
            elif len(spaces[-1])!=0:
                Space_num=len(re.findall('_',  spaces[i]))
                repl_int=round(ls_num[(i-1)]/(Space_num+1))
                S=re.sub(r'(\d{2})(,_)', r'{}\2'.format(str(repl_int)) , S, 1)
                S=re.sub('_', str(repl_int),S, Space_num)
        return S

S1="_,_,_,24"
S2="40,_,_,_,60"
S3=  "80,_,_,_,_"
S4="_,_,30,_,_,_,50,_,_"
S5="10_,_,30,_,_,_,50,_,_"
S6="_,_,30,_,_,_,50,_,_20"
S7="10_,_,30,_,_,_,50,_,_20"

print(curve_smoothing(S1))
print(curve_smoothing(S2))
print(curve_smoothing(S3))
print(curve_smoothing(S4))
print(curve_smoothing(S5))
print(curve_smoothing(S6))
print(curve_smoothing(S7))

答案 3 :(得分:0)

# _, _, 30, _, _, _, 50, _, _ 
def replace(string):
    lst=string.split(',')
    for i in range(len(lst)):
        if lst[i].isdigit():
            for j in range(i+1):
                lst[j]=int(lst[i])//(i+1)
            new_index=i
            new_value=int(lst[i])
            break
    for i in range(new_index+1,len(lst)):
        if lst[i].isdigit():
            temp=(new_value+int(lst[i]))//(i-new_index+1)
            for j in range(new_index,i+1):
                lst[j]=temp
            new_index=i
            new_value=int(lst[i])
    try:
        for i in range(new_index+1,len(lst)):
            if not(lst[i].isdigit()):
                count=lst.count('_')
                break
        temp1=new_value//(count+1)
        for i in range(new_index,len(lst)):
            lst[i]=temp1
    except:
        pass
    return lst        

答案 4 :(得分:0)

def replace(string):
lst=string.split(',')
if lst[0].strip().isdigit():
    index0=0
    while True:
        index1=index0
        value1=int(lst[index0].strip())
        index2=index1
        for i in range((index1+1),len(lst)):
            if lst[i].strip().isdigit():
                index2=i
                break
        value2=0
        if index2>index1:
            value2=int(lst[index2].strip())
        else:
            index2=len(lst)-1
        value=str(int((value1+value2)/((index2-index1)+1)))
        for i in range(index1,index2+1):
            lst[i]=value
        index0=index2

        if index0>=(len(lst)-1):
            break

else:
    index0=0
    while True:
        index1=index0
        value1=0
        if lst[index0].strip().isdigit():
            value1=int(lst[index0].strip())
        index2=index1
        for i in range((index1+1),len(lst)):
            if lst[i].strip().isdigit():
                index2=i
                break
        value2=0
        if index2>index1:
            value2=int(lst[index2].strip())
        else:
            index2=len(lst)-1
        value=str(int((value1+value2)/((index2-index1)+1)))
        for i in range(index1,index2+1):
            lst[i]=value
        index0=index2

        if index0>=(len(lst)-1):
            break

return lst   

string = "20,_,_,30, _, _,10,_,_,_,_,110"
replace(string)

答案 5 :(得分:-1)

“检查所有输入的内容是否有效”

def替换:

val=0
lst=s.split(",")

if lst[0].isdigit():
    for i in range(1,len(lst)):
        if lst[i].isdigit():
            value=(int(lst[0])+int(lst[i]))//((i+1))
            for j in range(0,i+1):
                lst[j]=value
            index=i
            break    
else:
    for i in range(len(s)):
        if lst[i].isdigit():
            for j in range(i+1):
                lst[j]=(int(lst[i]))//(i+1)
            index=i
            value=int(lst[i])
            break
for i in range(index+1,len(lst)):
    if lst[i].isdigit():
        temp=(value+int(lst[i]))//(i-index+1)
        for j in range(index,i+1):
            lst[j]=temp
        index=i
        value=int(lst[i])


try :
    for i in range(index+1,len(lst)):
        if not(lst[i].isdigit()):
            count=lst.count('_')
            break
    temp1=value//(count+1)
    for i in range(index,len(lst)):
        lst[i]=temp1
except UnboundLocalError as e:
    print (e)
return lst