def parseline(line):
    line = line.values.flatten().tolist()  # flatten labeled point pandas dataframe to python list
    strLine1 = listToString(line)  # custom function just converts list to string for regex operations
    strLine2 = re.sub(r"^1:1 |2:\d+.\d+ ", "", strLine1)  # filter string to eliminate first two indices; python string
    splitLine = strLine2.replace("0 ", "").split(" ")  # eliminate specific val; split on spaces; python list of strings
    positive = 0  # variable for presence/absence of something instantiated
    for feature in splitLine:
        featureIndex = feature.split(":")[0]
        featureValue = feature.split(":")[1]
        if featureIndex in toRemove:  # toRemove is a list of vals to eliminate from each line; this works
            positive = 1
    newLine = ""
    if positive == 1:
        newLine = [i for i in toRemove not in splitLine]  # goal here is to remove values found in the toRemove from the newLine
        newLine = "1" + " " + newLine
        print(newLine)
    else:
        newLine = "0" + " " + strLine2
    return newLine
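This is some code from a project I am working on. I have successfully produced a list containing the values I do not want in each line. That list is called 'toRemove'.
The conditional 'if featureIndex in toRemove' works, as confirmed by a print statement that prints "this index needs to be removed from the final list" next to every 'featureIndex' found in 'toRemove'.
The problem is that the second conditional (if positive == 1, else) returns, from the 'if positive == 1' branch, a list that is just a copy of 'toRemove'. The 'else' branch actually returns the correct list.
For example: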
'if positive == 1:' list output:
['20', '68', '112', '264', '384', '449', '454', '749', '839',...] #this is just a copy of the 'toRemove' list
'else:' list output:
0 3:0.0 4:1 12:1 36710:1 36725:1 36791:1 86715:1 98190:1
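I originally tried to solve this as a data type problem, hence the bookkeeping comments next to the conversion statements.
Where am I going wrong?
Edit: The input file sent through the 'parseline' function has the following format: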
1:1 2:00 3:00 4:1 9:1 20:1 40:1... # say index 20 is one of the indices in 'toRemove'
1:1 2:10 3:00 45:1 85:1 99:1 100:1... # say none of the index vals in this line are in 'toRemove'
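'parseline(line)' removes indices 1 and 2, then goes through the 'toRemove' list to remove those items from the line, outputting a 'newLine' string for each line of the original input file.
For the same two example inputs, the 'newLine' output should be: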
1 3:00 4:1 9:1 40:1... #notice index 20 is gone, and its presence in the list is accounted for by the 1
0 3:00 45:1 85:1 99:1 100:1... #notice since none of the indices in the original list were in the 'toRemove' list,
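The transformation described above could be sketched roughly as follows. This is a minimal sketch rather than the code from the post: `parse_line` and its parameters are hypothetical names, it takes each line as a plain string instead of a pandas object, and it assumes features are space-separated `index:value` pairs. It leaves out the `replace("0 ", "")` step, since the expected output keeps values such as `3:00`.

```python
import re


def parse_line(str_line, to_remove):
    """Drop indices 1 and 2, strip features listed in to_remove, prefix a 0/1 flag.

    Hypothetical helper: the name and signature are not from the original post.
    """
    # Normalise to strings so the comparison matches the indices split from the line.
    remove_set = {str(index) for index in to_remove}
    # Remove the leading "1:<value>" and "2:<value>" features.
    trimmed = re.sub(r"^1:\S+\s+2:\S+\s*", "", str_line.strip())
    features = trimmed.split()
    # Keep only the features whose index is not in the removal set.
    kept = [f for f in features if f.split(":")[0] not in remove_set]
    # The flag is 1 when at least one feature was removed, otherwise 0.
    flag = "1" if len(kept) < len(features) else "0"
    return " ".join([flag] + kept)


print(parse_line("1:1 2:00 3:00 4:1 9:1 20:1 40:1", ["20", "68"]))
print(parse_line("1:1 2:10 3:00 45:1 85:1 99:1 100:1", ["20", "68"]))
```

Under these assumptions the two calls print `1 3:00 4:1 9:1 40:1` and `0 3:00 45:1 85:1 99:1 100:1`. The filter iterates over the features of the line and tests each index against the removal set, so the result is built from the line and not from 'toRemove'.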
Answer 0 (score: 0)
It was a data type issue. Problem solved. Thanks, everyone.
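The answer does not say which types were involved. One possibility, assumed here and not confirmed by the post, is the mix of list and string on the 'positive == 1' branch: a filtered list of features has to be joined into a string before it can be concatenated with the "1" prefix. The values below are made up for illustration.

```python
# Hypothetical filtered features; a list of strings, as produced by str.split.
kept = ["3:00", "4:1", "9:1", "40:1"]

error_seen = False
try:
    # Concatenating a str with a list is not allowed in Python.
    new_line = "1" + " " + kept
except TypeError:
    error_seen = True

# Joining the list into a single string first makes the concatenation valid.
new_line = "1" + " " + " ".join(kept)
print(error_seen, new_line)
```

With these values the script prints `True 1 3:00 4:1 9:1 40:1`.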