Question

I have two files with the following contents:

alt text http://img144.imageshack.us/img144/4423/screencapture2b.png

alt text http://img229.imageshack.us/img229/9153/screencapture1c.png

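Please consider only the bold column and the red column; the rest of the text is junk and not needed. As you can see, the two files are similar in many respects. I am trying to compare the bold text in file_1 and file_2 (it is not actually bold, but hopefully you can tell it is the same column), and if the values differ, I want to print the red text from file_1. I implemented this with the following script: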

import codecs
import itertools
import os
import string

chain_id=[]
for file in os.listdir("."):
    basename = os.path.basename(file)
    if basename.startswith("d.complex"):
        chain_id.append(basename)

for i in chain_id:
    print i
    g=codecs.open(i,  encoding='utf-8')

    f=codecs.open("ac_chain_dssp.dssp",  encoding='utf-8')
    for (x, y) in itertools.izip(g,  f): 
            if y[11]=="C":
                if y[35:38]!= "EN":
                    if y[35:38] != "OTE":
                        if x[11]=="C":
                            if x[12] != "C":
                                if y[35:38] !=x[35:38]:
                                    print x [7:10]


    g.close()
    f.close()

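But the result I get is not what I expected. Now I want to modify the code above so that, when I compare the bold columns, a result is printed only if the difference between the values is greater than 2. For example, row 1 of the bold column is 83 in file_1 and 84 in file_2; since the difference is less than 2, I want it rejected.

Can someone help me add the remaining code? Cheers, Chavanak

PS: This is not homework :)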

Answer 1

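The direct answer to your question is to change the last condition,
if y[35:38] !=x[35:38]: so that the [35:38] "fields" are converted to int (or float...) and a difference can be computed on them. This gives something like: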

   try:
     iy = int(y[35:38])
     ix = int(x[35:38])
   except ValueError:
     # Take whatever action is appropriate here, including silently ignoring the record.
     print("Unexpected value for record # %s" % x[7:10])
   else:
     # Compare only when both fields were parsed as integers.
     if abs(ix - iy) > 2:
       print(x[7:10])

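More indirectly, the snippet in the question prompts the following remarks, which may in turn suggest a different approach to the problem.

First, the current approach is valid, and possibly very efficient, if the files are strictly "fixed format", if they are very big, and/or if nothing else is done with any of the other "field" values found in the files.
Some of the tests look silly or are always true (for example, comparing a 3-character slice with a 2-character string literal). Beyond being logic errors, this also points toward a more "parsed" solution, in which such logic errors are easier to avoid or at least more obvious.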
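As a rough illustration of what a more "parsed" solution could look like, here is a minimal sketch. The column offsets are copied from the slices in the question; the field names (`label`, `chain`, `flag`, `value`) and both function names are hypothetical, since the question does not say what the columns mean. It is written for Python 3, so it uses `zip` where the question uses `itertools.izip`.

```python
def parse_record(line):
    """Split one fixed-width line into the fields the question's code uses.

    The offsets come from the question; the field names are assumptions.
    """
    return {
        "label": line[7:10].strip(),   # the text the question prints
        "chain": line[11:12],          # must be "C" in both files
        "flag": line[12:13],           # must not be "C" in file_1
        "value": line[35:38].strip(),  # the column being compared
    }


def differing_labels(lines_1, lines_2, threshold=2):
    """Yield file_1 labels whose value differs from file_2 by more than threshold."""
    for raw_1, raw_2 in zip(lines_1, lines_2):
        rec_1, rec_2 = parse_record(raw_1), parse_record(raw_2)
        if rec_1["chain"] != "C" or rec_2["chain"] != "C" or rec_1["flag"] == "C":
            continue
        try:
            diff = abs(int(rec_1["value"]) - int(rec_2["value"]))
        except ValueError:
            # Non-numeric field (a header line, or text such as "OTE"): skip it.
            continue
        if diff > threshold:
            yield rec_1["label"]
```

With two open files g and f, as in the question, `for label in differing_labels(g, f): print(label)` would print the matching records. Because every field is stripped and named once, a comparison against a literal of the wrong length is harder to write by accident.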

Answer 2

Unrelated to your question, but:

        if y[11]=="C":
            if y[35:38]!= "EN":
# I don't see any "EN" or "OTE" anywhere in your sample input.
# In any case the above condition will always be true, because
# y[35:38] appears to be a 3-byte string but "EN" is a 2-byte string.
                if y[35:38] != "OTE":
                    if x[11]=="C":
                        if x[12] != "C":
                            if y[35:38] !=x[35:38]:
                                print x [7:10]

Hmm...

You may want to consider another way of expressing it, for example:

if (x[11] == "C" == y[11]
and x[12] != "C"
and y[35:38] not in ("EN?", "OTE")
and y[35:38] != x[35:38]):
    print x[7:10]
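The always-true comparison noted in the comments earlier in this answer can be checked directly. The sample line below is made up for illustration; only the slice offsets come from the question.

```python
# A made-up line with "EN" starting at column 35, followed by a space.
y = " " * 35 + "EN " + "rest of the line"

field = y[35:38]

print(len(field))             # 3
print(field != "EN")          # True: a 3-character slice never equals a 2-character literal
print(field.strip() != "EN")  # False: stripping the padding makes the test meaningful
```

If the intent was to skip such records, stripping the slice first (or comparing against the full 3-character text) is one way to make the condition do something.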

Answer 3

I haven't fully understood your question, but suppose you have

File 1

100 C 20.2
300 B 33.3

File 2

110 C 20.23
320 B 33.34

and you want to compare the 3rd column of the two files.

lines1 = file1.readlines()
list1 = [float(line.split()[2]) for line in lines1] # list of 3rd column values

lines2 = file2.readlines()
list2 = [float(line.split()[2]) for line in lines2]

result = map(lambda x, y: abs(x - y) > 2, list1, list2)  # True where the values differ by more than 2

OR

result = [a - b for a, b in zip(list1, list2) if abs(a - b) > 2]  # only the differences larger than 2

Is this what you want?
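If the goal is to report the first column of File 1 for the rows whose 3rd columns differ by more than 2, a minimal sketch could pair the lines with zip. The function name and the `threshold` parameter are made up for illustration, and the code assumes whitespace-separated columns as in the sample above, not the fixed-width layout of the real files.

```python
def labels_with_large_diff(lines1, lines2, threshold=2.0):
    """Return the 1st column of lines1 where the 3rd columns differ by more than threshold."""
    labels = []
    for line1, line2 in zip(lines1, lines2):
        cols1, cols2 = line1.split(), line2.split()
        # abs() makes the comparison independent of which file has the larger value.
        if abs(float(cols1[2]) - float(cols2[2])) > threshold:
            labels.append(cols1[0])
    return labels
```

Called with the two lists returned by readlines(), it returns an empty list for the sample data above, because both differences are far below 2.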
