Question

I have certain checks to perform, and if a check passes, I want to print the result. Here is the code:

import string
import codecs
import sys
y=sys.argv[1]

list_1=[]
f=1.0
x=0.05
write_in = open ("new_file.txt", "w")
write_in_1 = open ("new_file_1.txt", "w")
ligand_file=open( y, "r" ) #Open the receptor.txt file
ligand_lines=ligand_file.readlines() # Read all the lines into the array
ligand_lines=map( string.strip, ligand_lines ) #Remove the newline character from all the pdb file names
ligand_file.close()

ligand_file=open( "unique_count_c_from_ac.txt", "r" ) #Open the receptor.txt file
ligand_lines_1=ligand_file.readlines() # Read all the lines into the array
ligand_lines_1=map( string.strip, ligand_lines_1 ) #Remove the newline character from all the pdb file names
ligand_file.close()
s=[]
for i in ligand_lines:
   for j in ligand_lines_1:
      j = j.split()
      if i == j[1]:
         print j

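The code above works fine, but when I print j it comes out like ['351', '342'], whereas I would like to get 351 342 (with a space in between). Since this is more of a Python question, I have not included the input files (basically they are just numbers).

Can anyone help me?

Cheers,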

Chavanak

Answer 1

To convert a list of strings into a single string with spaces between the list items, use ' '.join(seq).

>>> ' '.join(['1','2','3'])
'1 2 3'

You can replace ' ' with whatever string you want between the items.
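
As a small illustration with the values from the question, here is a sketch of the same call next to the plain list print. It is written for Python 3, which is an assumption (the question uses Python 2's print statement), and the variable name fields is made up for the example.

```python
# split() yields a list of strings, like j in the question
fields = "351 342".split()

print(fields)               # prints the list's repr, with brackets and quotes
print(' '.join(fields))     # prints the items separated by a single space
print(', '.join(fields))    # any other separator string works the same way

# join() only accepts strings, so convert other types first
print(' '.join(str(n) for n in [351, 342]))
```

If the items are not already strings, as in the last line, join() raises a TypeError unless they are converted first.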

Answer 2

Mark Rushakoff seems to have solved your problem, but there are some other improvements you could make to your code.

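- Always use a context manager (with open(filename, mode) as f:) to open files, rather than relying on close being called manually.
- Don't read entire files into memory as a matter of habit. A loop over some_file.readlines() can be replaced by looping over some_file directly.
  - For example, you could have used map(string.strip, ligand_file) or, better, [line.strip() for line in ligand_file]
- Don't choose names that include the type of the object they refer to. That information can be found in other ways.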

For example, the code you posted could be simplified to

import sys
from contextlib import nested

some_real_name = sys.argv[1]
other_file = "unique_count_c_from_ac.txt"

# The parentheses are needed to unpack the pair that nested() returns
with nested(open(some_real_name, "r"), open(other_file, "r")) as (ligand_1, ligand_2):
    for line_1 in ligand_1:
        # Take care of the trailing newline
        line_1 = line_1.strip()

        # Rewind the second file so that every pass scans it from the start
        ligand_2.seek(0)
        for line_2 in ligand_2:
            line_2 = line_2.strip()

            numbers = line_2.split()

            if line_1 == numbers[1]:
                # If the second number from this line matches the number that is
                # in the user's file, print all the numbers from this line
                print ' '.join(numbers)

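This is more robust and, I believe, easier to read.

Note that because of these nested loops, the algorithmic performance is far from ideal. Depending on your needs this could possibly be improved, but since I don't know exactly what data you need to extract, I can't tell you whether it can.

Both my code and yours currently take O(n * m * q) time, where n is the number of lines in one file, m is the number of lines in the other, and q is the length of the lines in unique_count_c_from_ac.txt. If two of these are fixed or small, then you have linear performance. If two of them can grow arbitrarily (I imagine n and m can?), then you could look into improving your algorithm, possibly using sets or dicts.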
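
One possible shape for the set-based idea is sketched below. It is written for Python 3, which is an assumption (the code above is Python 2), and the function name matching_lines and the sample data are made up for illustration. It assumes that it is acceptable to print matches in the order of the second file and to print each matching line once, even if a value is repeated in the user's file, which differs from the nested-loop version.

```python
def matching_lines(wanted_lines, candidate_lines):
    """Yield each candidate line whose second field is one of the wanted values."""
    # Build the set once; membership tests are then O(1) on average.
    wanted = {line.strip() for line in wanted_lines}
    for line in candidate_lines:
        numbers = line.split()
        # Skip blank or one-field lines instead of raising IndexError.
        if len(numbers) > 1 and numbers[1] in wanted:
            yield ' '.join(numbers)


# In-memory stand-ins for the two files; any iterable of lines works,
# including open file objects.
wanted_lines = ["342\n", "7\n"]
candidate_lines = ["351 342\n", "12 99\n", "5 7 8\n"]

for result in matching_lines(wanted_lines, candidate_lines):
    print(result)
```

Each file is read only once here, so the work grows roughly with n + m lines rather than n * m.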
