Question

I'm trying to subset the contents of one file using the contents of another. File1 contains one value per line:

43
44
101

File2 contains two values per line, separated by a single space:

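I want to filter File2 based on File1, so that any line of File2 whose first value matches a value in File1 is written to a new file. Some lines of File2 should not appear in the new file (their first value is not in File1), and some values in File1 have multiple entries in File2. The output should look like this: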

I've been trying to write code for this in Python. I'm fairly new to the language, but here is some of what I've tried so far:

output=open("new_file.txt","a") 

for i in file2:
    key="%s" % (i.split()[0])
    if key in file1:
        output.write(i)

Any suggestions on how to get this code working? Thanks!

Answer 1

First, build a set from all the numbers in the first file (this code keeps them as strings rather than numbers, but that makes little difference here):

nums = set()
with open("file1.txt") as file1:
    for line in file1:
        nums.add(line.strip())

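Next comes the code that filters each line of the second file. We could write to the final file as we go, or store everything temporarily and write it out later. This code writes as it goes: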

with open("file2.txt") as file2, open("output.txt", "wt") as output:
    for line in file2:
        to_check = line.strip().split()[0]
        if to_check in nums:
            print(line.strip(), file=output)

That should do the trick. I tested it against what you provided and it seems to give the result you want, but let me know if it doesn't do what you expect.
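For reference, the two steps above can be folded into one function. This is only a sketch: the name `filter_by_first_field` and the sample File2 lines in the demo are made up for illustration, and it adds one assumption not in the answer, namely that blank lines should be skipped (indexing `[0]` on an empty split would otherwise raise `IndexError`).

```python
import os
import tempfile


def filter_by_first_field(keys_path, data_path, out_path):
    """Copy lines of data_path whose first field appears in keys_path.

    Returns the number of lines written. (Hypothetical helper name.)
    """
    # Collect the keys as strings; blank lines are ignored.
    with open(keys_path) as keys_file:
        keys = {line.strip() for line in keys_file if line.strip()}

    kept = 0
    with open(data_path) as data_file, open(out_path, "w") as out_file:
        for line in data_file:
            fields = line.split()
            # Skip blank lines, which have no first field to compare.
            if fields and fields[0] in keys:
                out_file.write(line.rstrip("\n") + "\n")
                kept += 1
    return kept


if __name__ == "__main__":
    # Made-up sample data written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        keys_path = os.path.join(tmp, "file1.txt")
        data_path = os.path.join(tmp, "file2.txt")
        out_path = os.path.join(tmp, "output.txt")
        with open(keys_path, "w") as f:
            f.write("43\n44\n101\n")
        with open(data_path, "w") as f:
            f.write("43 a\n50 b\n\n101 c\n43 d\n")
        print(filter_by_first_field(keys_path, data_path, out_path))
        with open(out_path) as f:
            print(f.read(), end="")
```

Because the keys are compared as strings, `043` and `43` would not match; convert both sides with `int()` if numeric equality is what you need.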

Answer 2

I would do it like this:

# Values from file1, one per line (a list, despite the name)
with open('file1.txt') as f1:
    set1 = [line.strip() for line in f1]

# Each line of file2 split into its whitespace-separated values
with open('file2.txt') as f2:
    vals = [[val for val in line.split()] for line in f2]

# Write out every line whose first value appears in file1
with open('out.txt', 'w') as fout:
    for val in vals:
        if val[0] in set1:
            fout.write(' '.join(val) + '\n')

Read the first file into a list and the second file into a nested list. Then iterate over the file2 values, checking whether the first entry is in the list from file1. If you are handling many values, you could make set1 a set instead, which improves the lookup from linear to constant time. For a small number of values it is probably not worth the overhead.
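The point about lookup cost can be checked with a rough timing sketch. The data here is hypothetical (100,000 generated string keys and a probe value that is absent, which is the worst case for a list), and the absolute timings will vary by machine, so treat the numbers as indicative only.

```python
import timeit

# Hypothetical data: 100,000 string keys and a probe that is not among them,
# so the list membership test has to scan every element.
keys_list = [str(n) for n in range(100_000)]
keys_set = set(keys_list)
probe = "-1"

# Both containers give the same answer to a membership test.
assert (probe in keys_list) == (probe in keys_set)

# Time 100 membership tests against each container.
list_time = timeit.timeit(lambda: probe in keys_list, number=100)
set_time = timeit.timeit(lambda: probe in keys_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

In the answer's code, the equivalent change would be building `set1` with a set comprehension, `{line.strip() for line in f1}`, and leaving the rest untouched.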

Subsetting the contents of one file based on the contents of another
