Question

I have two txt files (a and b).

file_a.txt contains a long list of 4-letter combinations (one combination per line):

aaaa
bcsg
aacd
gdee
aadw
hwer
etc.

file_b.txt contains a list of letter combinations of various lengths (some with spaces):

aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake
etc.

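I am looking for a Python script that would allow me to do the following:

Read file_a.txt line by line
Take each 4-letter combination (e.g. aaai)
Read file_b.txt and find all letter combinations of various lengths that start with that 4-letter combination (e.g. aaaibjkes, aaailoiersaaageehikjaaa, aaailoiuwegoiglkjaaaike, etc.)
Write the results of each search to a separate txt file named after the 4-letter combination.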

File aaai.txt:

aaaibjkes 
aaailoiersaaageehikjaaa
aaailoiuwegoiglkjaaake
etc.

File bcsi.txt:

bcspwiopiejowih
bcsiweyoieotpwe
etc.

Sorry, I am a newbie. Could someone please point me in the right direction? So far I only have:

#I presume I will have to use regex at some point
import re

file1 = open('file_a.txt', 'r').readlines()
file2 = open('file_b.txt', 'r').readlines()

#Should I look into findall()?

Answer 1

I hope this helps you:

# Read every line of the second file into a list
with open('file_b.txt', 'r') as file2:
    mylist = file2.readlines()

# Read each line of the first file
with open('file_a.txt', 'r') as file1:
    for line in file1:
        # Strip the trailing newline to get the bare 4-letter combination
        searchStr = line.strip()
        if not searchStr:
            continue
        # Find the lines of the second file that start with this combination
        exists = [s for s in mylist if s.startswith(searchStr)]
        if exists:
            # At least one match: write the matches to <combination>.txt
            with open(searchStr + '.txt', 'w') as fileNew:
                fileNew.writelines(exists)

Answer 2

What you can do is open both files and then use for loops to go through both files line by line.

You can have two for loops: the first one reads file_a.txt, since you only need to go through it once. The second reads through file_b.txt and looks for the string at the start.

To do so, you have to use .find() to search for the string. Since it is at the start, the returned value should be 0.

file_a = open("file_a.txt", "r")
file_b = open("file_b.txt", "r")

for a_line in file_a:
    # This result value will be written into your new file
    result = ""
    # This is what we will search with
    search_val = a_line.strip("\n")
    print("---- Using " + search_val + " from file_a to search. ----")
    for b_line in file_b:
        print("Searching file_b using " + b_line.strip("\n"))
        if b_line.strip("\n").find(search_val) == 0:
            result += b_line
    print("---- Search ended ----")
    # Set the read pointer to the start of the file again
    file_b.seek(0, 0)

    if result:
        # Write the contents of "result" into a file named after "search_val"
        with open(search_val + ".txt", "a") as f:
            f.write(result)

file_a.close()
file_b.close()
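To illustrate the .find() check described above, here is a minimal sketch using sample values taken from the question's data. The variable names are only for illustration. It also shows str.startswith(), which should express the same "at the start" test more directly.

```python
# Sample values taken from the question's data (names are illustrative)
line = "bcsgiweyoieotpwe"
prefix = "bcsg"

# find() returns the index of the first occurrence, or -1 if absent,
# so a result of 0 means the line begins with the prefix
at_start = line.find(prefix) == 0

# startswith() performs the same check without comparing indexes
same_check = line.startswith(prefix)

# A match that is not at the start gives a non-zero index
inside = "abaaaalkjel".find("aaaa")

print(at_start, same_check, inside)
```

Checking for == 0 matters here: a plain "prefix in line" test would also accept lines such as abaaaalkjel, where the combination only appears in the middle.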

Test case:

I am using the test case from your question:

file_a.txt

aaaa
bcsg
aacd
gdee
aadw
hwer

file_b.txt

aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake

The program produces an output file bcsg.txt, as it should, with bcsgiweyoieotpwe inside.
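The test case can be reproduced without touching your working directory. The following is only a sketch under some assumptions: split_by_prefix is a hypothetical helper (not part of the answer's code), it uses str.startswith() instead of .find(), and it writes everything into a temporary directory.

```python
import os
import tempfile


def split_by_prefix(path_a, path_b, out_dir):
    """Hypothetical helper: write <prefix>.txt into out_dir for every
    prefix in path_a that starts at least one line of path_b.
    Returns the names of the files that were created."""
    with open(path_b) as fb:
        b_lines = [line.rstrip("\n") for line in fb]

    created = []
    with open(path_a) as fa:
        for raw in fa:
            prefix = raw.strip()
            if not prefix:
                continue  # skip blank lines
            matches = [line for line in b_lines if line.startswith(prefix)]
            if matches:
                name = prefix + ".txt"
                with open(os.path.join(out_dir, name), "w") as out:
                    out.write("\n".join(matches) + "\n")
                created.append(name)
    return created


# Sample data copied from the question
FILE_A = ["aaaa", "bcsg", "aacd", "gdee", "aadw", "hwer"]
FILE_B = [
    "aaaibjkes", "aaleoslk", "abaaaalkjel", "bcsgiweyoieotpwe",
    "csseiolskj", "gaelsi asdas", "aaaloiersaaageehikjaaa",
    "hwesdaaadf wiibhuehu", "bcspwiopiejowih", "gdeaes",
    "aaailoiuwegoiglkjaaake",
]

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "file_a.txt")
    path_b = os.path.join(tmp, "file_b.txt")
    with open(path_a, "w") as f:
        f.write("\n".join(FILE_A) + "\n")
    with open(path_b, "w") as f:
        f.write("\n".join(FILE_B) + "\n")

    created = split_by_prefix(path_a, path_b, tmp)
    with open(os.path.join(tmp, "bcsg.txt")) as f:
        bcsg_content = f.read()

print(created)       # ['bcsg.txt']
print(bcsg_content)  # bcsgiweyoieotpwe
```

With this sample data only bcsg matches the start of a line in file_b.txt, so a single output file is expected.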

Answer 3

Try this:

# Read both files and drop the trailing newlines
with open("a.txt", "r") as f1:
    file1 = [word.replace("\n", "") for word in f1]
with open("b.txt", "r") as f2:
    file2 = [word.replace("\n", "") for word in f2]

# Collect [short_word, matching line] pairs
data = []
data_dict = {}
for short_word in file1:
    data += [[short_word, w] for w in file2 if w.startswith(short_word)]

# Group the matching lines by their 4-letter combination
for single_data in data:
    if single_data[0] in data_dict:
        data_dict[single_data[0]].append(single_data[1])
    else:
        data_dict[single_data[0]] = [single_data[1]]

# Write one file per combination
for key, val in data_dict.items():
    with open(key + ".txt", "w") as out:
        out.write("\n".join(val))
    print(key + ".txt created")
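The grouping step above (the if/else that fills data_dict) could also be sketched with collections.defaultdict from the standard library. This is just one possible variation; group_by_prefix is a hypothetical helper name, and the sample values are taken from the question.

```python
from collections import defaultdict


def group_by_prefix(prefixes, lines):
    """Hypothetical helper: map each prefix to the lines that start with it."""
    groups = defaultdict(list)
    for prefix in prefixes:
        for line in lines:
            if line.startswith(prefix):
                # defaultdict creates the empty list on first access
                groups[prefix].append(line)
    return dict(groups)


# Sample values taken from the question
prefixes = ["aaai", "bcsg"]
lines = ["aaaibjkes", "bcsgiweyoieotpwe", "gdeaes", "aaailoiuwegoiglkjaaake"]

groups = group_by_prefix(prefixes, lines)
print(groups)
```

Prefixes without any match never get a key, so no empty files would be written when the result is later saved per key.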

Read lines from one file and find all strings that start with a 4-letter string listed in another txt file

3 answers: