Python tool/package for building a multiple-choice quiz from a text file

Time: 2017-12-14 09:38:28

Tags: java python algorithm

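I have a text file containing 30 multiple-choice questions in the following pattern:

  1. Question one goes here?

    A. Option 1

    B. Option 2

    C. Option 3

    D. Option 4

  2. And so on, up to 30

The number of options varies: there are at least two and at most six.

I'd like to practice these questions in an HTML/PHP quiz-like interface, where I can select options and see the result at the end.

I tried reading the file in Python and storing the questions and answers in separate lists, but it doesn't work. Here is my code: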

    #to prevent IndexError 
    question = ['','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','']
    answers = ['','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','']
    qOrA = "q"
    mcq_file = "mcqs.txt"
    mcq = open(mcq_file, "r")
    data_list = mcq.readlines()
    
    for i in range(len(data_list)):
        element = list(data_list[i])
        if element[0] == "A" and element[1] == ".":
            qOrA = "a"
    
        if qOrA == "q":
            question[i] = question[i]+ " " + data_list[i]
    
        elif qOrA == "a":
            answers[i] = answers[i]+ " " + data_list[i]
    

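The output of mcq.readlines() up to question 3 is given below. Note: the file really does contain multiple consecutive newlines, so it is not cleanly structured.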

    ['\n', '1.\n', '\n', ' \n', '\n', 'Which computer component contains all the \n', '\n', 'circuitry necessary for all components or \n', '\n', 'devices to communicate with each other?\n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', 'A. Motherboard\n', '\n', ' \n', '\n', ' \n', '\n', 'B. Hard Drive\n', '\n', ' \n', '\n', ' \n', '\n', 'C. Expansion Bus\n', '\n', ' \n', '\n', ' \n', '\n', 'D. Adapter Card\n', '\n', ' \n', '\n', ' \n', '\n', '\n', '\n', '\n', '2. \n', '\n', 'Which case type is typically \n', '\n', 'used for servers?\n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', 'A.\n', '\n', ' \n', '\n', ' \n', '\n', 'Mini Tower\n', '\n', ' \n', '\n', ' \n', '\n', 'B.\n', '\n', ' \n', '\n', ' \n', '\n', 'Mid Tower\n', '\n', ' \n', '\n', ' \n', '\n', 'C.\n', '\n', ' \n', '\n', ' \n', '\n', 'Full Tower\n', '\n', ' \n', '\n', ' \n', '\n', 'D.\n', '\n', ' \n', '\n', ' \n', '\n', 'desktop\n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', '\n', '\n', '\n', '3.\n', '\n', ' \n', '\n', 'What is the most reliable way for users to buy the \n', '\n', 'correct RAM to upgrade a computer?\n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', 'A.\n', '\n', ' \n', '\n', ' \n', '\n', 'Buy RAM that is the same color as the memory sockets \n', '\n', 'on the motherboard.\n', '\n', ' \n', '\n', ' \n', '\n', 'B.\n', '\n', ' \n', '\n', ' \n', '\n', 'Ensure that the RAM chip is the same size as the ROM chip.\n', '\n', ' \n', '\n', ' \n', '\n', 'C.\n', '\n', ' \n', '\n', ' \n', '\n', 'Ensure that the RAM is \n', '\n', 'compatible\n', '\n', ' \n', '\n', 'with the peripherals \n', '\n', 'installed on the motherboard.\n', '\n', ' \n', '\n', ' \n', '\n', 'D.\n', '\n', ' \n', '\n', ' \n', '\n', 'Check the motherboard manual or manufacturer’s website.\n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', ' \n', '\n', '\n', '\n', '\n']
    

2 answers:

Answer 0: (score: 0)

You can try:

question = []
mcq_file = "mcqs.txt"  # same file name as in the question
with open(mcq_file, "r") as mcq:
    data_list = mcq.readlines()
for data in data_list:
    data = data.strip()  # remove surrounding whitespace, including the newline
    first_part = data.split(".")[0]
    if first_part.isnumeric():
        # This line starts a question, so append it to the question list
        question.append(data)
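
The snippet above only collects the lines that begin with a question number. As a rough sketch of how the same line-by-line idea might be extended to keep each question's text together with its options, the following assumes that every question starts on a line beginning with a number and a dot, and every option on a line beginning with a single letter A to F and a dot. The names `parse_mcqs`, `QUESTION_RE` and `OPTION_RE` are made up for illustration, and a file that breaks those assumptions (for example a question number split across two lines) would be parsed incorrectly.

```python
import re

# Assumed patterns (not taken from the original post):
# "12." or "12. text" starts a question, "A." or "a. text" starts an option.
QUESTION_RE = re.compile(r"^(\d+)\.\s*(.*)$")
OPTION_RE = re.compile(r"^([A-Fa-f])\.\s*(.*)$")


def parse_mcqs(lines):
    """Group raw lines into a list of {"number", "text", "options"} dicts."""
    questions = []
    current = None       # question currently being built
    last_option = None   # letter of the option currently being built
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank and whitespace-only lines
        q_match = QUESTION_RE.match(line)
        o_match = OPTION_RE.match(line)
        if q_match:
            current = {"number": int(q_match.group(1)),
                       "text": q_match.group(2),
                       "options": {}}
            questions.append(current)
            last_option = None
        elif o_match and current is not None:
            last_option = o_match.group(1).upper()
            current["options"][last_option] = o_match.group(2)
        elif current is not None:
            # Continuation line: append to the option if one is open,
            # otherwise to the question text.
            if last_option is None:
                current["text"] = (current["text"] + " " + line).strip()
            else:
                joined = current["options"][last_option] + " " + line
                current["options"][last_option] = joined.strip()
    return questions


# Small inline sample in the same shape as the readlines() output above.
sample = ["\n", "2. \n", "\n", "Which case type is typically \n", "\n",
          "used for servers?\n", " \n", "A.\n", " \n", "Mini Tower\n",
          "B.\n", "\n", "Mid Tower\n"]
for q in parse_mcqs(sample):
    print(q["number"], q["text"], q["options"])
```

Keeping the options in a dict keyed by letter handles the variable number of options (two to six) without pre-sizing any lists.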

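Answer 1: (score: 0)

Hopefully this meets your needs. There are a few small errors because the formatting in mcqs.txt is unpredictably inconsistent. For example, in question 5 the answer choices appear out of order. By accommodating the lowercase answer choices of question 26, we trade one evil for another, which is why the answer choices of question 3 look odd ("website." and "motherboard." end in "e." and "d."). Likewise, question 25 is written as 2\n5. This is why separate words are occasionally joined together in the output. I'm very curious to know what format the original document was in and why it got mangled. Are these typos, or did you paste from a PDF or something?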

#Get text into workable format
txt=open(r"mcqs.txt","r").readlines()
txt=[line.replace("\n"," ") for line in txt]
txt=[line for line in txt if len(line)>0]
txt=[line.encode('ascii','ignore').decode("utf-8") for line in txt]
txt=[line.strip() for line in txt if line!=" " and line!=""]
txt1="".join(txt)     
#Initialize Separator lists
full_test,q_list,let_list=dict(),[str(i)+"." for i in range(1,31)],["A","B","C","D","E","F"]
def segment(txt1,list_of_separators):
    #Returns list of tuples
    #Tuples define start and end index of separator
    i,j,ints,ends=0,0,[],[]
    while j<len(list_of_separators):
        sep=list_of_separators[j]
        if sep in txt1[i:i+len(sep)+1] or sep.lower() in txt1[i:i+len(sep)+1]:
            index=i+len(sep)
            if txt1[i+len(sep)]==".": index=index+1
            ints.append(index)
            ends.append(len(sep))
            j=j+1
        if i==len(txt1):
            break
        i=i+1
    ints=ints+[len(txt1)+ends[-1]]
    tups = [(ints[k],ints[k+1]-ends[k]) for k in range(len(ints)-1)]
    return tups
#Segment based on question number
tups=segment(txt1,q_list)
#Get blocks of text (includes question and answer choices)
blocks,n=[txt1[tup[0]:tup[1]].strip() for tup in tups],1
for block in blocks:
    #Segment based on answer choice
    tups=segment(block,[str(i)+"." for i in let_list])
    tups=[(0,tups[0][0]-2)]+tups
    choices=[block[tup[0]:tup[1]].strip() for tup in tups]
    #Initialize dictionary
    full_test[n]={"Question":choices[0]}
    m=0
    for choice in choices[1:]:
        full_test[n].update({let_list[m]+".":choice})
        m=m+1
    n=n+1
#Prompt user for answer as if actually test
for question in full_test.keys():
    print(str(question)+"."+full_test[question]["Question"]+"\n")
    ind=0
    for choice in full_test[question].items():
        if ind==0:
            ind=ind+1
            continue
        else:
            print(choice[0]+" "+choice[1])
    answer=input("\nAnswer:")
    full_test[question].update({"Answer":answer})

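If I weren't up for a challenge, I would have fixed the minor formatting inconsistencies and probably come up with something simpler. But where's the fun in that...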
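
The prompt loop above stores each response under the "Answer" key but does not report a result at the end, which the question asks for. A minimal sketch of a scoring step is shown below. It assumes a separate answer key is available, since mcqs.txt as shown does not contain the correct answers; `score_quiz` and `answer_key` are hypothetical names, and the `full_test` layout is the one built by the code above.

```python
def score_quiz(full_test, answer_key):
    """Return (correct, total, wrong_numbers) for the recorded answers.

    full_test:  {question_number: {"Question": ..., "A.": ..., "Answer": "b"}}
    answer_key: {question_number: "B"}  # hypothetical, not part of mcqs.txt
    """
    correct = 0
    wrong = []
    for number, entry in full_test.items():
        # Normalise input such as " b. " to "B" before comparing.
        given = entry.get("Answer", "").strip().rstrip(".").upper()
        expected = answer_key.get(number, "").strip().upper()
        if expected and given == expected:
            correct += 1
        else:
            wrong.append(number)
    return correct, len(full_test), wrong


# Example with two questions already answered by the user.
demo_test = {
    1: {"Question": "Q1", "A.": "Motherboard", "B.": "Hard Drive", "Answer": "a"},
    2: {"Question": "Q2", "A.": "Mini Tower", "B.": "Mid Tower", "Answer": "B"},
}
demo_key = {1: "A", 2: "C"}  # made-up key for the demo
correct, total, wrong = score_quiz(demo_test, demo_key)
print("Score: {}/{}; review questions: {}".format(correct, total, wrong))
```

Questions with no entry in the key are counted as wrong here; that is a design choice for the sketch, and skipping them instead would be just as reasonable.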