In Python

Time: 2015-10-26 16:20:41

Tags: python python-3.x encoding binary base64

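I am looking to convert a file to binary, preferably using Python since I am most comfortable with it, although I could probably use another language if walked through it.

Basically, I need this for a project I am working on in which we want to store data using a DNA strand, so the files need to be stored as binary ('A's and 'T's = 0, 'G's and 'C's = 1).

Any idea how I could proceed? I did find that I could encode the file in base64 and then decode it, but that seems a bit inefficient, and the code I have does not seem to work...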

import base64
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
print(file_path)
with open(file_path) as f:
    encoded = base64.b64encode(f.readlines())
    print(encoded)

Also, I already have a program that does this for plain text. Any tips on how to improve it would be appreciated as well!

import binascii
t = bytearray(str(input("Texte?")), 'utf8')
h = binascii.hexlify(t)
b = bin(int(h, 16)).replace('b','') 
#removing the b that appears in the end for some reason
g = b.replace('1','G').replace('0','A')
print(g)

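For example, for text to DNA: I enter 'test' and expect the DNA sequence derived from its binary form. The binary is: 01110100011001010111001101110100 (in the example I also print each conversion step so it is easier to follow).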

>>>Texte?test #Asks the text
>>>b'74657374' #converts to hex
>>>01110100011001010111001101110100 #converts to binary
>>>AGGGAGAAAGGAAGAGAGGGAAGGAGGGAGAA #converts 0 to A and 1 to G

2 answers:

Answer 0: (score: 0)

Of course it is inefficient! base64 is designed to store binary data as text, so the converted block is larger than the original.
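As a rough illustration of that size increase, the snippet below compares the raw length with the base64 length. It is a sketch using only the standard library, and the 300-byte payload is an arbitrary choice made for illustration. base64 emits 4 output characters for every 3 input bytes.

```python
import base64

# 300 arbitrary bytes (illustrative payload, not taken from the question)
raw = bytes(range(256)) + bytes(44)
encoded = base64.b64encode(raw)

# base64 maps each group of 3 bytes to 4 ASCII characters
print(len(raw), len(encoded))  # 300 400
```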

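btw: what kind of efficiency do you want? Compactness?

If so, your second sample is much closer to what you want.

btw: in your task you lose information! Are you aware of that?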
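To make the loss concrete: with the mapping from the question (A and T become 0, G and C become 1), different sequences collapse to the same bit string, so the original sequence cannot be recovered from the bits. A small sketch, where `to_bits` is a hypothetical helper name:

```python
# Mapping from the question: A/T -> 0, G/C -> 1
TABLE = str.maketrans("ATGC", "0011")

def to_bits(seq):
    """Collapse a DNA string to bits (hypothetical helper for illustration)."""
    return seq.translate(TABLE)

print(to_bits("ATGC"))  # 0011
print(to_bits("TACG"))  # 0011, the same bits from a different sequence
```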

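Here is a sample of how to store and restore the data.

It stores the data in an easy-to-understand hex-in-text format, purely for demonstration. If you want compactness, you can easily modify the code to store the data in a binary file, and if you want a 00011001 view, that modification is easy too.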

import math
import numpy as np

# Make a long random test string of A/T/G/C
s = ''.join(str(x) for x in np.random.randint(4, size=33))\
    .replace('0','A').replace('1','T').replace('2','G').replace('3','C')

def store_(s):
    # The bit string is padded to a multiple of 8, so remember the true length and store it with the data
    size = len(s)
    s2 = s.replace('A','0').replace('T','0').replace('G','1').replace('C','1')\
        .ljust(int(math.ceil(size / 8.) * 8), '0')  # pad with '0' on the right to a multiple of 8
    a = (hex(int(s2[i*8:i*8+8], 2))[2:].rjust(2, '0') for i in range(len(s2) // 8))
    return ''.join(a), size

yourDataAsHexInText, sizeToStore = store_(s)
print(yourDataAsHexInText, sizeToStore)


def restore_(s, size=None):
    if size is None: size = len(s) * 4  # no stored length: keep every bit (4 bits per hex digit)
    a = (bin(int(s[i*2:i*2+2], 16))[2:].rjust(8, '0') for i in range(len(s) // 2))
    # you lose information, remember? So only A or G comes back
    return (''.join(a).replace('1','G').replace('0','A'))[:size]

restore_(yourDataAsHexInText, sizeToStore)


print("so check it")
print(s, "(input)")
print(store_(s))
print(s.replace('C','G').replace('T','A'), "to compare with information loss")
print(restore_(*store_(s)), "restored")
print(s.replace('C','G').replace('T','A') == restore_(*store_(s)))

Result of my test:

63c9308a00 33
so check it
AGCAATGCCGATGTTCATCGTATACTTTGACTA (input)
('63c9308a00', 33)
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA to compare with information loss
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA restored
True
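If compactness matters, the same idea could write raw bytes instead of hex text. The following is a sketch, not a drop-in replacement: `pack_bits` and `unpack_bits` are hypothetical names, and it assumes the same lossy mapping (A/T = 0, G/C = 1) and the same right-padding with zeros as the sample above.

```python
def pack_bits(seq):
    """Pack a DNA string into bytes, 8 bases per byte; returns (data, true_length)."""
    bits = seq.translate(str.maketrans("ATGC", "0011"))
    padded = bits.ljust(-(-len(bits) // 8) * 8, "0")  # right-pad to a multiple of 8
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, len(seq)

def unpack_bits(data, size):
    """Inverse of pack_bits, up to the information loss (only A and G come back)."""
    bits = "".join(format(byte, "08b") for byte in data)
    return bits[:size].translate(str.maketrans("01", "AG"))

data, size = pack_bits("AGCAATGCCGATGTTCATCGTATACTTTGACTA")
print(data.hex(), size)         # 63c9308a00 33
print(unpack_bits(data, size))  # AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA
```

The resulting `data` could be written with `open(path, 'wb')`; the true length still has to be stored alongside it, for the same reason as in the hex version.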

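Answer 1: (score: 0)

So, thanks to @jonrshape and Sergey Vturin, I was finally able to achieve what I wanted! My program asks for a file, turns it into binary, and then gives me the equivalent "DNA code" using pairs of binary digits (00 = A, 01 = T, 10 = G, 11 = C).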

import binascii
from tkinter import filedialog

file_path = filedialog.askopenfilename()

x = ""
with open(file_path, 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        # decode() yields the hex digits as text; stripping "b" from str(bytes) would also delete the hex digit b
        x += binascii.hexlify(chunk).decode('ascii')
# 4 bits per hex digit; zfill restores the leading zeros that bin() drops
b = bin(int(x, 16))[2:].zfill(len(x) * 4) if x else ""
g = [b[i:i+2] for i in range(0, len(b), 2)]
dna = ""
for i in g:
    if i == "00":
        dna += "A"
    elif i == "01":
        dna += "T"
    elif i == "10":
        dna += "G"
    elif i == "11":
        dna += "C"
print(x) #hexdump
print(b) #converted to binary
print(dna) #converted to "DNA"
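For completeness, here is a sketch of the reverse direction, assuming the same 2-bit mapping (00 = A, 01 = T, 10 = G, 11 = C). The helper names `bytes_to_dna` and `dna_to_bytes` are hypothetical. Working byte by byte avoids building one huge integer and keeps leading zeros intact.

```python
BASES = "ATGC"  # index 0..3 corresponds to the bit pairs 00, 01, 10, 11

def bytes_to_dna(data):
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))

def dna_to_bytes(dna):
    """Decode a DNA string whose length is a multiple of 4 back to bytes."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)

dna = bytes_to_dna(b"test")
print(dna)                # TCTATGTTTCACTCTA
print(dna_to_bytes(dna))  # b'test'
```

Unlike the A/T = 0, G/C = 1 scheme from the question, this mapping is reversible, so the original file bytes can be recovered exactly.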