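I was given the following code snippet. The byte array is an excerpt from a longer array, but it is enough to show the problem. A friend told me it is bytea, but I am having trouble decoding it into a human-readable string.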
b = b'\xc1\x01\x00'
import cchardet
enc = cchardet.detect(b)['encoding'] # 'WINDOWS-1252'
b.decode(enc) # 'Á\x01\x00'
b.decode("utf-8", errors='ignore') # '\x01\x00'
Any ideas?
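Note: the data was originally created by NetVault and is stored in a Postgres DB, in the "netvault_scheduling" database, in the "backupseltree" column of the "backupselectionset" table. The database reports its encoding as UTF-8, but my data (read into a pandas DataFrame) looks like the bytes above.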
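To make the failed decode attempts above easier to reason about, here is a minimal sketch that inspects the excerpt as raw bytes instead of text. It assumes the bytea value is a binary structure rather than encoded text, which nothing here confirms; `hex_dump` and `is_valid_utf8` are hypothetical helper names introduced only for illustration.

```python
# Sketch only: look at the bytea excerpt as raw bytes, not as text.
# Assumption: the column holds binary data, not an encoded string.
b = b'\xc1\x01\x00'  # the excerpt from above


def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return ' '.join(f'{byte:02x}' for byte in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8, else False."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


print(hex_dump(b))
print(is_valid_utf8(b))
```

The byte 0xC1 can never appear in well-formed UTF-8, so the excerpt cannot be UTF-8 text. That would be consistent with the database's UTF-8 setting applying only to text columns and not to bytea, although it does not say what the bytes actually mean.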
The statement looks like this:
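import pandas as pd
import postgresql  # py-postgresql

# user and password are placeholders for the real credentials
db = postgresql.open('pq://{}:{}@{}:{}/{}'.format(user, password, 'localhost', 51486,
'netvault_scheduling'))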
prep_stmt = db.prepare("SELECT ph.jobid, jd.title,"
"to_char(to_timestamp(ph.date),'YYYY-MM-DD') AS datum,"
"ph.duration, ph.bytestransferred, jd.clientname, bss.backupseltree "
"FROM phasehistory ph "
"INNER JOIN jobdescription jd "
"ON jd.jobid=ph.jobid "
"INNER JOIN backupselectionset bss "
"on jd.selectionsset = bss.name "
"WHERE jd.jobid = $1 "
"ORDER BY datum DESC ")
backup_nv_tree = prep_stmt(7228)
Df = pd.DataFrame(backup_nv_tree)
Df.columns = ['JobID', 'Title',
'Date',
'Duration', 'Transfered in GB',
'Clientname', 'Backuptree']