Pandas DataFrame: columns are separated by different numbers of tabs on different lines

Date: 2015-04-13 11:37:03

Tags: python python-2.7 python-3.x pandas

I have a test file like this:

                2464                                            2480                                            2481

Test results for policy NSS-Tuned               Test results for policy NSS-Tuned               Test results for policy NSS-Tuned

BPS Profile             Throughput              BPS Profile             Throughput              BPS Profile             Throughput

SigTestHTTP21kBin               216.966666667           SigTestHTTP21kBin               219.1           BPSHTTP21KBINARY        219.16
SigTestHTTP21kHtml              359.433333333           SigTestHTTP21kHtml              355.6           BPS-HTTP21K-HTML        364.0
SigTestHTTP21kText              379.95          SigTestHTTP21kText              377.9           BPS-HTTP21K-TEXT        376.25
NSS-HTTP21Kdelay                378.15          NSS-HTTP21Kdelay                381.15          BPS-HTTP21K-DELAY       380.2
NSS-HTTPCPS             18920           NSS-HTTPCPS             6599            BPS-HTTPCPS     74.6522222222
SIggTestPerimeter               270.233333333           SIggTestPerimeter               243.433333333           BPS-PERIMETER   222.8
SIgTestDatacenter               370.825         SIgTestDatacenter               380.24          BPS-DATACENTER  373.275
NSS-Financial           5               NSS-Financial           BPS-FINANCIAL   56.345
NSS-Education           971.125         NSS-Education           950.4           BPS-EDUCATION   1010.2
NSS-EuroMobile          920.68          NSS-EuroMobile          1001.075                BPS-EUROMOBILE  932.525
NSS-USMobile            528.2           NSS-USMobile            570.6           BPS-USMOBILE    541.9

You can see that the first header line is separated by 4 tabs (\t\t\t\t), the second header line by 2 tabs (\t\t), and the subsequent result lines by 2 tabs (\t\t).
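Judging from the `\t` left inside some parsed values in the output further down, the third group on each result line seems to use a single tab between name and throughput. A minimal sketch of why a fixed two-tab delimiter then fails while a "one or more tabs" regex does not; the line below is retyped with explicit tabs, which is an assumption about the file's exact whitespace:

```python
import re

# One result line; assumed layout: two tabs between fields, one tab inside the third group
line = "SigTestHTTP21kBin\t\t216.966666667\t\tSigTestHTTP21kBin\t\t219.1\t\tBPSHTTP21KBINARY\t219.16"

# Splitting on exactly two tabs: 5 fields, the last one is 'BPSHTTP21KBINARY\t219.16'
two_tab = line.split("\t\t")
print(two_tab)

# Treating any run of tabs as one separator yields the six expected fields
fields = re.split(r"\t+", line)
print(fields)
```

The same pattern can be given to `read_csv` as `sep=r'\t+'` together with `engine='python'`. One caveat: collapsing runs of tabs also collapses empty fields, so a row with a missing value, such as the second NSS-Financial throughput in the sample, would likely come back with five fields and shift the remaining values one column to the left.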

Now I need to manipulate the Throughput columns and generate new columns to calculate percentages, etc.
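Once the three Throughput columns are numeric, the percentage calculation is plain column arithmetic. A minimal sketch on two rows copied from the sample, assuming Python 3 and a recent pandas; the string values imitate what a messy parse can leave behind:

```python
import pandas as pd

# Two rows copied from the sample; values kept as strings on purpose
df = pd.DataFrame({
    "BPS Profile": ["SigTestHTTP21kBin", "NSS-HTTPCPS"],
    "Throughput": ["216.966666667", "18920"],
    "Throughput.1": ["219.1", "6599"],
    "Throughput.2": ["219.16", "74.6522222222"],
})

# Convert to floats; anything that cannot be parsed becomes NaN instead of raising
for col in ["Throughput", "Throughput.1", "Throughput.2"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Same formula as in the question: (later run - first run) / later run * 100
df["percentage"] = (df["Throughput.1"] - df["Throughput"]) / df["Throughput.1"] * 100.0
df["percentage.1"] = (df["Throughput.2"] - df["Throughput"]) / df["Throughput.2"] * 100.0

result = df[["BPS Profile", "Throughput", "Throughput.1", "percentage",
             "Throughput.2", "percentage.1"]]
print(result.round(2))
```

Because the formula divides by the later run, a large drop such as NSS-HTTPCPS (18920 to 6599) gives a value well below -100% (about -186.7%); dividing by the first run instead would bound drops at -100%.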

The code I wrote is:

#!/usr/bin/python

import pandas as pd


location = "/root/madhu_test/bpstest/results/finalnss.txt"
# A delimiter longer than one character is treated as a regex and needs the python engine.
# header=1 uses the second non-blank line as the header (current pandas rejects header=True).
f = pd.read_csv(location, delimiter='\t\t', header=1, engine='python')
print(f)
cols = f.columns.tolist()
print(cols)
# Keep only the first profile-name column
f = f.drop(['BPS Profile.1', 'BPS Profile.2'], axis=1)
# Make the throughput columns numeric (np.radians only returned a converted copy, which was discarded)
for col in ['Throughput', 'Throughput.1', 'Throughput.2']:
    f[col] = pd.to_numeric(f[col], errors='coerce')

# Percentage difference between the first run and each later run
f['percentage'] = ((f['Throughput.1'] - f['Throughput']) / f['Throughput.1']) * 100.0
f['percentage.1'] = ((f['Throughput.2'] - f['Throughput']) / f['Throughput.2']) * 100.0
cols = ['BPS Profile', 'Throughput', 'Throughput.1', 'percentage', 'Throughput.2', 'percentage.1']
f = f[cols]
f.to_html('/root/madhu_test/bpstest/results/outnss.html')

When I run the code, I get the following output:

                                                    Test results for policy NSS-Tuned  \
BPS Profile        Throughput    BPS Profile                               Throughput
SigTestHTTP21kBin  216.966666667 SigTestHTTP21kBin                              219.1
SigTestHTTP21kHtml 359.433333333 SigTestHTTP21kHtml                             355.6
SigTestHTTP21kText 379.95        SigTestHTTP21kText                             377.9
NSS-HTTP21Kdelay   378.15        NSS-HTTP21Kdelay                              381.15
NSS-HTTPCPS        18920         NSS-HTTPCPS                                     6599
SIggTestPerimeter  270.233333333 SIggTestPerimeter                      243.433333333
SIgTestDatacenter  370.825       SIgTestDatacenter                             380.24
NSS-Financial      5             NSS-Financial                  BPS-FINANCIAL\t56.345
NSS-Education      971.125       NSS-Education                                  950.4
NSS-EuroMobile     920.68        NSS-EuroMobile                              1001.075
NSS-USMobile       528.2         NSS-USMobile                                   570.6

                                                    Test results for policy NSS-Tuned.1  \
BPS Profile        Throughput    BPS Profile                                BPS Profile
SigTestHTTP21kBin  216.966666667 SigTestHTTP21kBin             BPSHTTP21KBINARY\t219.16
SigTestHTTP21kHtml 359.433333333 SigTestHTTP21kHtml             BPS-HTTP21K-HTML\t364.0
SigTestHTTP21kText 379.95        SigTestHTTP21kText            BPS-HTTP21K-TEXT\t376.25
NSS-HTTP21Kdelay   378.15        NSS-HTTP21Kdelay              BPS-HTTP21K-DELAY\t380.2
NSS-HTTPCPS        18920         NSS-HTTPCPS                 BPS-HTTPCPS\t74.6522222222
SIggTestPerimeter  270.233333333 SIggTestPerimeter                 BPS-PERIMETER\t222.8
SIgTestDatacenter  370.825       SIgTestDatacenter              BPS-DATACENTER\t373.275
NSS-Financial      5             NSS-Financial                                     None
NSS-Education      971.125       NSS-Education                    BPS-EDUCATION\t1010.2
NSS-EuroMobile     920.68        NSS-EuroMobile                 BPS-EUROMOBILE\t932.525
NSS-USMobile       528.2         NSS-USMobile                       BPS-USMOBILE\t541.9

                                                    Test results for policy NSS-Tuned.2
BPS Profile        Throughput    BPS Profile                                 Throughput
SigTestHTTP21kBin  216.966666667 SigTestHTTP21kBin                                 None
SigTestHTTP21kHtml 359.433333333 SigTestHTTP21kHtml                                None
SigTestHTTP21kText 379.95        SigTestHTTP21kText                                None
NSS-HTTP21Kdelay   378.15        NSS-HTTP21Kdelay                                  None
NSS-HTTPCPS        18920         NSS-HTTPCPS                                       None
SIggTestPerimeter  270.233333333 SIggTestPerimeter                                 None
SIgTestDatacenter  370.825       SIgTestDatacenter                                 None
NSS-Financial      5             NSS-Financial                                     None
NSS-Education      971.125       NSS-Education                                     None
NSS-EuroMobile     920.68        NSS-EuroMobile                                    None
NSS-USMobile       528.2         NSS-USMobile                                      None
['Test results for policy NSS-Tuned', 'Test results for policy NSS-Tuned.1', 'Test results for policy NSS-Tuned.2']

How can I split this into 6 columns, e.g. ['BPS Profile', 'Throughput', 'Throughput.1', 'percentage', 'Throughput.2', 'percentage.1']?

If I remove the following lines from the text file:

                2464                                            2480                                            2481

Test results for policy NSS-Tuned               Test results for policy NSS-Tuned               Test results for policy NSS-Tuned
then pandas splits the data correctly into 6 columns.

I understand that skipping rows would ignore these lines, but I also need this data in the final generated HTML file:
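One way to keep those lines, sketched here under assumptions: read the two header lines yourself, hand only the data lines to `read_csv`, and use the run IDs and policy names as the top level of a two-level column header, which `to_html` renders above the `BPS Profile`/`Throughput` columns. The in-memory sample, its tab layout and the label format `"2464: Test results..."` are assumptions; Python 3 is assumed.

```python
import io
import re

import pandas as pd

# Hypothetical in-memory copy of the file (blank lines omitted); the tab layout is assumed
raw = (
    "\t\t2464\t\t\t\t2480\t\t\t\t2481\n"
    "Test results for policy NSS-Tuned\t\tTest results for policy NSS-Tuned\t\t"
    "Test results for policy NSS-Tuned\n"
    "BPS Profile\t\tThroughput\t\tBPS Profile\t\tThroughput\t\tBPS Profile\t\tThroughput\n"
    "SigTestHTTP21kBin\t\t216.966666667\t\tSigTestHTTP21kBin\t\t219.1\t\tBPSHTTP21KBINARY\t219.16\n"
    "NSS-USMobile\t\t528.2\t\tNSS-USMobile\t\t570.6\t\tBPS-USMOBILE\t541.9\n"
)

# Drop blank lines, then pull the header information out before parsing the table
lines = [line for line in raw.splitlines() if line.strip()]
run_ids = lines[0].split()                               # ['2464', '2480', '2481']
policies = [p for p in re.split(r"\t+", lines[1]) if p]  # the three policy titles

# lines[2] is the "BPS Profile / Throughput" row; the data starts at lines[3]
df = pd.read_csv(io.StringIO("\n".join(lines[3:])), header=None,
                 sep=r"\t+", engine="python")

# Two-level header: "<run id>: <policy>" on top, the field name underneath
groups = [f"{rid}: {pol}" for rid, pol in zip(run_ids, policies)]
df.columns = pd.MultiIndex.from_product([groups, ["BPS Profile", "Throughput"]])

html = df.to_html()  # run IDs and policy names end up in the HTML header rows
```

With the real file, `lines` would come from reading `location` instead of the `raw` string.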

              2464                                            2480                                            2481

Test results for policy NSS-Tuned               Test results for policy NSS-Tuned               Test results for policy NSS-Tuned

1 Answer:

Answer 0 (score: 0)

If I understand your question correctly, I would skip the header lines, read in the data, and then set the column headers by hand...

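import pandas as pd

# '\t+' treats any run of tabs as one separator; a regex separator needs engine='python'
df = pd.read_csv('data.csv', skiprows=7, header=None, delimiter='\t+', engine='python')
df.columns = ['BPS Profile', 'Throughput', 'BPS Profile.1', 'Throughput.1',
    'BPS Profile.2', 'Throughput.2']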

From there, manipulating the table is easy...
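A runnable sketch of the answer's approach on an in-memory excerpt of the sample, assuming Python 3. The excerpt has no blank lines, so `skiprows=3` stands in for the answer's `skiprows=7`; for the real file it should be set to however many lines precede the first data row.

```python
import io

import pandas as pd

# In-memory excerpt of the file (assumed tab layout, no blank lines)
raw = (
    "\t\t2464\t\t\t\t2480\t\t\t\t2481\n"
    "Test results for policy NSS-Tuned\t\tTest results for policy NSS-Tuned\t\t"
    "Test results for policy NSS-Tuned\n"
    "BPS Profile\t\tThroughput\t\tBPS Profile\t\tThroughput\t\tBPS Profile\t\tThroughput\n"
    "SigTestHTTP21kBin\t\t216.966666667\t\tSigTestHTTP21kBin\t\t219.1\t\tBPSHTTP21KBINARY\t219.16\n"
    "NSS-USMobile\t\t528.2\t\tNSS-USMobile\t\t570.6\t\tBPS-USMOBILE\t541.9\n"
)

# Skip the three header lines and read the data without a header row
df = pd.read_csv(io.StringIO(raw), skiprows=3, header=None,
                 sep=r"\t+", engine="python")
df.columns = ['BPS Profile', 'Throughput', 'BPS Profile.1', 'Throughput.1',
              'BPS Profile.2', 'Throughput.2']

# Keep a single name column, as the question's code does
df = df.drop(['BPS Profile.1', 'BPS Profile.2'], axis=1)
print(df)
```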