Pandas read_csv reads lists in a column as float

Time: 2020-07-19 14:10:14

Tags: python pandas csv

I have the following CSV file (shortened):

"","PacketTime","FrameLen","FrameCapLen","IPHdrLen","IPLen","TCPLen","TCPHdrLen","WindowSize","WindowSizeValue","BytesInFlight","PushBytesSent","ACKRTT","Payload","TLSRecordContentType","TLSRecordLen","TLSAppData","Movement","Distance","Speed","Delay","Loss","Interval"
"1",0.056078384,116,116,20,102,50,32,83,83,50,50,NA,"17:03:03:00:2d:89:da:a1:d8:d8:a5:9b:38:29:9e:0e:b1:51:c9:a0:e1:66:af:57:e2:a2:e6:c1:16:16:eb:2e:26:02:ec:f6:4e:f7:90:05:20:3c:45:61:14:0d:c4:c3:df:2e",23,45,"89:da:a1:d8:d8:a5:9b:38:29:9e:0e:b1:51:c9:a0:e1:66:af:57:e2:a2:e6:c1:16:16:eb:2e:26:02:ec:f6:4e:f7:90:05:20:3c:45:61:14:0d:c4:c3:df:2e",1,1,25,0,0,"0"
"2",0.056106291,66,66,20,52,0,32,84,84,NA,NA,2.7907e-05,NA,NA,NA,NA,1,1,25,0,0,"0"
"3",2.058089106,116,116,20,102,50,32,83,83,50,50,NA,"17:03:03:00:2d:ba:92:5d:6a:18:1e:d5:89:6a:6a:a3:f7:5a:cf:dd:4d:f8:38:1f:4b:ad:1b:3f:94:8a:07:fa:9b:27:c8:06:34:cd:10:a3:08:d0:db:01:42:2b:2d:27:fa:dd",23,45,"ba:92:5d:6a:18:1e:d5:89:6a:6a:a3:f7:5a:cf:dd:4d:f8:38:1f:4b:ad:1b:3f:94:8a:07:fa:9b:27:c8:06:34:cd:10:a3:08:d0:db:01:42:2b:2d:27:fa:dd",1,1,25,0,0,"2"
"4",2.058114719,66,66,20,52,0,32,84,84,NA,NA,2.5613e-05,NA,NA,NA,NA,1,1,25,0,0,"2"
"5",4.060316193,116,116,20,102,50,32,83,83,50,50,NA,"17:03:03:00:2d:c5:5d:a0:5d:7c:6f:4e:70:31:18:0d:a2:0b:ac:dd:19:18:59:4d:3e:d7:f4:a6:92:5d:4e:98:4e:ed:ae:5a:d2:e8:cd:d2:83:b0:82:91:48:88:0e:d4:ed:09",23,45,"c5:5d:a0:5d:7c:6f:4e:70:31:18:0d:a2:0b:ac:dd:19:18:59:4d:3e:d7:f4:a6:92:5d:4e:98:4e:ed:ae:5a:d2:e8:cd:d2:83:b0:82:91:48:88:0e:d4:ed:09",1,1,25,0,0,"4"
"6",4.060340382,66,66,20,52,0,32,84,84,NA,NA,2.4189e-05,NA,NA,NA,NA,1,1,25,0,0,"4"
"7",6.063347757,116,116,20,102,50,32,83,83,50,50,NA,"17:03:03:00:2d:7e:cc:44:54:43:45:71:c6:5f:75:15:ad:f7:ce:81:a6:31:51:ce:76:0a:03:52:60:72:fc:17:9f:be:f7:92:06:b6:80:64:38:0d:6f:6e:a0:df:ea:b9:16:8e",23,45,"7e:cc:44:54:43:45:71:c6:5f:75:15:ad:f7:ce:81:a6:31:51:ce:76:0a:03:52:60:72:fc:17:9f:be:f7:92:06:b6:80:64:38:0d:6f:6e:a0:df:ea:b9:16:8e",1,1,25,0,0,"6"
"8",6.06337245,66,66,20,52,0,32,84,84,NA,NA,2.4693e-05,NA,NA,NA,NA,1,1,25,0,0,"6"
"9",8.065573696,116,116,20,102,50,32,83,83,50,50,NA,"17:03:03:00:2d:e3:07:5a:eb:d6:b7:3b:55:6b:77:57:99:76:fa:f4:43:38:34:d4:82:60:40:10:eb:90:2a:01:14:21:aa:db:a0:d3:c4:eb:6a:e8:08:05:4e:59:ca:67:f1:63",23,45,"e3:07:5a:eb:d6:b7:3b:55:6b:77:57:99:76:fa:f4:43:38:34:d4:82:60:40:10:eb:90:2a:01:14:21:aa:db:a0:d3:c4:eb:6a:e8:08:05:4e:59:ca:67:f1:63",1,1,25,0,0,"8"
"10",8.065602121,66,66,20,52,0,32,84,84,NA,NA,2.8425e-05,NA,NA,NA,NA,1,1,25,0,0,"8"
"11",10.066978328,116,116,20,102,50,32,83,83,50,50,NA,"17:03:03:00:2d:d2:2e:ed:cc:21:12:20:66:cb:6d:41:5f:5a:b8:ea:53:2d:7a:ff:f7:ca:07:91:07:64:51:a4:91:6e:28:58:6f:17:29:8d:7f:2c:ca:c4:22:a7:81:d9:af:3c",23,45,"d2:2e:ed:cc:21:12:20:66:cb:6d:41:5f:5a:b8:ea:53:2d:7a:ff:f7:ca:07:91:07:64:51:a4:91:6e:28:58:6f:17:29:8d:7f:2c:ca:c4:22:a7:81:d9:af:3c",1,1,25,0,0,"10"
"12",10.067001964,66,66,20,52,0,32,84,84,NA,NA,2.3636e-05,NA,NA,NA,NA,1,1,25,0,0,"10"
"13",12.069526007,116,116,20,102,50,32,83,83,50,50,NA,"17:03:03:00:2d:6b:4e:48:e6:ce:0b:f5:2c:18:df:36:c1:08:56:7a:f1:5e:be:f5:8a:e2:b7:84:87:30:66:c9:de:60:ac:4a:ad:80:4b:44:64:3b:21:96:18:c7:42:c8:03:20",23,45,"6b:4e:48:e6:ce:0b:f5:2c:18:df:36:c1:08:56:7a:f1:5e:be:f5:8a:e2:b7:84:87:30:66:c9:de:60:ac:4a:ad:80:4b:44:64:3b:21:96:18:c7:42:c8:03:20",1,1,25,0,0,"12"
"14",12.069551287,66,66,20,52,0,32,84,84,NA,NA,2.528e-05,NA,NA,NA,NA,1,1,25,0,0,"12"
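To see how read_csv treats this layout, here is a small sketch on a hand-shortened copy of the first two rows (only a few columns kept and the Payload hex string truncated, so the data is illustrative, not the real file). The empty first header field becomes a column named Unnamed: 0, quoted numbers such as "0" in Interval are still parsed as integers, and NA becomes NaN. In a text column like Payload, that missing value is a float NaN rather than a string.

```python
import io

import pandas as pd

# Hand-shortened copy of two rows above: only a few columns are kept and the
# Payload hex string is truncated, purely for illustration.
sample = io.StringIO(
    '"","PacketTime","ACKRTT","Payload","Interval"\n'
    '"1",0.056078384,NA,"17:03:03:00:2d","0"\n'
    '"2",0.056106291,2.7907e-05,NA,"0"\n'
)

df = pd.read_csv(sample)

# The empty first header field is renamed "Unnamed: 0".
print(df.columns.tolist())
# Quoted numbers such as "0" are still parsed as integers; NA becomes NaN.
print(df.dtypes)
# In the text column Payload, the missing value is a float NaN, not a string.
print(type(df.loc[0, 'Payload']), type(df.loc[1, 'Payload']))
```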

Now I call read_csv and process the result like this:

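import pandas as pd

# range (max - min) of the packet times in a group
def min_max(s):
    s = s.astype('float64')
    return s.max() - s.min()

# collapse each column of a group into one list-valued cell
def to_list(df):
    return df.T.apply(lambda x: x.to_list(), axis='columns')

def group(csv):
    df = pd.read_csv(csv)

    # one row per Interval; every cell holds the list of that group's values
    df_other = df.groupby('Interval')\
            .apply(to_list)\
            .drop(columns='PacketTime')
    # duration of each Interval, computed from its PacketTime values
    s_Interval = df.groupby('Interval')['PacketTime']\
            .apply(min_max)
    final_df = pd.concat([df_other, s_Interval], axis='columns')
    # the unnamed first CSV column (row numbers) is read as 'Unnamed: 0'
    final_df.drop(['Unnamed: 0'], axis=1, inplace=True)

    return final_df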
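To make the intermediate shape concrete, here is a sketch of what groupby('Interval').apply(to_list) produces on a small made-up frame (the toy values and the three-column subset are assumptions, not the real data). Each Interval collapses into one row whose cells are Python lists. One hedge: pandas 2.2 and later deprecate passing the grouping column to the applied function (the include_groups option), so whether Interval itself appears among the result's columns can depend on the installed version.

```python
import pandas as pd

def to_list(df):
    # Transpose the group so every original column becomes a row, then
    # collapse each row into a Python list (a Series keyed by column name).
    return df.T.apply(lambda x: x.to_list(), axis='columns')

# Hypothetical toy frame using a few of the real column names.
df = pd.DataFrame({
    'PacketTime': [0.056, 0.057, 2.058, 2.059],
    'FrameLen': [116, 66, 116, 66],
    'Interval': [0, 0, 2, 2],
})

grouped = df.groupby('Interval').apply(to_list)
# One row per Interval; each cell holds that group's values as a list.
print(grouped)
```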

dataset = group("csv_location")
dataset.drop(['Interval'], axis=1, inplace=True)

ptime = dataset.pop('PacketTime')

target = dataset.pop('Movement')
other_targets = pd.DataFrame([dataset.pop(x) for x in ['Distance', 'Speed', 'Delay', 'Loss']])
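A side note on the last line, hedged because the intended layout of other_targets is not stated: pd.DataFrame applied to a list of Series stacks each Series as a row, so other_targets ends up with one row per target and one column per Interval. If one column per target was intended, pd.concat(..., axis=1) gives that instead. A sketch on a hypothetical stand-in for dataset:

```python
import pandas as pd

# Hypothetical stand-in for `dataset`: one row per Interval, a few columns.
dataset = pd.DataFrame(
    {
        'FrameLen': [116, 116, 116],
        'Distance': [1, 1, 1],
        'Speed': [25, 25, 25],
        'Delay': [0, 0, 0],
        'Loss': [0, 0, 0],
    },
    index=pd.Index([0, 2, 4], name='Interval'),
)

# pop() removes each column from `dataset` and returns it as a Series.
popped = [dataset.pop(x) for x in ['Distance', 'Speed', 'Delay', 'Loss']]

# A DataFrame built from a list of Series has one ROW per Series...
as_rows = pd.DataFrame(popped)
# ...while concat along axis=1 keeps one COLUMN per target, indexed by Interval.
as_columns = pd.concat(popped, axis=1)

print(as_rows.shape, as_columns.shape)
print(dataset.columns.tolist())
```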

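However, when I iterate over the columns of dataset (you'll notice that the columns contain lists), the first column appears to be <class 'list'>, but the second column is <class 'float'>.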
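A quick way to see which cells of a column are lists and which are not is to tally the Python types actually stored in it and list the offending row labels. A sketch on a made-up column with one missing cell (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical list-valued column with one missing cell; NaN is a float.
dataset = pd.DataFrame({'FrameLen': pd.Series([[116, 66], np.nan, [116, 66]])})

col = 'FrameLen'
# Tally the Python types actually stored in the column.
print(dataset[col].map(type).value_counts())

# Labels of the rows whose cell is not a list (len() would fail on these).
is_list = dataset[col].map(lambda v: isinstance(v, list))
bad_rows = dataset.loc[~is_list].index.tolist()
print(bad_rows)
```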

This is the loop I'm running:

columns = list(dataset)
for col in columns:
    df = pd.DataFrame(dataset[col].tolist())
    df.columns = [col+"_"+str(y) for y in range(len(df.columns))]
    df = df.dropna(axis='columns')
    dataset.drop(col, axis=1, inplace=True)
    dataset = pd.concat([dataset, df], axis=1)

The error is raised on the line df = pd.DataFrame(dataset[col].tolist()):

TypeError: object of type 'float' has no len()
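One plausible cause, sketched on made-up data (so treat it as a hypothesis to check against the real file): after group(), the index of dataset holds the Interval values (0, 2, 4, ... in the sample above), but pd.DataFrame(dataset[col].tolist()) builds a frame with a fresh 0, 1, 2, ... index. pd.concat(..., axis=1) matches rows by index label, not by position, so labels present in only one of the two frames become extra rows padded with NaN, which is a float. The first column expands without trouble, but the next column then contains those NaN cells and len() fails on them. The helper expand in this sketch is hypothetical; it only contrasts the two choices of index.

```python
import pandas as pd

# Hypothetical stand-in for `dataset` after group(): the index holds the
# Interval values (0, 2, 4) and every cell holds a list.
idx = [0, 2, 4]
dataset = pd.DataFrame({
    'FrameLen': pd.Series([[116, 66]] * 3, index=idx),
    'IPLen': pd.Series([[102, 52]] * 3, index=idx),
})


def expand(frame, col, keep_index):
    """Split one list-valued column into col_0, col_1, ... columns."""
    # index=None gives the new frame a fresh RangeIndex 0, 1, 2, ...
    index = frame.index if keep_index else None
    df = pd.DataFrame(frame[col].tolist(), index=index)
    df.columns = [col + "_" + str(y) for y in range(len(df.columns))]
    # concat(axis=1) aligns rows by index label, not by position.
    return pd.concat([frame.drop(columns=col), df], axis=1)


misaligned = expand(dataset, 'FrameLen', keep_index=False)
print(sorted(misaligned.index))          # labels from both indexes
print(type(misaligned.loc[1, 'IPLen']))  # the extra row holds NaN, a float

# The next column now contains a float, which reproduces the error:
try:
    pd.DataFrame(misaligned['IPLen'].tolist())
except TypeError as exc:
    print("TypeError:", exc)

aligned = expand(dataset, 'FrameLen', keep_index=True)
print(aligned.index.tolist())            # still 0, 2, 4, no NaN rows
```

In the loop from the question, the corresponding change would be pd.DataFrame(dataset[col].tolist(), index=dataset.index); whether that alone resolves the error on the full file is untested.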
