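For school, I have to create a project about wifi signals, and I am trying to put the data into a DataFrame. There are 208,000 rows of data. When execution reaches the code below, it never finishes; it looks as if it is stuck in an infinite loop. When I use only 1,000 rows, however, my program works. So I think my lists may be too small, if that is possible. Is there a bigger kind of list in Python? Or is it because I wrote the code the wrong way? Thanks in advance.
Edit 1: (data is the original DataFrame, and WifiInfo is one of its columns.) I have this format: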
df = pd.DataFrame(columns=['Sender','Time','Date','Place','X','Y','Bezetting','SSID','BSSID','Signal'])
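I am trying to fill the SSID, BSSID and Signal columns from the WifiInfo column, and to do that I have to split the data.
This is what a single WifiInfo value looks like: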
ODISEE@88-1d-fc-41-dc-50:-83,ODISEE@88-1d-fc-2c-c0-00:-72,ODISEE@88-1d-fc-41-d2-d0:-82,CiscoC5976@58-6d-8f-19-14-38:-78,CiscoC5959@58-6d-8f-19-13-f4:-93,SNB@c8-d7-19-6f-be-b7:-99,ODISEE@88-1d-fc-2c-c5-70:-94,HackingDemo@58-6d-8f-19-11-48:-156,ODISEE@88-1d-fc-30-d4-40:-85,ODISEE@88-1d-fc-41-ac-50:-100
My current approach is as follows:
for index, row in data.iterrows():
    bezettingList = list()
    ssidList = list()
    bssidList = list()
    signalList = list()
    #WifiInfo splitting
    wifis = row.WifiInfo.split(',')
    for wifi in wifis:
        #split wifi and add to List
        ssid, bssid = wifi.split('@')
        bssid, signal = bssid.split(':')
        ssidList.append(ssid)
        bssidList.append(bssid)
        signalList.append(int(signal))
    #add bezettingen to List
    bezettingen = row.Bezetting.split(',')
    for bezetting in bezettingen:
        bezettingList.append(bezetting)
    #add list to dataframe
    df.loc[index,'SSID'] = ssidList
    df.loc[index,'BSSID'] = bssidList
    df.loc[index,'Signal'] = signalList
    df.loc[index,'Bezetting'] = bezettingList
df.head()
Answer 0 (score: 0)
IIUC, you first need to explode each row on the commas, so that this:
SSID BSSID Signal WifiInfo
0 NaN NaN NaN ODISEE@88-1d-fc-41-dc-50:-83,ODISEE@88- ...
becomes this:
SSID BSSID Signal WifiInfo
0 NaN NaN NaN ODISEE@88-1d-fc-41-dc-50:-83
1 NaN NaN NaN ODISEE@88-1d-fc-2c-c0-00:-72
2 NaN NaN NaN ODISEE@88-1d-fc-41-d2-d0:-82
3 NaN NaN NaN CiscoC5976@58-6d-8f-19-14-38:-78
4 NaN NaN NaN CiscoC5959@58-6d-8f-19-13-f4:-93
5 NaN NaN NaN SNB@c8-d7-19-6f-be-b7:-99
6 NaN NaN NaN ODISEE@88-1d-fc-2c-c5-70:-94
7 NaN NaN NaN HackingDemo@58-6d-8f-19-11-48:-156
8 NaN NaN NaN ODISEE@88-1d-fc-30-d4-40:-85
9 NaN NaN NaN ODISEE@88-1d-fc-41-ac-50:-100
# use `.explode`
data = data.assign(WifiInfo=data.WifiInfo.str.split(',')).explode('WifiInfo')
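To make the split-then-explode step concrete, here is a minimal sketch on a made-up two-row frame. The `Place` values and the row contents are assumptions for illustration only, and `DataFrame.explode` needs pandas 0.25 or later. Because `explode` repeats the original index for every new row, the sketch calls `reset_index(drop=True)` to get a running 0..n index like the one in the table above.

```python
import pandas as pd

# Hypothetical two-row sample; the real data has about 208,000 rows.
data = pd.DataFrame({
    "Place": ["A", "B"],
    "WifiInfo": [
        "ODISEE@88-1d-fc-41-dc-50:-83,SNB@c8-d7-19-6f-be-b7:-99",
        "CiscoC5976@58-6d-8f-19-14-38:-78",
    ],
})

exploded = (
    data.assign(WifiInfo=data["WifiInfo"].str.split(","))  # string -> list of entries
        .explode("WifiInfo")                               # one row per list element
        .reset_index(drop=True)                            # explode repeats the old index
)
print(exploded)
```

The other columns (here `Place`) are copied onto every new row, so no Python-level loop over the rows is needed.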
Now you can use .str.extract:
data['SSID'] = data['WifiInfo'].str.extract(r'(.*)@')
data['BSSID'] = data['WifiInfo'].str.extract(r'@(.*):')
data['Signal'] = data['WifiInfo'].str.extract(r':(.*)')
SSID BSSID Signal WifiInfo
0 ODISEE 88-1d-fc-41-dc-50 -83 ODISEE@88-1d-fc-41-dc-50:-83
1 ODISEE 88-1d-fc-2c-c0-00 -72 ODISEE@88-1d-fc-2c-c0-00:-72
2 ODISEE 88-1d-fc-41-d2-d0 -82 ODISEE@88-1d-fc-41-d2-d0:-82
3 CiscoC5976 58-6d-8f-19-14-38 -78 CiscoC5976@58-6d-8f-19-14-38:-78
4 CiscoC5959 58-6d-8f-19-13-f4 -93 CiscoC5959@58-6d-8f-19-13-f4:-93
5 SNB c8-d7-19-6f-be-b7 -99 SNB@c8-d7-19-6f-be-b7:-99
6 ODISEE 88-1d-fc-2c-c5-70 -94 ODISEE@88-1d-fc-2c-c5-70:-94
7 HackingDemo 58-6d-8f-19-11-48 -156 HackingDemo@58-6d-8f-19-11-48:-156
8 ODISEE 88-1d-fc-30-d4-40 -85 ODISEE@88-1d-fc-30-d4-40:-85
9 ODISEE 88-1d-fc-41-ac-50 -100 ODISEE@88-1d-fc-41-ac-50:-100
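As a possible variation, the three extractions can be done in one pass with named groups, which also makes it easy to turn Signal into integers (with the calls above it stays a string column). This is only a sketch: the pattern assumes every entry has the exact shape `SSID@BSSID:signal` seen in the sample. A malformed entry would produce NaN and make `astype(int)` fail.

```python
import pandas as pd

# Two entries taken from the sample WifiInfo value in the question.
wifi = pd.Series([
    "ODISEE@88-1d-fc-41-dc-50:-83",
    "HackingDemo@58-6d-8f-19-11-48:-156",
], name="WifiInfo")

# Named groups become the column names of the resulting DataFrame.
pattern = r"^(?P<SSID>.*)@(?P<BSSID>[0-9a-fA-F-]+):(?P<Signal>-?\d+)$"
parts = wifi.str.extract(pattern)

# Assumes every row matched; otherwise NaN values would break the cast.
parts["Signal"] = parts["Signal"].astype(int)
print(parts)
```

On the exploded frame, the result could be attached with something like `data[['SSID', 'BSSID', 'Signal']] = parts`, provided the indexes line up.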
If you want to keep the data grouped after exploding the column, I would first assign an ID to each group of entries:
data['Group'] = pd.factorize(data['WifiInfo'])[0]+1
SSID BSSID Signal WifiInfo Group
0 NaN NaN NaN ODISEE@88-1d-fc-41-dc-50:-83,ODISEE@88- ... 1
1 NaN NaN NaN ASD@22-1d-fc-41-dc-50:-83,QWERTY@88- ... 2
# after you explode the column
SSID BSSID Signal WifiInfo Group
ODISEE 88-1d-fc-41-dc-50 -83 ODISEE@88-1d-fc-41-dc-50:-83 1
ODISEE 88-1d-fc-2c-c0-00 -72 ODISEE@88-1d-fc-2c-c0-00:-72 1
...
...
ASD 22-1d-fc-41-dc-50 -83 ASD@22-1d-fc-41-dc-50:-83 2
QWERTY 88-1d-fc-2c-c0-00 -72 QWERTY@88-1d-fc-2c-c0-00:-72 2
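The grouping idea can be checked end to end with a small sketch. The two rows below are invented for illustration. Note that `pd.factorize` gives the same ID to rows whose WifiInfo strings are identical; if every original row must get its own ID even when the strings repeat, the original index could serve as the ID instead.

```python
import pandas as pd

# Hypothetical sample: two original rows with 2 and 1 wifi entries.
data = pd.DataFrame({"WifiInfo": [
    "ODISEE@88-1d-fc-41-dc-50:-83,ODISEE@88-1d-fc-2c-c0-00:-72",
    "ASD@22-1d-fc-41-dc-50:-83",
]})

# Assign the ID before exploding, so every exploded row inherits it.
data["Group"] = pd.factorize(data["WifiInfo"])[0] + 1

exploded = (
    data.assign(WifiInfo=data["WifiInfo"].str.split(","))
        .explode("WifiInfo")
        .reset_index(drop=True)
)

# Number of wifi entries that came from each original row.
counts = exploded.groupby("Group")["WifiInfo"].count()
print(counts)
```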