I wrote a while loop that separates the file path from the file/exe column of a pandas DataFrame and puts the file path into a new column.
#Count rows
rows = len(DF1)
#While loop to grab file path - new column
index = 0
while (index < rows):
    DF1['ParentPath'].iloc[index] = DF1['ParentPathExe'].iloc[index].rsplit('\\', 1)[0]
    DF1['ChildPath'].iloc[index] = DF1['ChildPathExe'].iloc[index].rsplit('\\', 1)[0]
    index = index + 1
This works, but it is very slow on 6.5 million rows. The file/exe columns are populated with values like the following:
C:\Windows\System32\conhost.exe
C:\Windows\System32\svchost.exe
C:\Windows\System32\raserver\raserver.exe
Some file paths contain 3 backslashes ("\"), while others contain 4, 5, 6, and so on.
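To illustrate why the varying depth is not a problem for the per-row call used in the loop, here is a minimal sketch in plain Python. It reuses two of the sample paths above; the variable names `paths` and `parts` are made up for this illustration. Splitting from the right with a limit of 1 always cuts at the last backslash, regardless of how many come before it.

```python
# Two sample paths from the question, with different folder depths.
paths = [
    r"C:\Windows\System32\conhost.exe",
    r"C:\Windows\System32\raserver\raserver.exe",
]

# rsplit with maxsplit=1 splits only at the last backslash,
# so each result is a [folder, file name] pair.
parts = [p.rsplit("\\", 1) for p in paths]

for p, (folder, exe) in zip(paths, parts):
    print(p.count("\\"), folder, exe)
```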
I use the following code to strip out the .exe name, and it is very fast.
#Strip out EXE into new column
DF1['ParentExe'] = DF1['ParentPathExe'].str.split('\\').str[-1]
DF1['ChildExe'] = DF1['ChildPathExe'].str.split('\\').str[-1]
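Is there a way to avoid the loop, similar to what I do for the .exe?
Answer 0 (score: 0)
I rewrote the child and parent lines to use rsplit, which splits each value into a file path and an .exe name: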
#Split ParentPathExe into path and exe columns
Parent = DF1['ParentPathExe'].str.rsplit("\\", n=1, expand=True)
#Rename columns
Parent.columns = ['ParentPath', 'ParentExe']
   ParentPath                                  ParentExe
0  C:\Program Files (x86)\Wireless AutoSwitch  wrlssw.exe
1  C:\Program Files (x86)\Wireless AutoSwitch  WrlsAutoSW.exs
2  C:\Program Files (x86)\Wireless AutoSwitch  WrlsAutoSW.exs
3  C:\Windows\System32                         svchost.exe
4  C:\Program Files (x86)\Wireless AutoSwitch  WrlsAutoSW.exs
#Split ChildPathExe into path and exe columns
Child = DF1['ChildPathExe'].str.rsplit("\\", n=1, expand=True)
#Rename columns
Child.columns = ['ChildPath', 'ChildExe']
   ChildPath                                          ChildExe
0  C:\Windows\System32                                conhost.exe
1  C:\Program Files (x86)\Wireless AutoSwitch         wrlssw.exe
2  C:\Program Files (x86)\Wireless AutoSwitch         wrlssw.exe
3  C:\Program Files\Common Files\microsoft shared...  OfficeC2RClient.exe
4  C:\Program Files (x86)\Wireless AutoSwitch         wrlssw.exe
5  C:\Program Files (x86)\Wireless AutoSwitch         wrlssw.exe
Then concatenate the two DataFrames column-wise:
DF1 = pd.concat([Parent, Child], axis = 1)
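For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. The two-row DataFrame is made-up sample data built from paths quoted in the question, and the name `result` is introduced only for this illustration; the `rsplit` and `concat` calls are the same ones used above.

```python
import pandas as pd

# Hypothetical sample data, reusing paths quoted in the question.
DF1 = pd.DataFrame({
    "ParentPathExe": [
        r"C:\Windows\System32\svchost.exe",
        r"C:\Windows\System32\raserver\raserver.exe",
    ],
    "ChildPathExe": [
        r"C:\Windows\System32\conhost.exe",
        r"C:\Windows\System32\svchost.exe",
    ],
})

# Split once, from the right, at the last backslash; expand=True
# returns a DataFrame with one column per piece.
Parent = DF1["ParentPathExe"].str.rsplit("\\", n=1, expand=True)
Parent.columns = ["ParentPath", "ParentExe"]

Child = DF1["ChildPathExe"].str.rsplit("\\", n=1, expand=True)
Child.columns = ["ChildPath", "ChildExe"]

# Column-wise concatenation aligns the pieces on the shared row index.
result = pd.concat([Parent, Child], axis=1)
print(result)
```

Note that assigning the concatenated frame back to DF1, as above, replaces the original frame, so any other columns in DF1 are dropped. If they need to be kept, one option is to include DF1 itself in the list passed to pd.concat.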