I have this dataset:
Route STOP_ID AveOn AveOff AveLd PassingTime Period DAYCODE PATTERN_ID BLK RTE DIR PATTERN_QUALITY VEHICLE_ID STOP_TYPE DWELL_SEC DOOR_OPEN_SEC
0 65 9605 2.1 0 24.2 0.3625 AM 0 11065088 6513 65 N 100 3607 ST 0 0
1 65 9605 2.1 0 24.2 0.3625 AM 0 11065088 6513 65 N 100 3608 ST 0 0
2 65 9605 2.1 0 24.2 0.3625 AM 0 11065088 6513 65 N 100 3664 ST 0 0
3 65 9605 2.1 0 24.2 0.3625 AM 0 11065088 6513 65 N 100 3608 ST 0 0
4 65 9605 2.1 0 24.2 0.3625 AM 0 11065088 6513 65 N 100 3669 ST 0 0
5 65 9605 2.1 0 24.2 0.3625 AM 0 11065088 6513 65 N 100 3620 ST 0 0
2185 67 35322 8.2 0.2 8 0.318055556 AM 0 20067078 6515 67 S 95 3613 ST 1 1
2187 67 35322 8.2 0.2 8 0.318055556 AM 0 20067078 6515 67 S 95 3674 ST 1 1
3976 67 82237 0.2 0.1 6.6 0.692361111 PM 0 20067078 6508 67 S 95 3676 S 1 0
5203 67 35322 4.7 0 4.7 0.33125 AM 0 20067078 6511 67 S 100 3640 ST 1 1
6723 67 35322 7.5 0 7.5 0.369444444 AM 0 20067078 6507 67 S 100 3658 ST 1 1
6730 67 35322 7.5 0 7.5 0.369444444 AM 0 20067078 6507 67 S 100 3673 ST 1 1
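I need to drop some columns and the duplicate rows, and also remove the rows where the DWELL_SEC column equals 0. I started writing the following code: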
import pandas as pd
import numpy as np
from pandas import ExcelWriter
transit="C:\\Users\\Taqwa\\Desktop\\ttest.xlsx"
xlsx = pd.ExcelFile(transit)
df=pd.read_excel(transit,'Sheet1')
df.columns=df.columns.astype(str)
writer=ExcelWriter("C:\\Users\\Taqwa\\Desktop\\ttest2.xlsx")
df1 = df[df.DWELL_SEC != 0]
for name, sub_df in df.groupby("STOP_ID"):
    sub_df.to_excel( writer, str(name))
writer.save()
Can anyone help?
Answer 0 (score: 1)
Use drop_duplicates plus any boolean-indexing method (I use query):
df = df.drop_duplicates().query('DWELL_SEC != 0')
df
Route STOP_ID AveOn AveOff AveLd PassingTime Period DAYCODE \
2185 67 35322 8.2 0.2 8.0 0.318056 AM 0
2187 67 35322 8.2 0.2 8.0 0.318056 AM 0
3976 67 82237 0.2 0.1 6.6 0.692361 PM 0
5203 67 35322 4.7 0.0 4.7 0.331250 AM 0
6723 67 35322 7.5 0.0 7.5 0.369444 AM 0
6730 67 35322 7.5 0.0 7.5 0.369444 AM 0
PATTERN_ID BLK RTE DIR PATTERN_QUALITY VEHICLE_ID STOP_TYPE \
2185 20067078 6515 67 S 95 3613 ST
2187 20067078 6515 67 S 95 3674 ST
3976 20067078 6508 67 S 95 3676 S
5203 20067078 6511 67 S 100 3640 ST
6723 20067078 6507 67 S 100 3658 ST
6730 20067078 6507 67 S 100 3673 ST
DWELL_SEC DOOR_OPEN_SEC
2185 1 1
2187 1 1
3976 1 0
5203 1 1
6723 1 1
6730 1 1
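As a rough, self-contained illustration of the same idea, the sketch below uses a hypothetical three-column miniature of the table (the values are made up for the example, not read from the spreadsheet). It also shows the equivalent boolean mask and one possible way to drop unwanted columns with drop(columns=...), which the question asks about.

```python
import pandas as pd

# Hypothetical miniature of the dataset; only three columns are kept.
df = pd.DataFrame(
    {
        "STOP_ID": [9605, 9605, 35322, 35322, 82237],
        "VEHICLE_ID": [3608, 3608, 3613, 3613, 3676],
        "DWELL_SEC": [0, 0, 1, 1, 1],
    }
)

# Drop exact duplicate rows, then keep rows whose DWELL_SEC is non-zero.
cleaned = df.drop_duplicates().query("DWELL_SEC != 0")

# The same filter written as a plain boolean mask instead of query().
masked = df.drop_duplicates()
masked = masked[masked["DWELL_SEC"] != 0]

# Columns that are no longer needed can be removed by label.
trimmed = cleaned.drop(columns=["VEHICLE_ID"])
```

Note that drop_duplicates only removes rows that match in every column, so rows that differ only in VEHICLE_ID are kept unless that column is dropped first or a subset= is passed.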
If DWELL_SEC is a string column, convert it first:
df.DWELL_SEC = df.DWELL_SEC.astype(int)
df = df.drop_duplicates().query('DWELL_SEC != 0')
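astype(int) raises a ValueError if any cell cannot be parsed as an integer. If the column might contain such cells, one possible alternative is pd.to_numeric with errors="coerce", sketched here on made-up data (the "n/a" entry is an assumption for illustration, not something seen in the original sheet).

```python
import pandas as pd

# Hypothetical data: DWELL_SEC arrives as text, with one unparseable cell.
df = pd.DataFrame(
    {
        "STOP_ID": [9605, 35322, 82237, 35322],
        "DWELL_SEC": ["0", "1", "n/a", "2"],
    }
)

# Unparseable cells become NaN instead of raising an error.
df["DWELL_SEC"] = pd.to_numeric(df["DWELL_SEC"], errors="coerce")

# NaN != 0 evaluates to True, so drop the NaN rows explicitly before filtering.
cleaned = df.dropna(subset=["DWELL_SEC"]).query("DWELL_SEC != 0")
```

The column ends up as float because of the NaN values; it can be cast back with astype(int) after the dropna step if integers are needed.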
Answer 1 (score: 0)
import pandas as pd

transit = "C:\\Users\\Taqwa\\Desktop\\ttest.xlsx"
df = pd.read_excel(transit, sheet_name='Sheet1')
df.columns = df.columns.astype(str)

# Make sure DWELL_SEC is numeric, then drop duplicate rows and zero dwell times.
df.DWELL_SEC = df.DWELL_SEC.astype(int)
df = df.drop_duplicates().query('DWELL_SEC != 0')

# Keep only the columns of interest and drop rows that became duplicates.
df2 = df[['AveOn', 'AveOff', 'AveLd', 'DWELL_SEC', 'STOP_ID']].drop_duplicates()

# Write one sheet per STOP_ID; the with block saves the file on exit.
with pd.ExcelWriter("C:\\Users\\Taqwa\\Desktop\\ttest2.xlsx") as writer:
    for name, sub_df2 in df2.groupby("STOP_ID"):
        sub_df2.to_excel(writer, sheet_name=str(name))
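To see what the loop above iterates over without touching an Excel file, the sketch below groups a small hypothetical frame (made-up values, a subset of the real columns) and collects each sub-frame under the sheet name it would receive.

```python
import pandas as pd

# Hypothetical stand-in for df2 with only three of its columns.
df2 = pd.DataFrame(
    {
        "AveOn": [8.2, 0.2, 4.7],
        "DWELL_SEC": [1, 1, 1],
        "STOP_ID": [35322, 82237, 35322],
    }
)

# groupby yields (key, sub-frame) pairs, sorted by key by default.
# Each key is converted to str because sheet names must be text.
sheets = {str(stop_id): sub_df for stop_id, sub_df in df2.groupby("STOP_ID")}

for sheet_name, sub_df in sheets.items():
    print(sheet_name, len(sub_df))
```

Each entry in sheets corresponds to one to_excel call in the loop. Excel limits sheet names to 31 characters, which is not a concern for numeric stop IDs but matters if longer keys are used.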