I have a DataFrame like the one shown below.
I need to delete the entire row whenever Interval has identical start and end times, as in 04:00:00-04:00:00:
      UNIT  EXITSn_hourly           Interval
1867  R081            104  00:00:00-04:00:00
1868  R081              0  04:00:00-04:00:00
1869  R081            129  04:00:00-08:00:00
1870  R081            521  08:00:00-12:00:00
1871  R081           1048  12:00:00-16:00:00
2838  R032             38  00:00:00-04:00:00
2839  R032              0  04:00:00-04:00:00
2840  R032             89  04:00:00-08:00:00
2841  R032            470  08:00:00-12:00:00
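I want to delete not only the row 1868 R081 0 04:00:00-04:00:00, but every row whose Interval has the same start and end, for example 01:00:00-01:00:00.
Actually, this is my original df, from which I generated the Interval column: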
    C/A  UNIT       SCP     DATEn     TIMEn    DESCn  ENTRIESn   EXITSn
0  A002  R051  02-00-00  06-29-13  00:00:00  REGULAR   4174592  1433672
1  A002  R051  02-00-00  06-29-13  04:00:00  REGULAR   4174628  1433675
2  A002  R051  02-00-00  06-29-13  08:00:00  REGULAR   4174641  1433706
3  A002  R051  02-00-00  06-29-13  12:00:00  REGULAR   4174741  1433775
4  A002  R051  02-00-00  06-29-13  16:00:00  REGULAR   4174936  1433826
5  A002  R051  02-00-00  06-29-13  20:00:00  REGULAR   4175270  1433877
6  A002  R051  02-00-00  06-30-13  00:00:00  REGULAR   4175403  1433908
7  A002  R051  02-00-00  06-30-13  04:00:00  REGULAR   4175441  1433914
8  A002  R051  02-00-00  06-30-13  08:00:00  REGULAR   4175457  1433928
9  A002  R051  02-00-00  06-30-13  12:00:00  REGULAR   4175520  1433981
Answer 0: (score: 0)
You probably want to split Interval into Interval_start and Interval_end and keep only the rows where the two differ:
df['Interval_start'] = df['Interval'].map(lambda s: s.split('-')[0])
df['Interval_end'] = df['Interval'].map(lambda s: s.split('-')[1])
df.query("Interval_start != Interval_end")
      UNIT  EXITSn_hourly           Interval  Interval_start  Interval_end
1867  R081            104  00:00:00-04:00:00        00:00:00      04:00:00
1869  R081            129  04:00:00-08:00:00        04:00:00      08:00:00
1870  R081            521  08:00:00-12:00:00        08:00:00      12:00:00
1871  R081           1048  12:00:00-16:00:00        12:00:00      16:00:00
2838  R032             38  00:00:00-04:00:00        00:00:00      04:00:00
2840  R032             89  04:00:00-08:00:00        04:00:00      08:00:00
2841  R032            470  08:00:00-12:00:00        08:00:00      12:00:00
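The same check can be done without adding helper columns. This is only a minimal sketch, assuming pandas is installed and that every Interval value contains exactly one '-' separator; the small frame is rebuilt from a few of the rows shown in the question.

```python
import pandas as pd

# Sample rows copied from the question's DataFrame.
df = pd.DataFrame(
    {
        "UNIT": ["R081", "R081", "R081", "R032", "R032"],
        "EXITSn_hourly": [104, 0, 129, 38, 0],
        "Interval": [
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
        ],
    },
    index=[1867, 1868, 1869, 2838, 2839],
)

# Split once on '-' into two columns (0 = start, 1 = end).
parts = df["Interval"].str.split("-", n=1, expand=True)

# Keep only the rows whose start and end differ.
filtered = df[parts[0] != parts[1]]
print(filtered)
```

Because the split result lives in a separate frame, the original columns of df stay untouched.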
Answer 1: (score: 0)
You can compare parts of the string and then drop the matching rows by boolean subsetting:
# Hour part of the start time (first two characters)
print(df.Interval.str[0:2])
1867    00
1868    04
1869    04
1870    08
1871    12
2838    00
2839    04
2840    04
2841    08
Name: Interval, dtype: object
# True where the start hour differs from the end hour (characters 9-10)
print(df.Interval.str[0:2] != df.Interval.str[9:11])
1867     True
1868    False
1869     True
1870     True
1871     True
2838     True
2839    False
2840     True
2841     True
Name: Interval, dtype: bool
# Use the boolean mask to keep only rows whose hours differ
print(df[df.Interval.str[0:2] != df.Interval.str[9:11]])
      UNIT  EXITSn_hourly           Interval
1867  R081            104  00:00:00-04:00:00
1869  R081            129  04:00:00-08:00:00
1870  R081            521  08:00:00-12:00:00
1871  R081           1048  12:00:00-16:00:00
2838  R032             38  00:00:00-04:00:00
2840  R032             89  04:00:00-08:00:00
2841  R032            470  08:00:00-12:00:00
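Note that slicing [0:2] and [9:11] compares only the hour fields, so an interval such as 04:00:00-04:30:00 would be dropped as well. If that is not wanted, the full start and end times can be compared instead. The following is a sketch that assumes the fixed HH:MM:SS-HH:MM:SS layout; the three sample values are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical sample: the second row has equal hours but different minutes.
df = pd.DataFrame(
    {"Interval": ["04:00:00-04:00:00", "04:00:00-04:30:00", "04:00:00-08:00:00"]}
)

start = df["Interval"].str[0:8]   # characters 0-7: full start time
end = df["Interval"].str[9:17]    # characters 9-16: full end time

# Hour-only comparison, as in the answer above.
hours_only = df[df["Interval"].str[0:2] != df["Interval"].str[9:11]]

# Full-time comparison keeps the 04:00:00-04:30:00 row.
full_time = df[start != end]

print(hours_only)
print(full_time)
```

For the data in the question both variants give the same result, since every interval there starts and ends on a whole hour.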
Edit:
I checked your code; you can probably drop copy.deepcopy and use the DataFrame's own copy method instead:
df = turnstile_data.copy(deep=True)
# Hourly counts: difference from the previous row (the first row becomes 0)
df['ENTRIESn_hourly'] = (df['ENTRIESn'] - df['ENTRIESn'].shift(periods=1)).fillna(0)
df['EXITSn_hourly'] = (df['EXITSn'] - df['EXITSn'].shift(periods=1)).fillna(0)
# Interval label: previous TIMEn + '-' + current TIMEn
df['Interval'] = (df['TIMEn'].shift(periods=1) + '-' + df['TIMEn']).fillna(0)
df.loc[df['ENTRIESn'] == 0, 'ENTRIESn_hourly'] = 0
df.loc[df['EXITSn'] == 0, 'EXITSn_hourly'] = 0
# Reset the first row of each turnstile, i.e. wherever C/A, UNIT or SCP changes
df.loc[(df['C/A'] != df['C/A'].shift(periods=1)) |
       (df['UNIT'] != df['UNIT'].shift(periods=1)) |
       (df['SCP'] != df['SCP'].shift(periods=1)),
       ['ENTRIESn_hourly', 'EXITSn_hourly', 'Interval']] = 0
print(df.head(5))
   ENTRIESn_hourly  EXITSn_hourly           Interval
0                0              0                  0
1               36              3  00:00:00-04:00:00
2               13             31  04:00:00-08:00:00
3              100             69  08:00:00-12:00:00
4              195             51  12:00:00-16:00:00
# Group the three columns of interest by UNIT; head(5) shows the first 5 rows of each group
required_df = df[['UNIT', 'EXITSn_hourly', 'Interval']].groupby(df.UNIT)
print(required_df.head(5))
   UNIT  EXITSn_hourly           Interval
0  R051              0                  0
1  R051              3  00:00:00-04:00:00
2  R051             31  04:00:00-08:00:00
3  R051             69  08:00:00-12:00:00
4  R051             51  12:00:00-16:00:00
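The boundary handling in the code above (zeroing the first row wherever C/A, UNIT or SCP changes) could also be expressed with groupby, which computes differences and shifts within each turnstile separately. This is only a sketch, assuming the rows are already sorted by time within each turnstile. The first three rows are taken from the question's data; the second turnstile (A003/R052) and its values are invented for illustration, and the extra rule that zeroes counts where ENTRIESn or EXITSn is 0 is left out.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "C/A": ["A002"] * 3 + ["A003"] * 2,    # A003 is a hypothetical device
        "UNIT": ["R051"] * 3 + ["R052"] * 2,   # R052 is hypothetical as well
        "SCP": ["02-00-00"] * 5,
        "TIMEn": ["00:00:00", "04:00:00", "08:00:00", "00:00:00", "04:00:00"],
        "EXITSn": [1433672, 1433675, 1433706, 500, 520],
    }
)

grouped = df.groupby(["C/A", "UNIT", "SCP"])

# diff() restarts in every group, so the first row of each turnstile is NaN.
df["EXITSn_hourly"] = grouped["EXITSn"].diff().fillna(0).astype(int)

# Previous timestamp within the same turnstile; missing on each first row.
prev_time = grouped["TIMEn"].shift(1)
df["Interval"] = prev_time + "-" + df["TIMEn"]

# Drop the first row of each turnstile, which has no interval.
result = df.dropna(subset=["Interval"])
print(result[["UNIT", "EXITSn_hourly", "Interval"]])
```

With this approach the placeholder rows never receive a 0 in Interval, so there is nothing to filter out afterwards except the missing values.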