I have the following data and am trying to drop every row whose dateTime column contains "!ENDMSG!":
dateTime tradePrice tradeVolume aggVolume bid1 ask1 quote_counter
33501 2017-09-19 15:59:53 12545.5 1.0 54344.0 12545.0 12545.5 1567101.0
33502 2017-09-19 15:59:59 12545.0 1.0 54345.0 12545.0 12545.5 1567136.0
33503 2017-09-19 16:00:00 12545.0 1.0 54346.0 12544.5 12545.0 1567146.0
33504 2017-09-19 16:03:44 12544.0 17.0 54363.0 12544.0 12544.0 1567519.0
33505 !ENDMSG! NaN NaN NaN NaN NaN NaN
The problem is that the dateTime column has dtype object, so ALL of the following attempts fail:
df[df.dateTime.str.contains('!ENDMSDG!') == False]
df[~df.dateTime.str.contains("!")]
df = df[~df['dateTime'].str.contains('!ENDMSDG!')]
print(df[df['dateTime'].str.match('!ENDMSDG!')])#another try to catch it
Thanks for any suggestions!
Answer 0 (score: 1)
It seems you need to change !ENDMSDG! to !ENDMSG! (remove the second letter D):
df = df[~df['dateTime'].str.contains('!ENDMSG!')]
df = df[~df['dateTime'].str.contains('!')]
# alternative with regex matching disabled (plain substring search)
#df = df[~df['dateTime'].str.contains('!', regex=False)]
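To see why the original attempts appeared to fail, here is a minimal sketch on a hypothetical two-column miniature of the frame from the question (the shortened data and the names `typo_result` and `fixed` are assumptions for illustration). It suggests the object dtype is not the cause: the misspelled marker simply matches nothing, so no rows are dropped. Passing `na=False` is an optional safeguard in case the column also holds missing values.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question (only two columns kept)
df = pd.DataFrame(
    {
        "dateTime": ["2017-09-19 15:59:53", "2017-09-19 16:03:44", "!ENDMSG!"],
        "tradePrice": [12545.5, 12544.0, float("nan")],
    },
    index=[33501, 33504, 33505],
)

# Misspelled marker: no row contains it, so the mask keeps every row
typo_result = df[~df["dateTime"].str.contains("!ENDMSDG!", regex=False)]

# Correctly spelled marker: the sentinel row is removed.
# na=False treats missing values as "no match", so such rows are kept.
fixed = df[~df["dateTime"].str.contains("!ENDMSG!", regex=False, na=False)]

print(len(typo_result))
print(fixed)
```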
If you need to check the start of the string instead:
df = df[~df['dateTime'].str.startswith('!')]
print (df)
dateTime tradePrice tradeVolume aggVolume bid1 \
33501 2017-09-19 15:59:53 12545.5 1.0 54344.0 12545.0
33502 2017-09-19 15:59:59 12545.0 1.0 54345.0 12545.0
33503 2017-09-19 16:00:00 12545.0 1.0 54346.0 12544.5
33504 2017-09-19 16:03:44 12544.0 17.0 54363.0 12544.0
ask1 quote_counter
33501 12545.5 1567101.0
33502 12545.5 1567136.0
33503 12545.0 1567146.0
33504 12544.0 1567519.0
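A different approach, not part of the answer above, is to parse the column as datetimes and drop whatever fails to parse. This is only a sketch on a hypothetical miniature of the data, and it assumes every valid row shares one timestamp format. `pd.to_datetime(..., errors="coerce")` turns unparseable values such as the sentinel into NaT, and it also leaves dateTime with a proper datetime dtype instead of object.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question
df = pd.DataFrame(
    {
        "dateTime": ["2017-09-19 15:59:53", "2017-09-19 16:03:44", "!ENDMSG!"],
        "tradePrice": [12545.5, 12544.0, float("nan")],
    },
    index=[33501, 33504, 33505],
)

# Values that cannot be parsed as timestamps become NaT
df["dateTime"] = pd.to_datetime(df["dateTime"], errors="coerce")

# Drop the rows whose dateTime could not be parsed
clean = df.dropna(subset=["dateTime"])

print(clean.dtypes)
print(clean)
```

Note that this removes any row with an unparseable dateTime, not only the "!ENDMSG!" marker, which may or may not be what you want.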