愿我的例子变得更大,我的代码在这里:
Starting ChromeDriver 2.33.506120 (e3e53437346286c0bc2d2dc9aa4915ba81d9023f) on port 32443
Only local connections are allowed.
Oct 23, 2017 1:36:09 PM org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: OSS
Exception in thread "main" org.openqa.selenium.NoSuchElementException: no such element: Unable to locate element: {"method":"id","selector":"lead_field_import_email_address"}
(Session info: chrome=61.0.3163.100)
(Driver info: chromedriver=2.33.506120 (e3e53437346286c0bc2d2dc9aa4915ba81d9023f),platform=Windows NT 10.0.15063 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 0 milliseconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '3.6.0', revision: '6fbf3ec767', time: '2017-09-27T16:15:26.402Z'
System info: host: 'HOME-PC', ip: '192.235.0.1', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_151'
Driver info: org.openqa.selenium.chrome.ChromeDriver
Capabilities [{mobileEmulationEnabled=false, hasTouchScreen=false, platform=XP, acceptSslCerts=true, webStorageEnabled=true, browserName=chrome, takesScreenshot=true, javascriptEnabled=true, platformName=XP, setWindowRect=true, unexpectedAlertBehaviour=, applicationCacheEnabled=false, rotatable=false, networkConnectionEnabled=false, chrome={chromedriverVersion=2.33.506120 (e3e53437346286c0bc2d2dc9aa4915ba81d9023f), userDataDir=C:\Users\David\AppData\Local\Temp\1\scoped_dir5416_25737}, takesHeapSnapshot=true, pageLoadStrategy=normal, unhandledPromptBehavior=, databaseEnabled=false, handlesAlerts=true, version=61.0.3163.100, browserConnectionEnabled=false, nativeEvents=true, locationContextEnabled=true, cssSelectorsEnabled=true}]
Session ID: 40cde314a5a76400aceff8b625b38e3c
*** Element info: {Using=id, value=lead_field_import_email_address}
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:214)
at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:166)
at org.openqa.selenium.remote.http.JsonHttpResponseCodec.reconstructValue(JsonHttpResponseCodec.java:40)
at org.openqa.selenium.remote.http.AbstractHttpResponseCodec.decode(AbstractHttpResponseCodec.java:82)
at org.openqa.selenium.remote.http.AbstractHttpResponseCodec.decode(AbstractHttpResponseCodec.java:45)
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:164)
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:586)
at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:356)
at org.openqa.selenium.remote.RemoteWebDriver.findElementById(RemoteWebDriver.java:402)
at org.openqa.selenium.By$ById.findElement(By.java:218)
at org.openqa.selenium.remote.RemoteWebDriver.findElement(RemoteWebDriver.java:348)
at newAutomation.importLeads.main(importLeads.java:33)
我想要做的是找到一年中每年的最后一天开始import pandas as pd
import numpy as np
import io
t = """
name date
a 2005-08-31
a 2005-09-20
a 2005-11-12
a 2005-12-31
a 2006-03-31
a 2006-06-25
a 2006-07-23
a 2006-09-28
a 2006-12-21
a 2006-12-27
a 2007-07-23
a 2007-09-21
a 2007-03-15
a 2008-04-12
a 2008-06-21
a 2008-06-11
b 2005-08-31
b 2005-09-23
b 2005-11-12
b 2005-12-31
b 2006-03-31
b 2006-06-25
b 2006-07-23
b 2006-09-28
b 2006-12-21
b 2006-12-27
b 2007-07-23
b 2007-09-21
b 2007-03-15
b 2008-04-12
b 2008-06-21
b 2008-06-11
"""
data=pd.read_csv(io.StringIO(t),delimiter=' ')#5 space here
data
)并结束2005-7-1
,开始2006-06-30
并结束2006-7-1
。 。等等 。
我的预期输出在这里:
2007-6-30
如何解决这个问题?我想我应该使用name date
a 2006-06-25 #the last day of the 2005/7/01 -2006/06/31
a 2007-03-15 #the last day of the 2006/7/01 -2007/06/31
a 2008-06-21 #the last day of the 2007/7/01 -2008/06/31
b 2006-06-25 #the last day of the 2005/7/01 -2006/06/31
b 2007-03-15 #the last day of the 2006/7/01 -2007/06/31
b 2008-06-21 #the last day of the 2007/7/01 -2008/06/31
答案 0 :(得分:5)
您可以使用单个groupby执行此操作而无需回滚:
In [11]: data.date = pd.to_datetime(data.date, format="%Y-%m-%d")
In [12]: df.groupby(["name", pd.Grouper(key="date", freq="AS-JUL")])["date"].max()
Out[12]:
name date
a 2005-07-01 2006-06-25
2006-07-01 2007-03-15
2007-07-01 2008-06-21
b 2005-07-01 2006-06-25
2006-07-01 2007-03-15
2007-07-01 2008-06-21
Name: date, dtype: datetime64[ns]
答案 1 :(得分:4)
我们首先会在每个月的开始(因为你在那里有一些糟糕的日期,让我们忽略它们)但关键是我们需要将它作为日期时间而不是字符串:< / p>
In [11]: pd.to_datetime(data.date.str[:7], format="%Y-%m") # to beginning of month
Out[11]:
0 2005-08-01
1 2005-09-01
2 2005-11-01
3 2005-12-01
...
In [12]: df.date = pd.to_datetime(data.date.str[:7], format="%Y-%m")
现在来了magic:
In [13]: from pandas.tseries.frequencies import to_offset
In [14]: df.date.map(to_offset("AS-JUL").rollback)
Out[14]:
0 2005-07-01
1 2005-07-01
2 2005-07-01
3 2005-07-01
4 2005-07-01
5 2005-07-01
6 2006-07-01
7 2006-07-01
8 2006-07-01
9 2006-07-01
10 2007-07-01
11 2007-07-01
12 2006-07-01
13 2007-07-01
14 2007-07-01
15 2007-07-01
16 2005-07-01
17 2005-07-01
18 2005-07-01
19 2005-07-01
20 2005-07-01
21 2005-07-01
22 2006-07-01
23 2006-07-01
24 2006-07-01
25 2006-07-01
26 2007-07-01
27 2007-07-01
28 2006-07-01
29 2007-07-01
30 2007-07-01
31 2007-07-01
Name: date, dtype: datetime64[ns]
我们创建了一个偏移到"AS-JUL"
并将其回滚(意思是楼层)
注意:无论出于何种原因,我们无法使用dt.floor
...
好的,误读了这一部分,你想要每个时期每组的最新记录日期,修正日期,最后一部分只是一个组:
In [21]: data.date = pd.to_datetime(data.date, format="%Y-%m-%d")
In [22]: data["period_start"] = data.date.map(to_offset("AS-JUL").rollback).dt.normalize()
In [23]: data.groupby(["name", "period_start"])["date"].max()
Out[23]:
name period_start
a 2005-07-01 2006-06-25
2006-07-01 2007-03-15
2007-07-01 2008-06-21
b 2005-07-01 2006-06-25
2006-07-01 2007-03-15
2007-07-01 2008-06-21
Name: date, dtype: datetime64[ns]
答案 2 :(得分:3)
从美丽的功能to_offset
@Andy建议我们可以做到
from pandas.tseries.frequencies import to_offset
new = data.groupby('name').apply(lambda x : x.groupby(x['date'].map(to_offset("AS-JUL"))).max())
name date name date a 2006-07-01 a 2006-06-25 2007-07-01 a 2007-03-15 2008-07-01 a 2008-06-21 b 2006-07-01 b 2006-06-25 2007-07-01 b 2007-03-15 2008-07-01 b 2008-06-21
答案 3 :(得分:3)
使用IntervalIndex
(DF
是您的DataFrame
)
idx=pd.IntervalIndex.from_arrays(pd.date_range(start='2005-07-01',freq='12MS',periods=12),pd.date_range(start='2006-06-30',freq='12M',periods=12),closed='both')
df=pd.DataFrame({'G':list(range(len(idx)))},index=idx)
DF.date=pd.to_datetime(DF.date)
DF['G']=df.loc[DF.date].values
DF.sort_values(['name','date']).drop_duplicates(['name','G'],keep='last')
Out[19]:
name date G
5 a 2006-06-25 0
12 a 2007-03-15 1
14 a 2008-06-21 2
21 b 2006-06-25 0
28 b 2007-03-15 1
30 b 2008-06-21 2