Question

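This is very similar to the question here, but I'm wondering whether pandas has a clean way to make a TimedeltaIndex business-day aware. Ultimately, I am trying to get the number of business days (no holiday calendar) between a DatetimeIndex and a Timestamp. As per the referenced question, something like this works: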

import pandas as pd
import numpy as np
drg = pd.date_range('2015-07-31', '2015-08-05', freq='B')
A = [d.date() for d in drg]
B = pd.Timestamp('2015-08-05', 'B').date()
np.busday_count(A, B)

which gives

array([3, 2, 1, 0], dtype=int64)

but this seems a little kludgy. If I try something like

drg - pd.Timestamp('2015-08-05', 'B')

I get a TimedeltaIndex, but the business-day frequency is dropped:

TimedeltaIndex(['-5 days', '-2 days', '-1 days', '0 days'], dtype='timedelta64[ns]', freq=None)

Just wondering if there is a more elegant way to go about this.

Answer 1

TimedeltaIndexes represent fixed spans of time. They can be added to Pandas Timestamps to increment them by fixed amounts. Their behavior never depends on whether or not the Timestamp is a business day. The TimedeltaIndex itself is never business-day aware.
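To make the distinction concrete, here is a small sketch contrasting a fixed-span Timedelta with pandas' business-day offset `pd.offsets.BDay`. The offset is not part of the original answer; it is used here only as the calendar-aware counterpart for comparison.

```python
import pandas as pd

friday = pd.Timestamp('2015-07-31')  # 2015-07-31 was a Friday

# A Timedelta is a fixed span: one day after Friday is Saturday.
plus_one_day = friday + pd.Timedelta(days=1)

# A business-day offset is calendar-aware: one business day after Friday is Monday.
plus_one_bday = friday + pd.offsets.BDay(1)

print(plus_one_day.date(), plus_one_bday.date())  # 2015-08-01 2015-08-03
```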

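Since the ultimate goal is to count the number of business days between a DatetimeIndex and a Timestamp, I would look in a different direction than converting to a TimedeltaIndex.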

Unfortunately, date calculations are rather complicated, and a number of data structures have sprung up to deal with them: Python datetime.dates and datetime.datetimes, Pandas Timestamps, and NumPy datetime64s.

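They each have their own strengths, but none of them is ideal for every purpose. To take advantage of their strengths, it is sometimes necessary to convert between these types.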
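As a rough illustration of moving between these types, the sketch below converts one Timestamp into each of the other representations and back. The variable names are illustrative only; the resolution of the intermediate datetime64 may differ between pandas versions, which is why it is truncated explicitly to days.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp('2015-08-05')              # pandas Timestamp

py_dt = ts.to_pydatetime()                   # Python datetime.datetime
py_date = ts.date()                          # Python datetime.date
np_dt64 = ts.to_datetime64()                 # NumPy datetime64 (unit depends on the pandas version)
np_day = np_dt64.astype('datetime64[D]')     # NumPy datetime64 truncated to day resolution

# Converting back to a Timestamp recovers the original midnight value.
back = pd.Timestamp(np_day)

print(type(py_dt).__name__, type(py_date).__name__, np_day.dtype, back)
```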

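To use np.busday_count, you need to convert the DatetimeIndex and the Timestamp to some type np.busday_count understands. What you call kludginess is the code required to convert types. There is no way around that, assuming we want to use np.busday_count, and I know of no better tool for this job than np.busday_count.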
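For reference, here is a minimal sketch of how np.busday_count behaves on the question's dates: the range is half-open (the end date is excluded), and the optional `holidays` argument removes additional dates. The holiday used below is hypothetical and chosen only to illustrate the parameter.

```python
import numpy as np

# Counts valid days in the half-open range [begin, end); the default
# weekmask '1111100' treats Monday through Friday as valid.
plain = np.busday_count('2015-07-31', '2015-08-05')  # Fri, Mon, Tue -> 3

# Hypothetical holiday on Monday 2015-08-03, for illustration only.
with_holiday = np.busday_count('2015-07-31', '2015-08-05',
                               holidays=['2015-08-03'])  # Fri, Tue -> 2

print(plain, with_holiday)
```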

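So, although I don't think there is a more succinct way to count business days than the method you propose, there is a far more performant way: convert to datetime64[D] values instead of Python datetime.date objects: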

A = drg.values.astype('<M8[D]')
B = timestamp.asm8.astype('<M8[D]')
np.busday_count(A, B)
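Applied to the small range from the question, the same conversion reproduces the asker's result. This is a self-contained sketch; it builds the Timestamp without the 'B' argument, on the assumption that the frequency plays no role in the count (recent pandas releases no longer accept a freq argument to the Timestamp constructor).

```python
import numpy as np
import pandas as pd

# Business days from the question: Fri 2015-07-31 through Wed 2015-08-05.
drg = pd.date_range('2015-07-31', '2015-08-05', freq='B')
timestamp = pd.Timestamp('2015-08-05')

# Convert both sides to day-resolution datetime64 in one vectorized step,
# instead of building Python datetime.date objects one at a time.
A = drg.values.astype('<M8[D]')
B = timestamp.asm8.astype('<M8[D]')

counts = np.busday_count(A, B)
print(counts)  # [3 2 1 0]
```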

For example, with the larger DatetimeIndex below (len(drg) is close to 4000), this is more than 100x faster:

import pandas as pd
import numpy as np
drg = pd.date_range('2000-07-31', '2015-08-05', freq='B')
timestamp = pd.Timestamp('2015-08-05')

def using_astype(drg, timestamp):
    A = drg.values.astype('<M8[D]')
    B = timestamp.asm8.astype('<M8[D]')
    return np.busday_count(A, B)

def using_datetimes(drg, timestamp):
    A = [d.date() for d in drg]
    B = timestamp.date()
    return np.busday_count(A, B)

In [88]: %timeit using_astype(drg, timestamp)
10000 loops, best of 3: 95.4 µs per loop

In [89]: %timeit using_datetimes(drg, timestamp)
100 loops, best of 3: 10.3 ms per loop

np.busday_count converts its input to datetime64[D] anyway, so avoiding the extra round trip through datetime.date objects is far more efficient.
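A quick sanity check, assuming the same setup as the benchmark above: both approaches should produce identical counts, so the speedup does not come at the cost of correctness. As a further check, the first entry should equal len(drg) - 1 and the last entry should be 0, because every element of drg is itself a business day.

```python
import numpy as np
import pandas as pd

drg = pd.date_range('2000-07-31', '2015-08-05', freq='B')
timestamp = pd.Timestamp('2015-08-05')

# Fast path: vectorized conversion to day-resolution datetime64.
fast = np.busday_count(drg.values.astype('<M8[D]'),
                       timestamp.asm8.astype('<M8[D]'))

# Slow path: one Python datetime.date object per element.
slow = np.busday_count([d.date() for d in drg], timestamp.date())

# The two approaches should agree element by element.
print(len(drg), np.array_equal(fast, slow))
```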
