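I am trying to build a panel data frame that combines periodic data with "continuous" daily data. The two should be assigned to each other so that every row of the new data frame holds the period, the periodic value, and the date and value of one day within that period. The data look similar to this: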
> dailycds
Date CDS
1 30-06-2015 194
2 01-07-2015 195
3 02-07-2015 198
4 03-07-2015 198
5 04-07-2015 199
6 30-06-2016 165
7 01-07-2016 172
8 02-07-2016 213
9 03-07-2016 123
10 04-07-2016 321
> periodicassets
Period Assets
1 201506 1314
2 201606 2134
In the end, I would like it to look like this:
> df
Period Date Assets CDS
1 201506 30-06-2015 1314 194
2 201506 01-07-2015 1314 195
3 201506 02-07-2015 1314 198
4 201506 03-07-2015 1314 198
5 201606 30-06-2016 2134 165
6 201606 01-07-2016 2134 172
7 201606 02-07-2016 2134 213
8 201606 03-07-2016 2134 123
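Basically, the idea is to take certain row ranges from the daily data and assign (and merge) them to the periodic data. Unfortunately, I cannot do this simply by extracting the mm-yyyy part of the date, because period 201506 also contains the July data up to the 3rd, while the 4th of July is not associated with any period and should be dropped, since each period should contain only a fixed number of days (4 in this case).
Here is the code for the example data above: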
dailycds = data.frame(Date = c("30-06-2015", "01-07-2015", "02-07-2015","03-07-2015","04-07-2015","30-06-2016", "01-07-2016", "02-07-2016","03-07-2016","04-07-2016"),
CDS = c(194, 195, 198,198,199,165,172,213,123,321))
dailycds
periodicassets = data.frame(Period = c("201506", "201606"),
Assets = c("1314","2134"))
periodicassets
df = data.frame(Period = c("201506", "201506", "201506", "201506", "201606", "201606", "201606", "201606"),
Date = c("30-06-2015", "01-07-2015", "02-07-2015","03-07-2015", "30-06-2016", "01-07-2016", "02-07-2016", "03-07-2016"),
Assets = c("1314", "1314", "1314", "1314", "2134", "2134", "2134", "2134"),
CDS = c(194, 195, 198, 198, 165, 172, 213, 123))
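As suggested in the given solution, my previous example was very specific and perhaps oversimplified. So, to get closer to my actual problem, here is some additional background: ultimately, the periodic data are month-end holdings of bank assets, and I want to assign the daily CDS data within a window of (for example) 3 days before and 6 days after each month end. Naturally, the panel contains multiple banks, and each bank must have the (same) CDS data assigned to its holdings. (For example, if I have 3 banks and I need 3 days before and 6 days after month end, I have (3 + 1 + 6) * 2 days.) As pointed out in the comments, I always mean business/working days in my question, since my time series does not contain any holidays etc.
So, to do the problem justice, here is a raw snippet covering just one period: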
> periodicassets
BankName Period value
2 BPCE 201412 112189.50
4 Credit Agricole 201412 81618.76
Date CDS
<dttm> <chr>
1 2015-01-12 46.869
2 2015-01-09 48.121000000000002
3 2015-01-08 48.625999999999998
4 2015-01-07 48.801000000000002
5 2015-01-06 48.633000000000003
6 2015-01-05 46.670999999999999
7 2015-01-02 45.158000000000001
8 2015-01-01 47.32
9 2014-12-31 47.658000000000001
10 2014-12-30 45.843000000000004
11 2014-12-29 47.588999999999999
12 2014-12-26 47.625999999999998
13 2014-12-25 47.697000000000003
14 2014-12-24 47.414999999999999
15 2014-12-23 48.075000000000003
16 2014-12-22 48.085999999999999
17 2014-12-19 47.496000000000002
18 2014-12-18 46.534999999999997
19 2014-12-17 48.149000000000001
Available here: periodic assets, dailycds
While browsing the forum I found similar questions, e.g. "create an index for aggregating daily data to match periodic data" and a second one. However, while the first tries to aggregate the data, the second already has the format I want (in the object xtime).
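Answer 0 (score: 2)
The crux of this question is how Period maps to Date. From the OP's explanation, I understand that each period consists of the last day of the actual month plus the first three days of the following month, 4 days in total.
This can be solved with some date arithmetic and a right join: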
library(data.table)
result <-
# coerce to data.table
setDT(dailycds)[
# compute period by subtracting 3 days from the date
, Period := format(as.IDate(Date, "%d-%m-%Y") - 3L, "%Y%m")][
# right join, dropping all rows from dailycds without matching period
periodicassets, on = "Period"][
# change column order to be in line with expected result df
, setcolorder(.SD, names(df))]
result
   Period       Date Assets CDS
1: 201506 30-06-2015   1314 194
2: 201506 01-07-2015   1314 195
3: 201506 02-07-2015   1314 198
4: 201506 03-07-2015   1314 198
5: 201606 30-06-2016   2134 165
6: 201606 01-07-2016   2134 172
7: 201606 02-07-2016   2134 213
8: 201606 03-07-2016   2134 123
As requested, there are only 4 rows per period, and the result is consistent with the expected result df:
all.equal(df, as.data.frame(result[, lapply(.SD, forcats::fct_drop)]))
[1] TRUE
The unused factor levels have to be dropped for the comparison to pass all.equal().
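Why unused levels matter can be seen in a minimal base R sketch. The factors below are made up for illustration, and `droplevels()` is used here as the base R counterpart of `forcats::fct_drop()`:

```r
# Made-up factor: subsetting keeps the now unused level "c"
f_subset <- factor(c("a", "b", "c"))[1:2]
f_fresh  <- factor(c("a", "b"))

levels(f_subset)                                  # still lists "c"
isTRUE(all.equal(f_fresh, f_subset))              # FALSE: the level sets differ
isTRUE(all.equal(f_fresh, droplevels(f_subset)))  # TRUE once unused levels are dropped
```

The same situation arises after a join that drops rows: the factor columns keep the levels of the rows that were removed.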
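Caveat: The code has been tested and works with the sample data provided. With continuous daily data and periodic data, code may need to be added to remove the dates that do not belong to the 4-day period.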
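A minimal sketch of such a filter, in base R and for calendar days only. The mid-month date and the variable names are made up for illustration; the window (month end plus the first 3 days of the next month) follows the 4-day rule from the question:

```r
# Toy dates in the question's dd-mm-yyyy format; 15-06-2015 is a made-up
# mid-month date that should not end up in any period
dates <- as.Date(c("15-06-2015", "30-06-2015", "01-07-2015",
                   "03-07-2015", "04-07-2015"), format = "%d-%m-%Y")

days_after <- 3L  # days of the following month that still belong to a period

# last calendar day of each date's month:
# first of month + 31 days always lands in the next month
first_of_month <- as.Date(format(dates, "%Y-%m-01"))
last_of_month  <- as.Date(format(first_of_month + 31, "%Y-%m-01")) - 1

# keep month-end dates and the first days_after days of a month
keep <- dates == last_of_month | as.integer(format(dates, "%d")) <= days_after

# shift dates back so that early-month dates fall into the previous month
period <- format(dates - days_after, "%Y%m")

data.frame(Date = dates, Period = period)[keep, ]
```

Only 30-06, 01-07 and 03-07 survive the filter, and all three are mapped to period 201506.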
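The OP has updated his question and provided more realistic sample data via dropbox. Now, dailycds contains daily data (except weekends). As already mentioned in the caveat above, this requires filtering dailycds for the relevant dates.
The OP has not made clear how the days before and after month end are defined. Here, we assume that 3 days before and 6 days after month end refer to calendar days rather than business days.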
# define day range of interest relative to turn of the month
days_before <- 3L
days_after <- 6L
stopifnot(days_before + days_after < 28)
# read data from dropbox links, note ?dl=1
dailycds <- readRDS(url("https://www.dropbox.com/s/r7v5dq6la0mnn71/dailycds.RDS?dl=1"))
periodicassets <-
readRDS(url("https://www.dropbox.com/s/gdflcngwp8nm552/periodicassets.RDS?dl=1"))
library(data.table)
# coerce to data.table
setDT(dailycds)[
# filter calendar dates
mday(Date) <= days_after | mday(Date) > lubridate::days_in_month(Date) - days_before][
# compute period by shifting dates from next month into actual month
# coercion to IDate is required because Date is of class POSIXct
, Period := format(as.IDate(Date) - days_after, "%Y%m")][
# right join, dropping all rows from dailycds without matching period
setDT(periodicassets), on = "Period"][]
          Date                CDS Period        BankName     value
 1: 2015-01-06 48.633000000000003 201412            BPCE 112189.50
 2: 2015-01-05 46.670999999999999 201412            BPCE 112189.50
 3: 2015-01-02 45.158000000000001 201412            BPCE 112189.50
 4: 2015-01-01              47.32 201412            BPCE 112189.50
 5: 2014-12-31 47.658000000000001 201412            BPCE 112189.50
 6: 2014-12-30 45.843000000000004 201412            BPCE 112189.50
 7: 2014-12-29 47.588999999999999 201412            BPCE 112189.50
 8: 2015-02-06 47.265000000000001 201501            BPCE 103142.06
 9: 2015-02-05 47.073999999999998 201501            BPCE 103142.06
10: 2015-02-04 46.634999999999998 201501            BPCE 103142.06
11: 2015-02-03 46.405000000000001 201501            BPCE 103142.06
12: 2015-02-02             47.567 201501            BPCE 103142.06
13: 2015-01-30 47.396000000000001 201501            BPCE 103142.06
14: 2015-01-29 48.448999999999998 201501            BPCE 103142.06
15: 2015-01-06 48.633000000000003 201412 Credit Agricole  81618.76
16: 2015-01-05 46.670999999999999 201412 Credit Agricole  81618.76
...
26: 2015-02-02             47.567 201501 Credit Agricole  73987.36
27: 2015-01-30 47.396000000000001 201501 Credit Agricole  73987.36
28: 2015-01-29 48.448999999999998 201501 Credit Agricole  73987.36
          Date                CDS Period        BankName     value
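The OP has clarified that he is using business days rather than calendar days. This seemingly minor change in specification has a serious impact on how the dates are selected.
Now, for each month we always pick the first 6 entries as well as the last 3 entries before the month's last trading day (ultimo) plus ultimo itself, which gives 3 + 1 + 6 = 10 business days to pick.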
# define range of business days relative to the last trading day (ultimo)
days_before <- 3L
days_after <- 6L
stopifnot(days_before + days_after < 28)
library(data.table)
# read data from dropbox links, note ?dl=1
dailycds <- readRDS(url("https://www.dropbox.com/s/r7v5dq6la0mnn71/dailycds.RDS?dl=1"))
periodicassets <- readRDS(url("https://www.dropbox.com/s/gdflcngwp8nm552/periodicassets.RDS?dl=1"))
# coerce to data.table
setDT(dailycds)[
# filter business dates:
# for each month pick the first days_after business days into the month
# and the last days_before biz days before and including ultimo
dailycds[, c(head(.I, days_after), tail(.I, days_before + 1L)),
by = .(year(Date), month(Date))]$V1][
# compute period by shifting dates from next month into actual month
# coercion to IDate is required because Date is of class POSIXct
, Period := format(as.IDate(Date) - days_after, "%Y%m")][
# right join, dropping all rows from dailycds without matching period
setDT(periodicassets), on = "Period"][]
          Date                CDS Period        BankName     value
 1: 2015-01-06 48.633000000000003 201412            BPCE 112189.50
 2: 2015-01-05 46.670999999999999 201412            BPCE 112189.50
 3: 2015-01-02 45.158000000000001 201412            BPCE 112189.50
 4: 2015-01-01              47.32 201412            BPCE 112189.50
 5: 2014-12-31 47.658000000000001 201412            BPCE 112189.50
 6: 2014-12-30 45.843000000000004 201412            BPCE 112189.50
 7: 2014-12-29 47.588999999999999 201412            BPCE 112189.50
 8: 2014-12-26 47.625999999999998 201412            BPCE 112189.50
 9: 2014-12-25 47.697000000000003 201412            BPCE 112189.50
10: 2014-12-24 47.414999999999999 201412            BPCE 112189.50
11: 2015-02-05 47.073999999999998 201501            BPCE 103142.06
12: 2015-02-04 46.634999999999998 201501            BPCE 103142.06
13: 2015-02-03 46.405000000000001 201501            BPCE 103142.06
14: 2015-02-02             47.567 201501            BPCE 103142.06
15: 2015-01-30 47.396000000000001 201501            BPCE 103142.06
16: 2015-01-29 48.448999999999998 201501            BPCE 103142.06
17: 2015-01-28             49.442 201501            BPCE 103142.06
18: 2015-01-27 49.502000000000002 201501            BPCE 103142.06
19: 2015-01-26              49.73 201501            BPCE 103142.06
20: 2015-01-23 50.917000000000002 201501            BPCE 103142.06
21: 2015-01-06 48.633000000000003 201412 Credit Agricole  81618.76
22: 2015-01-05 46.670999999999999 201412 Credit Agricole  81618.76
...
39: 2015-01-26              49.73 201501 Credit Agricole  73987.36
40: 2015-01-23 50.917000000000002 201501 Credit Agricole  73987.36
          Date                CDS Period        BankName     value
Note that the resulting data set contains (3 + 1 + 6) * 2 months * 2 banks = 40 rows.
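Note that head() and tail() work on the row order within each month, so the selection depends on how dailycds is sorted, and the sample data is stored newest first. The following is only a sketch of one possible order-independent variant, not the method used above: it sorts by date first and derives Period from the part of the window a row belongs to, instead of subtracting calendar days. The weekday series, the bank names A and B, and all helper column names (ym, pos, n) are made up for illustration.

```r
library(data.table)

days_before <- 3L
days_after  <- 6L

# Made-up business-day series (weekdays only), deliberately stored newest first
all_days <- seq(as.Date("2014-12-01"), as.Date("2015-02-27"), by = "day")
biz_days <- all_days[!as.POSIXlt(all_days)$wday %in% c(0L, 6L)]
daily <- data.table(Date = rev(biz_days), CDS = seq_along(biz_days))

setorder(daily, Date)                  # oldest first, so position 1 = first business day
daily[, ym := format(Date, "%Y%m")]
daily[, pos := seq_len(.N), by = ym]   # business-day index within the month
daily[, n := .N, by = ym]              # number of business days in the month

# the last days_before + 1 rows of a month belong to that month's period
end_rows <- daily[pos > n - (days_before + 1L), .(Date, CDS, Period = ym)]

# the first days_after rows of a month belong to the previous month's period
start_rows <- daily[pos <= days_after]
start_rows[, Period := format(as.Date(paste0(ym, "01"), format = "%Y%m%d") - 1L, "%Y%m")]

win_rows <- rbind(end_rows, start_rows[, .(Date, CDS, Period)])[order(Date)]

# made-up periodic data for two hypothetical banks
periodic <- data.table(BankName = c("A", "B"), Period = "201412", value = c(1, 2))

# right join: one copy of the 10-day window per bank
result <- win_rows[periodic, on = "Period"]
result
```

With this toy series, period 201412 covers the business days from 2014-12-26 to 2015-01-08, i.e. 3 days before ultimo, ultimo itself, and 6 days after.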
In case the dropbox links break:
dailycds <-
structure(list(Date = structure(c(1424649600, 1424390400, 1424304000,
1424217600, 1424131200, 1424044800, 1423785600, 1423699200, 1423612800,
1423526400, 1423440000, 1423180800, 1423094400, 1423008000, 1422921600,
1422835200, 1422576000, 1422489600, 1422403200, 1422316800, 1422230400,
1421971200, 1421884800, 1421798400, 1421712000, 1421625600, 1421366400,
1421280000, 1421193600, 1421107200, 1421020800, 1420761600, 1420675200,
1420588800, 1420502400, 1420416000, 1420156800, 1420070400, 1419984000,
1419897600, 1419811200, 1419552000, 1419465600, 1419379200, 1419292800,
1419206400, 1418947200, 1418860800, 1418774400, 1418688000, 1418601600,
1418342400, 1418256000, 1418169600, 1418083200, 1417996800, 1417737600,
1417651200, 1417564800, 1417478400, 1417392000, 1417132800, 1417046400,
1416960000, 1416873600, 1416787200, 1416528000, 1416441600, 1416355200,
1416268800, 1416182400, 1415923200, 1415836800, 1415750400, 1415664000,
1415577600, 1415318400, 1415232000, 1415145600, 1415059200, 1414972800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), CDS = c("44.259",
"44.555999999999997", "45.076999999999998", "44.951000000000001",
"45.762", "45.573", "45.634999999999998", "45.956000000000003",
"47.064", "47.51", "48.576999999999998", "47.265000000000001",
"47.073999999999998", "46.634999999999998", "46.405000000000001",
"47.567", "47.396000000000001", "48.448999999999998", "49.442",
"49.502000000000002", "49.73", "50.917000000000002", "51.37",
"52.536999999999999", "49.188000000000002", "47.893999999999998",
"46.728000000000002", "46.634999999999998", "46.366999999999997",
"47.012999999999998", "46.869", "48.121000000000002", "48.625999999999998",
"48.801000000000002", "48.633000000000003", "46.670999999999999",
"45.158000000000001", "47.32", "47.658000000000001", "45.843000000000004",
"47.588999999999999", "47.625999999999998", "47.697000000000003",
"47.414999999999999", "48.075000000000003", "48.085999999999999",
"47.496000000000002", "46.534999999999997", "48.149000000000001",
"49.421999999999997", "48.223999999999997", "47.100999999999999",
"47.484999999999999", "47.491999999999997", "47.052", "46.697000000000003",
"44.670999999999999", "47.706000000000003", "46.835000000000001",
"48.66", "46.841999999999999", "48.069000000000003", "49.49",
"50.155000000000001", "50.155000000000001", "50.49", "52.024000000000001",
"50.33", "50", "50.67", "53.15", "52.994999999999997", "55.31",
"50.82", "50.49", "50.832999999999998", "52.241", "51.97", "52.8",
"50.667000000000002", "51.134999999999998")), .Names = c("Date",
"CDS"), row.names = c(NA, -81L), class = c("tbl_df", "tbl", "data.frame"))
periodicassets <-
structure(list(BankName = c(" BPCE", " BPCE", " Credit Agricole",
" Credit Agricole"), Period = c("201412", "201501", "201412",
"201501"), value = c(112189.50293406, 103142.064337463, 81618.762099507,
73987.36251389)), .Names = c("BankName", "Period", "value"), row.names = c(10L,
11L, 18L, 19L), class = "data.frame")
Answer 1 (score: -1)
See if this works for you
import numpy as np


def get_patient_position(dcm, origins, pixel_spacing, orientation):
    """
    Image Space --> Anatomical (Patient) Space is an affine transformation
    using the Image Orientation (Patient), Image Position (Patient), and
    Pixel Spacing properties from the DICOM header
    """
    print("getting patient coordinates")
    world_coordinates = np.empty((dcm.shape[0], dcm.shape[1], dcm.shape[2], 3))
    affine_matrix = np.zeros((4, 4), dtype=np.float32)
    rows = dcm.shape[0]
    cols = dcm.shape[1]
    num_slices = dcm.shape[2]
    image_orientation_x = np.array([orientation[0], orientation[1], orientation[2]]).reshape(3, 1)
    image_orientation_y = np.array([orientation[3], orientation[4], orientation[5]]).reshape(3, 1)
    pixel_spacing_x = pixel_spacing[0]
    # Construct affine matrix
    # Method from:
    # http://nipy.org/nibabel/dicom/dicom_orientation.html
    T_1 = origins[0]
    T_n = origins[num_slices - 1]
    affine_matrix[0, 0] = image_orientation_y[0] * pixel_spacing[0]
    affine_matrix[0, 1] = image_orientation_x[0] * pixel_spacing[1]
    affine_matrix[0, 3] = T_1[0]
    affine_matrix[1, 0] = image_orientation_y[1] * pixel_spacing[0]
    affine_matrix[1, 1] = image_orientation_x[1] * pixel_spacing[1]
    affine_matrix[1, 3] = T_1[1]
    affine_matrix[2, 0] = image_orientation_y[2] * pixel_spacing[0]
    affine_matrix[2, 1] = image_orientation_x[2] * pixel_spacing[1]
    affine_matrix[2, 3] = T_1[2]
    affine_matrix[3, 3] = 1
    # Per-slice step between the first and the last slice origin
    k1 = (T_1[0] - T_n[0]) / (1 - num_slices)
    k2 = (T_1[1] - T_n[1]) / (1 - num_slices)
    k3 = (T_1[2] - T_n[2]) / (1 - num_slices)
    affine_matrix[:3, 2] = np.array([k1, k2, k3])
    for z in range(num_slices):
        for r in range(rows):
            for c in range(cols):
                # use the slice index z (the original passed 0 here, which
                # would give every slice the coordinates of the first one)
                vector = np.array([r, c, z, 1]).reshape((4, 1))
                result = np.matmul(affine_matrix, vector)
                result = np.delete(result, 3, axis=0)
                result = np.transpose(result)
                world_coordinates[r, c, z] = result
        # print("Finished slice ", str(z))
    # np.save('./data/saved/world_coordinates_3d.npy', str(world_coordinates))
    return world_coordinates
Joining, by = "Date"
def create_lh_histogram(patient_positions, dcm, magnitude, azimuthal, elevation):
    print("constructing LH histogram")
    # Get 2nd derivative
    second_derivative = gaussian_filter(magnitude, sigma=1, order=1)
    # Determine if voxels lie on boundary or not (thresholding)
    # Still have to code out: let's say the thresholded voxels are in
    # a numpy array called voxels
    # Iterate through all thresholded voxels and integrate gradient field in
    # both directions using 2nd-order Runge-Kutta
    vox_it = np.nditer(voxels, flags=['multi_index'])
    while not vox_it.finished:
        # ???