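I recently started learning Python.
I am trying to convert some R code to Python so that I can check whether the results are correct.
Here is the dataset in R: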
> dt
date Factors val_sum
1: 2015-09-01 I 66084
2: 2017-11-01 I 17096
3: 2015-10-01 G 7988
4: 2013-03-01 I 3726
5: 2013-12-01 I 139
6: 2013-03-01 I 45
7: 2018-02-01 I 19674
8: 2015-12-01 I 45654
9: 2014-12-01 I 207240
10: 2015-07-01 G 7642
11: 2013-03-01 I 29054
12: 2015-12-01 I 24030
13: 2017-06-01 I 234142
14: 2018-11-01 I 2633
15: 2018-11-01 I 152254
> str(dt)
Classes ‘data.table’ and 'data.frame': 15 obs. of 3 variables:
$ date : Date, format: "2015-09-01" "2017-11-01" "2015-10-01" "2013-03-01" ...
$ Factors: chr "I" "I" "G" "I" ...
$ val_sum: int 66084 17096 7988 3726 139 45 19674 45654 207240 7642 ...
- attr(*, ".internal.selfref")=<externalptr>
This is the R code I am using:
library(data.table)
library(zoo)  # provides as.yearqtr() and scale_x_yearqtr()
dt <- dt[order(Factors, date)]
dt[, yq := as.yearqtr(dt$date, format = "%Y-%m-%d")]
dt <- dt[, lapply(.SD, as.numeric), .SDcols = 3, by = .(yq, Factors)][, lapply(.SD, function(x) sum = sum(x, na.rm = T)), .SDcols = 3, by = .(yq, Factors)][]
dt <- dt[, paste0(names(dt)[3], ".R") := lapply(.SD, function(x) R = x / x[1]), .SDcols = 3, by = .(Factors)][order(Factors, yq)][]
dt[, paste0(names(dt)[3], ".YQGR") := lapply(.SD, function(x) x / shift(x) - 1), .SDcols = 3, by = .(quarter(yq), Factors)]
I first manipulate the data as shown above, and then use ggplot2 to draw and customize the plot.
ggplot(data = dt, aes(x = yq, colour = Factors, fill = Factors,
label = scales::percent(val_sum.YQGR, accuracy = 0.1))) +
geom_col(aes(y = 0.1 * val_sum.YQGR), position = position_dodge2(width = 0)) +
geom_line(aes(y = val_sum.R)) +
scale_colour_manual(values = c("red", "darkblue"), labels = c("G", "I")) +
scale_fill_manual(values = c("red", "darkblue"), labels = NULL, breaks = NULL) +
scale_x_yearqtr(format = "%YQ%q", breaks = unique(dt$yq)) +
scale_y_continuous(name = "R",
sec.axis = sec_axis(~./0.1, name = "YQGR",
labels = scales::percent)) +
theme_bw() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.title.y.right = element_blank(),
axis.ticks.x=element_blank(),
axis.ticks.y=element_blank(),
axis.text.x=element_text(angle = 45, size = 12, vjust = 0.5, face = "bold"),
axis.text.y=element_blank(),
axis.line = element_line(colour = "white"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
plot.background=element_blank(),
legend.position="left",
legend.title=element_blank(),
legend.text = element_text(size = 16, face = "bold"),
legend.key = element_blank(),
legend.box.background = element_blank()) +
guides(colour = guide_legend(override.aes = list(shape = 15, size = 10))) +
geom_text(data = dt, aes(y = 0.1 * val_sum.YQGR, colour = Factors),
position = position_dodge(width = 0.25),
vjust = -0.3, size = 4)
How can I efficiently convert this R code to Python 3?
Data:
dt <- data.table::fread(" date Factors val_sum
2015-09-01 I 66084
2017-11-01 I 17096
2015-10-01 G 7988
2013-03-01 I 3726
2013-12-01 I 139
2013-03-01 I 45
2018-02-01 I 19674
2015-12-01 I 45654
2014-12-01 I 207240
2015-07-01 G 7642
2013-03-01 I 29054
2015-12-01 I 24030
2017-06-01 I 234142
2018-11-01 I 2633
2018-11-01 I 152254", header = T)
Thanks.
Result:
yq Factors val_sum val_sum.R val_sum.YQGR
1: 2015 Q3 G 7642 1.000000000 NA
2: 2015 Q4 G 7988 1.045276106 NA
3: 2013 Q1 I 32825 1.000000000 NA
4: 2013 Q4 I 139 0.004234577 NA
5: 2014 Q4 I 207240 6.313480579 1489.9352518
6: 2015 Q3 I 66084 2.013221630 NA
7: 2015 Q4 I 69684 2.122894136 -0.6637522
8: 2017 Q2 I 234142 7.133038842 NA
9: 2017 Q4 I 17096 0.520822544 -0.7546639
10: 2018 Q1 I 19674 0.599360244 -0.4006398
11: 2018 Q4 I 154887 4.718568165 8.0598386
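Because I made up this data quickly, the plot looks odd: a few values are extremely large, and the x-axis has gaps where years and quarters are missing.
The logic is the same, though.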
Answer 0 (score: 1)
Here is a Python translation of the data-manipulation part of your R code:
import pandas as pd
from io import StringIO

dfstr = '''date Factors val_sum
2015-09-01 I 66084
2017-11-01 I 17096
2015-10-01 G 7988
2013-03-01 I 3726
2013-12-01 I 139
2013-03-01 I 45
2018-02-01 I 19674
2015-12-01 I 45654
2014-12-01 I 207240
2015-07-01 G 7642
2013-03-01 I 29054
2015-12-01 I 24030
2017-06-01 I 234142
2018-11-01 I 2633
2018-11-01 I 152254'''
# read the whitespace-separated text into a DataFrame
df = pd.read_csv(StringIO(dfstr), sep=r'\s+')
df.date = pd.to_datetime(df.date)
df['year'] = df.date.dt.year
# calendar quarter, 1 to 4 (same as (month - 1) // 3 + 1)
df['qtr'] = df.date.dt.quarter
# sum val_sum per factor, year and quarter
df = df.groupby(['Factors', 'year', 'qtr']).val_sum.sum().reset_index()
# ratio to the first quarter of each factor
df['val_sum.R'] = df.val_sum / df.groupby('Factors').val_sum.transform('first')
# growth versus the previous row with the same quarter and factor
df['val_sum.YQGR'] = df.val_sum / df.groupby(['qtr', 'Factors']).val_sum.shift() - 1
Output:
+----+-----------+--------+-------+-----------+-------------+----------------+
|    | Factors   |   year |   qtr |   val_sum |   val_sum.R |   val_sum.YQGR |
|----+-----------+--------+-------+-----------+-------------+----------------|
|  0 | G         |   2015 |     3 |      7642 |           1 |            nan |
|  1 | G         |   2015 |     4 |      7988 |     1.04528 |            nan |
|  2 | I         |   2013 |     1 |     32825 |           1 |            nan |
|  3 | I         |   2013 |     4 |       139 |  0.00423458 |            nan |
|  4 | I         |   2014 |     4 |    207240 |     6.31348 |        1489.94 |
|  5 | I         |   2015 |     3 |     66084 |     2.01322 |            nan |
|  6 | I         |   2015 |     4 |     69684 |     2.12289 |      -0.663752 |
|  7 | I         |   2017 |     2 |    234142 |     7.13304 |            nan |
|  8 | I         |   2017 |     4 |     17096 |    0.520823 |      -0.754664 |
|  9 | I         |   2018 |     1 |     19674 |     0.59936 |       -0.40064 |
| 10 | I         |   2018 |     4 |    154887 |     4.71857 |        8.05984 |
+----+-----------+--------+-------+-----------+-------------+----------------+
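If you would rather keep a single year-quarter column, as `yq` does in the R code, a quarterly `Period` column may be closer in spirit to `as.yearqtr`. The following is only a minimal sketch under that assumption, not the one correct translation. The names `raw`, `out`, `qtr` and `prev` are made up for illustration.

```python
from io import StringIO

import pandas as pd

raw = """date Factors val_sum
2015-09-01 I 66084
2017-11-01 I 17096
2015-10-01 G 7988
2013-03-01 I 3726
2013-12-01 I 139
2013-03-01 I 45
2018-02-01 I 19674
2015-12-01 I 45654
2014-12-01 I 207240
2015-07-01 G 7642
2013-03-01 I 29054
2015-12-01 I 24030
2017-06-01 I 234142
2018-11-01 I 2633
2018-11-01 I 152254
"""

df = pd.read_csv(StringIO(raw), sep=r"\s+", parse_dates=["date"])

# Quarterly period such as 2015Q3, playing the role of yq in the R code.
df["yq"] = df["date"].dt.to_period("Q")

# Sum per (Factors, yq); groupby sorts its keys, like order(Factors, yq) in R.
out = df.groupby(["Factors", "yq"], as_index=False)["val_sum"].sum()

# Ratio to the first quarter observed for each factor.
first = out.groupby("Factors")["val_sum"].transform("first")
out["val_sum.R"] = out["val_sum"] / first

# Growth versus the previous row that shares the same calendar quarter and factor.
qtr = out["yq"].dt.quarter.rename("qtr")
prev = out.groupby([qtr, "Factors"])["val_sum"].shift()
out["val_sum.YQGR"] = out["val_sum"] / prev - 1

print(out)
```

Since the goal is to check the Python result against R, you can compare `out` row by row with the R result in the question. Note that the growth column compares each row with the previous available row of the same quarter, which is not necessarily the year before when quarters are missing, exactly as `shift()` does in the R version.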