Converting R code for a visualization to Python

Time: 2019-05-29 13:19:51

Tags: python r python-3.x matplotlib data-manipulation

I have recently been trying to learn Python.
I am trying to convert some R code to Python so that I can check whether my results are correct.

Here is the dataset in R:

> dt
          date  Factors val_sum
 1: 2015-09-01        I   66084
 2: 2017-11-01        I   17096
 3: 2015-10-01        G    7988
 4: 2013-03-01        I    3726
 5: 2013-12-01        I     139
 6: 2013-03-01        I      45
 7: 2018-02-01        I   19674
 8: 2015-12-01        I   45654
 9: 2014-12-01        I  207240
10: 2015-07-01        G    7642
11: 2013-03-01        I   29054
12: 2015-12-01        I   24030
13: 2017-06-01        I  234142
14: 2018-11-01        I    2633
15: 2018-11-01        I  152254

> str(dt)
Classes ‘data.table’ and 'data.frame':  15 obs. of  3 variables:
$ date   : Date, format: "2015-09-01" "2017-11-01" "2015-10-01" "2013-03-01" ...
$ Factors: chr  "I" "I" "G" "I" ...
$ val_sum: int  66084 17096 7988 3726 139 45 19674 45654 207240 7642 ...
- attr(*, ".internal.selfref")=<externalptr> 

The R code I use is as follows:

library(data.table)
library(zoo)  # provides as.yearqtr() and scale_x_yearqtr()
dt <- dt[order(Factors, date)]
dt[, yq := as.yearqtr(dt$date, format = "%Y-%m-%d")]
dt <- dt[, lapply(.SD, as.numeric), .SDcols = 3, by = .(yq, Factors)][, lapply(.SD, sum, na.rm = TRUE), .SDcols = 3, by = .(yq, Factors)][]
dt <- dt[, paste0(names(dt)[3], ".R") := lapply(.SD, function(x) x / x[1]), .SDcols = 3, by = .(Factors)][order(Factors, yq)][]
dt[, paste0(names(dt)[3], ".YQGR") := lapply(.SD, function(x) x / shift(x) - 1), .SDcols = 3, by = .(quarter(yq), Factors)]
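For comparison, pandas' closest counterpart to zoo's `yearqtr` is probably a quarterly `Period`. The sketch below is one possible port of the `yq` and aggregation steps, not the only one; the variable names `raw` and `agg` are illustrative. `to_period("Q")` keeps the year and quarter in a single sortable value, and `strftime("%YQ%q")` produces labels in the same style as `scale_x_yearqtr(format = "%YQ%q")`.

```python
import io

import pandas as pd

# Data from the question
raw = """date Factors val_sum
2015-09-01 I 66084
2017-11-01 I 17096
2015-10-01 G 7988
2013-03-01 I 3726
2013-12-01 I 139
2013-03-01 I 45
2018-02-01 I 19674
2015-12-01 I 45654
2014-12-01 I 207240
2015-07-01 G 7642
2013-03-01 I 29054
2015-12-01 I 24030
2017-06-01 I 234142
2018-11-01 I 2633
2018-11-01 I 152254"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["date"])

# A quarterly Period plays the role of zoo::yearqtr: year and quarter in one sortable value
df["yq"] = df["date"].dt.to_period("Q")

# Roughly sum(val_sum) by .(yq, Factors), then order(Factors, yq)
agg = (
    df.groupby(["Factors", "yq"], as_index=False)["val_sum"]
    .sum()
    .sort_values(["Factors", "yq"], ignore_index=True)
)

# Labels in the style of scale_x_yearqtr(format = "%YQ%q"), e.g. "2015Q3"
agg["label"] = agg["yq"].dt.strftime("%YQ%q")
print(agg)
```

Grouping on the Period rather than on separate year and quarter integers avoids hand-written month-to-quarter arithmetic; `agg["yq"].dt.quarter` still gives the quarter number when it is needed for the same-quarter growth step.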

I first manipulate the data as shown above, and then plot and customize the chart with ggplot2.

library(ggplot2)

ggplot(data = dt, aes(x = yq, colour = Factors, fill = Factors,
                       label = scales::percent(val_sum.YQGR, accuracy = 0.1))) + 
geom_col(aes(y = 0.1 * val_sum.YQGR), position = position_dodge2(width = 0)) +
geom_line(aes(y = val_sum.R)) +
scale_colour_manual(values = c("red", "darkblue"), labels = c("G", "I")) +
scale_fill_manual(values = c("red", "darkblue"), labels = NULL, breaks = NULL) +
scale_x_yearqtr(format = "%YQ%q", breaks = unique(dt$yq)) +
scale_y_continuous(name = "R",
                   sec.axis = sec_axis(~./0.1, name = "YQGR",
                                       labels = scales::percent)) +
theme_bw() +
theme(axis.title.x = element_blank(),
      axis.title.y = element_blank(),
      axis.title.y.right = element_blank(),
      axis.ticks.x=element_blank(),
      axis.ticks.y=element_blank(),
      axis.text.x=element_text(angle = 45, size = 12, vjust = 0.5, face = "bold"),
      axis.text.y=element_blank(),
      axis.line = element_line(colour = "white"),
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      panel.border = element_blank(),
      panel.background = element_blank(),
      plot.background=element_blank(),
      legend.position="left",
      legend.title=element_blank(),
      legend.text = element_text(size = 16, face = "bold"),
      legend.key = element_blank(),
      legend.box.background =  element_blank()) +
guides(colour = guide_legend(override.aes = list(shape = 15, size = 10))) +
geom_text(data = dt, aes(y = 0.1 * val_sum.YQGR, colour = Factors), 
          position = position_dodge(width = 0.25),
          vjust = -0.3, size = 4)
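A rough matplotlib counterpart of the main ggplot2 layers above (dodged bars of 0.1 × YQGR, lines of R, percentage labels, colours, legend on the left) might look like the sketch below. It is an approximation under several assumptions: observed quarters are placed at evenly spaced categorical positions rather than on a time-scaled axis, the bar width (0.35), font sizes and the output file name `yqgr_plot.png` are illustrative choices, and because the theme blanks the y-axis text, the sketch simply hides the y ticks instead of recreating `sec_axis`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line to show the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

raw = """date Factors val_sum
2015-09-01 I 66084
2017-11-01 I 17096
2015-10-01 G 7988
2013-03-01 I 3726
2013-12-01 I 139
2013-03-01 I 45
2018-02-01 I 19674
2015-12-01 I 45654
2014-12-01 I 207240
2015-07-01 G 7642
2013-03-01 I 29054
2015-12-01 I 24030
2017-06-01 I 234142
2018-11-01 I 2633
2018-11-01 I 152254"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["date"])
df["yq"] = df["date"].dt.to_period("Q")
df = df.groupby(["Factors", "yq"], as_index=False)["val_sum"].sum()
# val_sum.R: ratio to each factor's first quarter
df["R"] = df["val_sum"] / df.groupby("Factors")["val_sum"].transform("first")
# val_sum.YQGR: change versus the previous row with the same quarter and factor
df["YQGR"] = df["val_sum"] / df.groupby([df["yq"].dt.quarter, "Factors"])["val_sum"].shift() - 1

quarters = sorted(df["yq"].unique())            # one x position per observed quarter
xpos = {q: i for i, q in enumerate(quarters)}
colours = {"G": "red", "I": "darkblue"}          # scale_colour_manual / scale_fill_manual
width = 0.35

fig, ax = plt.subplots(figsize=(10, 5))
for k, (factor, grp) in enumerate(df.groupby("Factors")):
    x = np.array([xpos[q] for q in grp["yq"]])
    xb = x + (k - 0.5) * width                   # dodge the two factors' bars
    bar_y = 0.1 * grp["YQGR"]                    # geom_col(aes(y = 0.1 * val_sum.YQGR))
    ax.bar(xb, bar_y.fillna(0), width=width, color=colours[factor], label=factor)
    ax.plot(x, grp["R"], color=colours[factor])  # geom_line(aes(y = val_sum.R))
    for xi, yi, g in zip(xb, bar_y, grp["YQGR"]):  # geom_text percentage labels
        if pd.notna(g):
            ax.annotate(f"{g:.1%}", (xi, yi), xytext=(0, 3), textcoords="offset points",
                        ha="center", va="bottom", color=colours[factor], fontsize=10)

# Rough counterpart of the theme(): no grid, no border, no y-axis text, no ticks
ax.set_xticks(range(len(quarters)))
ax.set_xticklabels([q.strftime("%YQ%q") for q in quarters],
                   rotation=45, fontsize=12, fontweight="bold")
ax.set_yticks([])
ax.tick_params(length=0)
for spine in ax.spines.values():
    spine.set_visible(False)
ax.legend(loc="center right", bbox_to_anchor=(-0.02, 0.5), frameon=False,
          prop={"size": 16, "weight": "bold"})
fig.savefig("yqgr_plot.png", bbox_inches="tight")  # hypothetical output file name
```

Using categorical positions keeps every observed quarter evenly spaced, which sidesteps the gaps from missing quarters; mapping `x` to a real time scale instead would reproduce ggplot2's spacing more faithfully.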

How can I efficiently convert this R code to Python 3?

Data:

dt <- data.table::fread("          date  Factors val_sum
2015-09-01        I   66084
2017-11-01        I   17096
2015-10-01        G    7988
2013-03-01        I    3726
2013-12-01        I     139
2013-03-01        I      45
2018-02-01        I   19674
2015-12-01        I   45654
2014-12-01        I  207240
2015-07-01        G    7642
2013-03-01        I   29054
2015-12-01        I   24030
2017-06-01        I  234142
2018-11-01        I    2633
2018-11-01        I  152254", header = T)

Thanks.

Result:

         yq Factors val_sum   val_sum.R val_sum.YQGR
 1: 2015 Q3       G    7642 1.000000000           NA
 2: 2015 Q4       G    7988 1.045276106           NA
 3: 2013 Q1       I   32825 1.000000000           NA
 4: 2013 Q4       I     139 0.004234577           NA
 5: 2014 Q4       I  207240 6.313480579 1489.9352518
 6: 2015 Q3       I   66084 2.013221630           NA
 7: 2015 Q4       I   69684 2.122894136   -0.6637522
 8: 2017 Q2       I  234142 7.133038842           NA
 9: 2017 Q4       I   17096 0.520822544   -0.7546639
10: 2018 Q1       I   19674 0.599360244   -0.4006398
11: 2018 Q4       I  154887 4.718568165    8.0598386


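Because I simply made up the data, the plot looks odd: some values are extremely large, and the x-axis looks strange because some years and quarters are missing.
The logic is the same, though.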

1 answer:

Answer 0 (score: 1):

Here is a Python translation of your R code:

import io

import pandas as pd

dfstr = '''         date  Factors val_sum
2015-09-01        I   66084
2017-11-01        I   17096
2015-10-01        G    7988
2013-03-01        I    3726
2013-12-01        I     139
2013-03-01        I      45
2018-02-01        I   19674
2015-12-01        I   45654
2014-12-01        I  207240
2015-07-01        G    7642
2013-03-01        I   29054
2015-12-01        I   24030
2017-06-01        I  234142
2018-11-01        I    2633
2018-11-01        I  152254'''

# pd.compat.StringIO was removed in pandas 1.0; io.StringIO works everywhere
df = pd.read_csv(io.StringIO(dfstr), sep=r'\s+')

df['date'] = pd.to_datetime(df['date'])

df['year'] = df['date'].dt.year
# calendar quarter (Jan-Mar = 1, ..., Oct-Dec = 4), like quarter(yq) in R
df['qtr'] = df['date'].dt.quarter

# sum val_sum per factor, year and quarter (rows come back sorted by these keys)
df = df.groupby(['Factors', 'year', 'qtr'])['val_sum'].sum().reset_index()

# val_sum.R: ratio to the first quarter of each factor
df['val_sum.R'] = df['val_sum'] / df.groupby('Factors')['val_sum'].transform('first')
# val_sum.YQGR: growth versus the previous row with the same quarter and factor
df['val_sum.YQGR'] = df['val_sum'] / df.groupby(['qtr', 'Factors'])['val_sum'].shift() - 1

Output (it matches the R result in the question):

+----+-----------+--------+-------+-----------+-------------+----------------+
|    | Factors   |   year |   qtr |   val_sum |   val_sum.R |   val_sum.YQGR |
|----+-----------+--------+-------+-----------+-------------+----------------|
|  0 | G         |   2015 |     3 |      7642 |  1          |     nan        |
|  1 | G         |   2015 |     4 |      7988 |  1.04528    |     nan        |
|  2 | I         |   2013 |     1 |     32825 |  1          |     nan        |
|  3 | I         |   2013 |     4 |       139 |  0.00423458 |     nan        |
|  4 | I         |   2014 |     4 |    207240 |  6.31348    |    1489.94     |
|  5 | I         |   2015 |     3 |     66084 |  2.01322    |     nan        |
|  6 | I         |   2015 |     4 |     69684 |  2.12289    |      -0.663752 |
|  7 | I         |   2017 |     2 |    234142 |  7.13304    |     nan        |
|  8 | I         |   2017 |     4 |     17096 |  0.520823   |      -0.754664 |
|  9 | I         |   2018 |     1 |     19674 |  0.59936    |      -0.40064  |
| 10 | I         |   2018 |     4 |    154887 |  4.71857    |       8.05984  |
+----+-----------+--------+-------+-----------+-------------+----------------+
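A quick check of why the quarter is best taken from pandas' `.dt.quarter` (or `(month - 1) // 3 + 1`) rather than an integer expression such as `month // 4 + 1`: the latter puts July in quarter 2 and October and November in quarter 3, which shifts rows into the wrong same-quarter groups for the YQGR calculation. The sketch below simply compares the three formulas for every month; the variable names are illustrative.

```python
import pandas as pd

months = pd.Series(range(1, 13))

integer_guess = months // 4 + 1   # does NOT match calendar quarters
correct = (months - 1) // 3 + 1   # Jan-Mar -> 1, Apr-Jun -> 2, Jul-Sep -> 3, Oct-Dec -> 4

# pandas exposes the calendar quarter directly on datetime values
dates = pd.to_datetime([f"2015-{m:02d}-01" for m in range(1, 13)])
from_pandas = pd.Series(dates.quarter)

print(pd.DataFrame({
    "month": months,
    "month // 4 + 1": integer_guess,
    "(month - 1) // 3 + 1": correct,
    "dt.quarter": from_pandas,
}))
```

In this dataset the affected months are July, October and November, so the choice of formula changes which rows are compared in the same-quarter growth column.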