Plotting a line chart in R from values extracted with regular expressions

Time: 2015-11-18 11:10:08

Tags: r

I am still a beginner in R programming, so please excuse the basic question. I have data in a file, as shown below.

grep "lcost" inflection_point.trc
AP: lcost=4.00, rcost=6.02
AP: lcost=74340.93, rcost=249.97
AP: lcost=37172.17, rcost=128.50
AP: lcost=18587.79, rcost=6.24
AP: lcost=9295.60, rcost=6.13
AP: lcost=4649.71, rcost=6.08
AP: lcost=2326.56, rcost=6.05
AP: lcost=1165.19, rcost=6.04
AP: lcost=584.30, rcost=6.03
AP: lcost=294.06, rcost=6.03
AP: lcost=148.94, rcost=6.02
.....

grep "inflection point at card" inflection_point.trc
AP: Costing Nested Loops Join for inflection point at card 1.35
AP: Costing Hash Join for inflection point at card 1.35
AP: Costing Nested Loops Join for inflection point at card 182361.04
AP: Costing Hash Join for inflection point at card 182361.04
AP: Costing Nested Loops Join for inflection point at card 91181.20
AP: Costing Hash Join for inflection point at card 91181.20
AP: Costing Nested Loops Join for inflection point at card 45591.27
AP: Costing Hash Join for inflection point at card 45591.27
AP: Costing Nested Loops Join for inflection point at card 22796.31
AP: Costing Hash Join for inflection point at card 22796.31
AP: Costing Nested Loops Join for inflection point at card 11398.83
AP: Costing Hash Join for inflection point at card 11398.83
.....

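The requirement is to draw a line chart of the lcost and rcost values in R, with the x-axis values taken from the "inflection point" lines.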

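I tried to build a data frame with grep, but to no avail. I do not know how to load these values into a data frame and plot lcost and rcost as lines against the x-axis values.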

> dataframe <- grep ('lcost',readLines("inflection_point.trc"),value=TRUE)
 [1] "AP: lcost=4.00, rcost=6.02"       "AP: lcost=74340.93, rcost=249.97"
 [3] "AP: lcost=37172.17, rcost=128.50" "AP: lcost=18587.79, rcost=6.24"  
 [5] "AP: lcost=9295.60, rcost=6.13"    "AP: lcost=4649.71, rcost=6.08"   
 [7] "AP: lcost=2326.56, rcost=6.05"    "AP: lcost=1165.19, rcost=6.04"   
 [9] "AP: lcost=584.30, rcost=6.03"     "AP: lcost=294.06, rcost=6.03"    
[11] "AP: lcost=148.94, rcost=6.02"     "AP: lcost=75.97, rcost=6.02"     
[13] "AP: lcost=39.69, rcost=6.02"      "AP: lcost=21.75, rcost=6.02"     
[15] "AP: lcost=12.78, rcost=6.02"      "AP: lcost=7.89, rcost=6.02"      
[17] "AP: lcost=5.85, rcost=6.02"       "AP: lcost=7.08, rcost=6.02"      
[19] "AP: lcost=6.26, rcost=6.02"       "AP: lcost=6.26, rcost=6.02" 
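
One possible way to turn matched lines like these into a numeric data frame with base R alone is sketched below. It assumes every matching line has exactly the form `AP: lcost=<number>, rcost=<number>`. The `cost_lines` vector and the `costs` name are illustrative stand-ins; in practice the lines would come from `grep("lcost", readLines("inflection_point.trc"), value = TRUE)`.

```r
# Sample lines copied from the trace output above (stand-in for the grep() result)
cost_lines <- c(
  "AP: lcost=4.00, rcost=6.02",
  "AP: lcost=74340.93, rcost=249.97",
  "AP: lcost=37172.17, rcost=128.50"
)

# One capture group per value; sub() keeps only the requested group
pattern <- ".*lcost=([0-9.]+), rcost=([0-9.]+).*"

costs <- data.frame(
  lcost = as.numeric(sub(pattern, "\\1", cost_lines)),
  rcost = as.numeric(sub(pattern, "\\2", cost_lines))
)

# Both columns are now of type double
str(costs)
```

Because the values go through `as.numeric()` directly, no later type conversion is needed for these two columns.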

Any help would be a great way for me to learn R.

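This is what I have come up with so far. Could someone help me draw the line chart with ggplot? Is there a simpler way to derive the data than my approach? And is there a way to convert all columns of a data frame to double?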

lines <- readLines("inflection_point.trc")
library(reshape2)
# Strip the "AP:", "lcost=" and "rcost=" labels, then split what remains on the comma
fd1 <- colsplit(string = gsub("[A-Za-z]+[[:punct:]]", "", grep("cost=[0-9]+", lines, value = TRUE)), pattern = ",", names = c("HASH", "NESTED"))
fd1
       HASH NESTED
1      4.00   6.02
2  74340.93 249.97
3  37172.17 128.50
4  18587.79   6.24
5   9295.60   6.13
6   4649.71   6.08
7   2326.56   6.05
8   1165.19   6.04
9    584.30   6.03
10   294.06   6.03
11   148.94   6.02
12    75.97   6.02
13    39.69   6.02
14    21.75   6.02
15    12.78   6.02
16     7.89   6.02
17     5.85   6.02
18     7.08   6.02
19     6.26   6.02
20     6.26   6.02
fd2 <- data.frame(Card= unique(gsub( "([[:alpha:]]|\\s|:)", "", grep(".*inflection point at card", lines, value=TRUE))))
fd2
        Card
1       1.35
2  182361.04
3   91181.20
4   45591.27
5   22796.31
6   11398.83
7    5700.09
8    2850.72
9    1426.04
10    713.69
11    357.52
12    179.44
13     90.39
14     45.87
15     23.61
16     12.48
17      6.92
18      9.70
19      8.31
20      7.61
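
A note on the `unique()` step: it relies on no two inflection points sharing the same card value, since a repeated value would be dropped and the rows would no longer line up with the cost rows. A hedged alternative is to keep one line per inflection point by filtering on the join type. This assumes each inflection point logs exactly one "Nested Loops Join" line, as in the sample shown; the names `card_lines`, `nl_lines`, and `card` are illustrative.

```r
# Sample lines copied from the trace output above
card_lines <- c(
  "AP: Costing Nested Loops Join for inflection point at card 1.35",
  "AP: Costing Hash Join for inflection point at card 1.35",
  "AP: Costing Nested Loops Join for inflection point at card 182361.04",
  "AP: Costing Hash Join for inflection point at card 182361.04"
)

# Keep one line per inflection point by filtering on the join type,
# instead of de-duplicating the extracted values with unique()
nl_lines <- grep("Nested Loops Join for inflection point at card",
                 card_lines, value = TRUE)

# Capture the trailing number and convert it straight to double,
# which also avoids ending up with a factor column
card <- as.numeric(sub(".*at card ([0-9.]+)\\s*$", "\\1", nl_lines))
card
```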

require(dplyr)
fd3 <- bind_cols(fd1,fd2)
fd3
Source: local data frame [20 x 3]

       HASH NESTED      Card
      (dbl)  (dbl)    (fctr)
1      4.00   6.02      1.35
2  74340.93 249.97 182361.04
3  37172.17 128.50  91181.20
4  18587.79   6.24  45591.27
5   9295.60   6.13  22796.31
6   4649.71   6.08  11398.83
7   2326.56   6.05   5700.09
8   1165.19   6.04   2850.72
9    584.30   6.03   1426.04
10   294.06   6.03    713.69
11   148.94   6.02    357.52
12    75.97   6.02    179.44
13    39.69   6.02     90.39
14    21.75   6.02     45.87
15    12.78   6.02     23.61
16     7.89   6.02     12.48
17     5.85   6.02      6.92
18     7.08   6.02      9.70
19     6.26   6.02      8.31
20     6.26   6.02      7.61
fd3 <- fd3[-1,]
fd3
Source: local data frame [19 x 3]

       HASH NESTED      Card
      (dbl)  (dbl)    (fctr)
1  74340.93 249.97 182361.04
2  37172.17 128.50  91181.20
3  18587.79   6.24  45591.27
4   9295.60   6.13  22796.31
5   4649.71   6.08  11398.83
6   2326.56   6.05   5700.09
7   1165.19   6.04   2850.72
8    584.30   6.03   1426.04
9    294.06   6.03    713.69
10   148.94   6.02    357.52
11    75.97   6.02    179.44
12    39.69   6.02     90.39
13    21.75   6.02     45.87
14    12.78   6.02     23.61
15     7.89   6.02     12.48
16     5.85   6.02      6.92
17     7.08   6.02      9.70
18     6.26   6.02      8.31
19     6.26   6.02      7.61
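
Regarding the `(fctr)` type shown for `Card`: one common way to convert every column of a data frame to double is sketched below. The small `fd3` here is a stand-in built from the first three rows above. Note that calling `as.numeric()` directly on a factor returns its internal level codes, so the sketch goes through `as.character()` first.

```r
# Stand-in for fd3, with Card stored as a factor as in the printed output
fd3 <- data.frame(
  HASH   = c(74340.93, 37172.17, 18587.79),
  NESTED = c(249.97, 128.50, 6.24),
  Card   = factor(c("182361.04", "91181.20", "45591.27"))
)

# Convert each column via character so factor labels, not level codes, are parsed;
# assigning into fd3[] keeps the data frame structure and column names
fd3[] <- lapply(fd3, function(col) as.numeric(as.character(col)))

str(fd3)
```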

> is.data.frame(fd3)
[1] TRUE
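
For the plot itself, a minimal ggplot2 sketch could look like the following. It assumes `Card` has already been converted to numeric and uses a few rows from the table above as stand-in data. The log scales on both axes are a choice made here because the values span several orders of magnitude, not something the question requires; the `long`, `series`, and `cost` names are illustrative.

```r
library(ggplot2)

# Stand-in data: the first five rows of fd3, with Card as a numeric column
fd3 <- data.frame(
  Card   = c(182361.04, 91181.20, 45591.27, 22796.31, 11398.83),
  HASH   = c(74340.93, 37172.17, 18587.79, 9295.60, 4649.71),
  NESTED = c(249.97, 128.50, 6.24, 6.13, 6.08)
)

# Long format: one row per (Card, series) pair so colour can be mapped to series
long <- data.frame(
  Card   = rep(fd3$Card, times = 2),
  series = rep(c("HASH", "NESTED"), each = nrow(fd3)),
  cost   = c(fd3$HASH, fd3$NESTED)
)

p <- ggplot(long, aes(x = Card, y = cost, colour = series)) +
  geom_line() +
  geom_point() +
  scale_x_log10() +  # optional: values range from single digits to hundreds of thousands
  scale_y_log10() +
  labs(x = "Inflection point cardinality", y = "Cost")

print(p)
```

With the full data, the same long-format step could also be done with `reshape2::melt(fd3, id.vars = "Card")`, since reshape2 is already loaded in the question.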

0 answers:

No answers yet.