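Identify and count strings in a data frame in R

Date: 2018-09-29 16:18:27

Tags: r

I'm very new to R, and I'm trying to write a few lines to help me identify and count some strings that I imported into R from Excel. The data looks like this: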

id           Solutions                           PFBA(R_Biomass_LPAREN_e_RPAREN_)
1   R_PEPCK R_TRANSH2 R_PGI R_GLUCK                         1.1750060160861004
2   R_PEPCK R_TRANSH2 R_PGI R_G1D                           1.1750060160861004
3   R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D 3.2099449405406175
4   R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_R5PI R_G6P1D  3.2099449405406175
5   R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TKT1 R_G6P1D  3.2099449405406175
6   R_6PGDH R_PYK R_PGM R_PGI R_MAL1 R_MAL2 R_TALA2 R_G6P1D 2.0012655526190235
7   R_6PGDH R_PYK R_PGM R_PGI R_MAL1 R_MAL2 R_R5PI R_G6P1D  2.0012655526190235
8   R_6PGDH R_PYK R_PGM R_PGI R_MAL1 R_MAL2 R_TKT1 R_G6P1D  2.0012655526190235

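My goal is to go through the "Solutions" column, identify each reaction (whenever a new one appears), and count how many times it occurs. Ideally, the final output would look like this: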

R_PEPCK: 15
R_TRANSH2: 5
R_PGI: 2
(Etc...)

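That is, it should return an organized list of all the reactions together with the number of times each one appears in the Solutions column.

Thanks!

2 answers:

Answer 0 (score: 1)

Here is one approach:

Sample data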

df <- data.frame(
  Solutions = c('R_PEPCK R_TRANSH2 R_PGI R_GLUCK',
                'R_PEPCK R_TRANSH2 R_PGI R_G1D',
                'R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D'),
  stringsAsFactors = FALSE  # spell out FALSE; F is an ordinary variable that can be reassigned
)

                                                Solutions
1                         R_PEPCK R_TRANSH2 R_PGI R_GLUCK
2                           R_PEPCK R_TRANSH2 R_PGI R_G1D
3 R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D

Counts of the unique strings (separated by spaces):

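# Split each row on non-word characters, flatten to one vector, and tabulate.
# as.matrix() on a one-dimensional table already yields a single column,
# so the original ncol = 1 argument had no effect and is dropped.
counts <- as.matrix(table(unlist(strsplit(df$Solutions, split = '\\W'))))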

          [,1]
R_6PGDH      1
R_G1D        1
R_G6P1D      1
R_GLUCK      1
R_MAL1       1
R_MAL2       1
R_PEPCK      2
R_PFK        1
R_PGI        2
R_PGM        1
R_PYK        1
R_TALA2      1
R_TRANSH2    2
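The question asks for output of the form `R_PEPCK: 15`, ordered as a list. The following is a minimal base R sketch of one way to get there from the same sample data, not part of the original answer. It assumes reactions are separated only by whitespace, so it splits on `\\s+` rather than `\\W` (which would also split on characters such as hyphens). The variable names `reactions`, `counts_sorted`, and `out` are made up for illustration.

```r
# Sample data copied from the answer above
df <- data.frame(
  Solutions = c("R_PEPCK R_TRANSH2 R_PGI R_GLUCK",
                "R_PEPCK R_TRANSH2 R_PGI R_G1D",
                "R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D"),
  stringsAsFactors = FALSE
)

# Trim stray leading/trailing blanks, split on runs of whitespace,
# and flatten the per-row pieces into one character vector
reactions <- unlist(strsplit(trimws(df$Solutions), "\\s+"))

# Tabulate, then sort so the most frequent reactions come first
counts_sorted <- sort(table(reactions), decreasing = TRUE)

# Build "name: count" lines in the format shown in the question
out <- paste0(names(counts_sorted), ": ", as.vector(counts_sorted))
writeLines(out)
```

With the real data, `df$Solutions` would simply be the imported column; the rest stays the same.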

Answer 1 (score: 1)

If you prefer a tidyverse solution:

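library(dplyr)  # provides %>% and count()
library(tidyr)  # provides separate_rows()

df %>%
  separate_rows(Solutions, sep = " ") %>%  # one reaction per row
  count(Solutions)                         # occurrences of each reaction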

# A tibble: 13 x 2
   Solutions     n
   <chr>     <int>
 1 R_6PGDH       1
 2 R_G1D         1
 3 R_G6P1D       1
 4 R_GLUCK       1
 5 R_MAL1        1
 6 R_MAL2        1
 7 R_PEPCK       2
 8 R_PFK         1
 9 R_PGI         2
10 R_PGM         1
11 R_PYK         1
12 R_TALA2       1
13 R_TRANSH2     2
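If the most frequent reactions should come first, the same pipeline can be asked to sort its result. This is a small sketch that assumes the dplyr and tidyr packages are installed and reuses the sample data from the first answer. The `sort = TRUE` argument of `count()` orders rows by descending count, and splitting on `\\s+` instead of a single space is a substitution made here to tolerate repeated blanks.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the first answer
df <- data.frame(
  Solutions = c("R_PEPCK R_TRANSH2 R_PGI R_GLUCK",
                "R_PEPCK R_TRANSH2 R_PGI R_G1D",
                "R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  separate_rows(Solutions, sep = "\\s+") %>%  # one reaction per row
  count(Solutions, sort = TRUE)               # largest counts first

print(counts)
```

The result has one row per distinct reaction, with the count in column `n`.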