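I'm new to R, and I'm trying to write a few lines that will help me identify and count some strings imported from Excel into R. The data looks like this: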
id Solutions PFBA(R_Biomass_LPAREN_e_RPAREN_)
1 R_PEPCK R_TRANSH2 R_PGI R_GLUCK 1.1750060160861004
2 R_PEPCK R_TRANSH2 R_PGI R_G1D 1.1750060160861004
3 R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D 3.2099449405406175
4 R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_R5PI R_G6P1D 3.2099449405406175
5 R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TKT1 R_G6P1D 3.2099449405406175
6 R_6PGDH R_PYK R_PGM R_PGI R_MAL1 R_MAL2 R_TALA2 R_G6P1D 2.0012655526190235
7 R_6PGDH R_PYK R_PGM R_PGI R_MAL1 R_MAL2 R_R5PI R_G6P1D 2.0012655526190235
8 R_6PGDH R_PYK R_PGM R_PGI R_MAL1 R_MAL2 R_TKT1 R_G6P1D 2.0012655526190235
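My goal is to go through the "Solutions" column, identify each reaction (whenever a new one appears), and count its occurrences. Ideally, the final output would look like this: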
R_PEPCK: 15
R_TRANSH2: 5
R_PGI: 2
(Etc...)
That is, it returns an organized list of all the reactions along with the number of times each appears in the Solutions column.
Thanks!
Answer 0 (score: 1)
Here is one approach:
Sample data
df <- data.frame(Solutions = c('R_PEPCK R_TRANSH2 R_PGI R_GLUCK', 'R_PEPCK R_TRANSH2 R_PGI R_G1D', 'R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D'), stringsAsFactors = FALSE)
Solutions
1 R_PEPCK R_TRANSH2 R_PGI R_GLUCK
2 R_PEPCK R_TRANSH2 R_PGI R_G1D
3 R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D
Counts of the unique strings (split on whitespace):
counts <- as.matrix(table(unlist(strsplit(df$Solutions, split = '\\W'))), ncol = 1)
[,1]
R_6PGDH 1
R_G1D 1
R_G6P1D 1
R_GLUCK 1
R_MAL1 1
R_MAL2 1
R_PEPCK 2
R_PFK 1
R_PGI 2
R_PGM 1
R_PYK 1
R_TALA2 1
R_TRANSH2 2
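As a follow-up to the counts above, here is a minimal sketch of how they might be turned into the "name: count" layout the question asks for, ordered from most to least frequent. It assumes the same sample df as in this answer; the variable names reactions and report are illustrative only, and splitting on "\\s+" rather than "\\W" is a swapped-in choice so that only whitespace acts as a separator.

```r
# Sample data copied from the answer above
df <- data.frame(
  Solutions = c("R_PEPCK R_TRANSH2 R_PGI R_GLUCK",
                "R_PEPCK R_TRANSH2 R_PGI R_G1D",
                "R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D"),
  stringsAsFactors = FALSE
)

# Split every row on runs of whitespace and flatten into one character vector
reactions <- unlist(strsplit(trimws(df$Solutions), "\\s+"))

# Tabulate the reactions, then order from most to least frequent
counts <- sort(table(reactions), decreasing = TRUE)

# Build one "name: count" string per reaction and print them line by line
report <- paste0(names(counts), ": ", as.integer(counts))
cat(report, sep = "\n")
```

With the real data, df would be whatever data frame the Excel import produced, provided it has a character column named Solutions.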
Answer 1 (score: 1)
If you prefer a tidyverse solution:
library(dplyr)
library(tidyr)
df %>%
  separate_rows(Solutions, sep = " ") %>%
  count(Solutions)
# A tibble: 13 x 2
Solutions n
<chr> <int>
1 R_6PGDH 1
2 R_G1D 1
3 R_G6P1D 1
4 R_GLUCK 1
5 R_MAL1 1
6 R_MAL2 1
7 R_PEPCK 2
8 R_PFK 1
9 R_PGI 2
10 R_PGM 1
11 R_PYK 1
12 R_TALA2 1
13 R_TRANSH2 2
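If the most frequent reactions should come first, one possible variation on the pipeline above is sketched here. It assumes the dplyr and tidyr packages are installed and reuses the sample data from the first answer; the name result is illustrative only, and the separator "\\s+" is a swapped-in choice that tolerates repeated spaces.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the first answer
df <- data.frame(
  Solutions = c("R_PEPCK R_TRANSH2 R_PGI R_GLUCK",
                "R_PEPCK R_TRANSH2 R_PGI R_G1D",
                "R_PFK R_6PGDH R_PYK R_PGM R_MAL1 R_MAL2 R_TALA2 R_G6P1D"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # One row per reaction; sep is treated as a regular expression
  separate_rows(Solutions, sep = "\\s+") %>%
  # Count each reaction and sort by descending count
  count(Solutions, sort = TRUE)

print(result)
```

Here sort = TRUE asks count() to order rows by descending n, so a separate arrange() step is not needed.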