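I want to conditionally color each character in each cell of a column. I can roughly imagine what needs to be done (I'm new to R):
1. open an xlsx table, or open a txt file and convert it to xlsx
2. iterate through the column (treat each cell as a vector)
3. iterate through each vector (its characters) and change the color conditionally
(and use a regex to find the lines that should be colored, i.e. the sequences)
4. save to xlsx
But I don't know how to color items in an xlsx file (or which library to use), or how to save the file with this change.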
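Steps 2 and 3 can be sketched in base R before any xlsx library is involved. This is only a minimal sketch: the names `residue_colors`, `is_sequence_line` and `color_per_character` are hypothetical, and the coloring rule (which letters get which color) is an assumption, since the actual condition is not stated here.

```r
# Hypothetical rule: map selected letters to colors; everything else stays black.
# The letters and colors below are assumptions made for illustration only.
residue_colors <- c(H = "red", E = "blue", C = "darkgreen")

# Lines such as "#1 LAKEL..." or "#2 VEFPI..." hold the aligned sequences;
# "#c ..." and "ss1 ..." lines do not match because "#" must be followed by a digit.
is_sequence_line <- function(lines) grepl("^#\\d ", lines)

# Return one color name per character of a single string.
color_per_character <- function(text, palette = residue_colors, default = "black") {
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  colors <- unname(palette[chars])      # NA for characters without a rule
  colors[is.na(colors)] <- default
  data.frame(char = chars, color = colors, stringsAsFactors = FALSE)
}

# Shortened stand-ins for the cell contents
lines <- c("ss1 HHHEE", "#1 LAKEL", "#c --+--", "#2 VEFPI")
sequence_lines <- lines[is_sequence_line(lines)]
colored <- lapply(sequence_lines, color_per_character)
```

Each element of `colored` is a data frame with one row per character, which keeps the decision about colors separate from the question of how to write them into the workbook.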
Sample data:
>>f_2;hypothetical protein L_2128 [Legionella] {gene:L_2128}_start=1;end=300;length=300;source_length=320
LAKELTYTDIINLKDSGLISNSEALCSIDFSERNSCTLINCKKLIIIEASQESSKIQLSILPFTKAGTELLAFTNPTSNNEYIMKLCNLVKASKARIHVADIEKIVGDKISYKNKNVISG
&~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 | -0.368E+01 >>_vfdb.0002001_ VFG001328(gi:21283614) (sak) Staphylokinase precursor [Staphylokinase (VF0021)] [Staphylococcus aureus subsp. aureus MW2] :_: Length: 163
ss1 HHHHHHHHHEEEETTCCCCCCCHHHEEEHTTTTTTTH-HHHHHHEEEEHHHHHHTHEEEEEECTCCCCHHHEEECCCCCCTTEHHHHHHHHHHH
#1 LAKELTYTDIINLKDSGLISNSEALCSIDFSERNSCT-LINCKKLIIIEASQESSKIQLSILPFTKAGTELLAFTNPTSNNEYIMKLCNLVKAS
#c ----------------------+---------------+----+-----------+------+-+--+-----------+--------------
#2 VEFPIKPGTTLTKEK--IEYYVEWALDATAYKEFRVVELDTSAKIEVTYYDKNKKKEETKSFPITEKGFVVPDLSEHIKNPGFNLITKVVIEKK
ss2 EEETTCCTCCCHHHH--HHHHHHHHHHHHHHHHHHHHHHHHHHHHEEHHHHHHHHHHHHHHCHHHTTTEECHHHHHTTCCTTTCEEEHHHHHHH
pseudoscore: 8.51
1st sequence starts at 1
2nd sequence starts at 72
>>f_1; hypothetical protein L_2128 [Legionella] {gene:L_2128}_start=201;end=320;length=120;source_length=320
LAKELTYTDIINLKDSGLISNSEALCSIDFSERNSCTLINCKKLIIIEASQESSKIQLSILPFTKAGTELLAFTNPTSNNEYIMKLCNLVKASKARIHVADIEKIVGDKISYKNKNVISG
&~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 | -0.368E+01 >>_vfdb.0002001_ VFG001328(gi:21283614) (sak) Staphylokinase precursor [Staphylokinase (VF0021)] [Staphylococcus aureus subsp. aureus MW2] :_: Length: 163
ss1 HHHHHHHHHEEEETTCCCCCCCHHHEEEHTTTTTTTH-HHHHHHEEEEHHHHHHTHEEEEEECTCCCCHHHEEECCCCCCTTEHHHHHHHHHHH
#1 LAKELTYTDIINLKDSGLISNSEALCSIDFSERNSCT-LINCKKLIIIEASQESSKIQLSILPFTKAGTELLAFTNPTSNNEYIMKLCNLVKAS
#c ----------------------+---------------+----+-----------+------+-+--+-----------+--------------
#2 VEFPIKPGTTLTKEK--IEYYVEWALDATAYKEFRVVELDTSAKIEVTYYDKNKKKEETKSFPITEKGFVVPDLSEHIKNPGFNLITKVVIEKK
ss2 EEETTCCTCCCHHHH--HHHHHHHHHHHHHHHHHHHHHHHHHHHHEEHHHHHHHHHHHHHHCHHHTTTEECHHHHHTTCCTTTCEEEHHHHHHH
pseudoscore: 8.51
1st sequence starts at 1
2nd sequence starts at 72
My code:
# xlsx files
setwd('D:/Dropbox/color_ffas_results')
library(xlsx)
wb <- loadWorkbook("sample.xlsx")
sheet1 <- getSheets(wb)[[1]]
# get all rows
rows <- getRows(sheet1)
cells <- getCells(rows)
# look at the values
sapply(cells, getCellValue)
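# placeholder: this should apply a color to the given cell style
cellColor <- function(style) {
# SET COLOR HERE
}
# sequence lines start with "#", a digit and a space
#sequence_pattern <- "^#\\d .*"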
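Whatever library ends up writing the file, coloring character by character usually means styling ranges of text rather than single letters. As far as I know, Apache POI (which the xlsx package wraps) applies a font to a rich text string by index range, with a 0-based inclusive start and an exclusive end. Under that assumption, a per-character color vector could be collapsed into runs like this; `color_runs` is a hypothetical helper, and the actual call that writes the rich text into the cell is not shown.

```r
# Collapse a per-character color vector into contiguous runs.
# `start` is 0-based and inclusive, `end` is 0-based and exclusive
# (assumed convention of the rich-text API that would consume these runs).
color_runs <- function(colors) {
  runs <- rle(colors)
  end <- cumsum(runs$lengths)
  start <- end - runs$lengths
  data.frame(start = start, end = end, color = runs$values,
             stringsAsFactors = FALSE)
}

# Hypothetical per-character colors for the text "LAKEL"
colors <- c("black", "black", "black", "blue", "black")
runs <- color_runs(colors)
print(runs)
```

Each row of `runs` then corresponds to one font application over `[start, end)`, so a cell with long stretches of the same color needs only a handful of styling calls.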