Question

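I have an XML string (converted to a list) and am looking for a specific string. I only want to act on a line if the next line in the list also contains that same string.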

The XML (called diff):

 <result type="MLST" value="96">
      <result_data type="profile" value="43,47,49,49,41,15,3"/>
      <result_data type="QC_minimum_consensus_depth" value="7"/>
      <result_data type="QC_max_percentage_non_consensus_base" value="10.0"/>
      <result_data type="QC_percentage_coverage" value="100"/>
      <result_data type="QC_minimum_consensus_depth_for_all_loci" value="7,17,27,10,25,18,22" diff:update-attr="value:7,17,27,10,24,18,22"/>
      <result_data type="QC_complete_pileup" value="TRUE"/>
      <result_data type="QC_mean_consensus_depth" value="17.67"/>
      <result_data type="QC_max_percentage_non_consensus_base_for_all_loci" value="10.0, 6.25, 3.45, 9.09, 5.88, 5.26, 5.41"/>
      <result_data type="QC_mean_consensus_depth_for_all_loci" value="17.67, 32.49, 34.09, 23.44, 35.57, 29.02, 39.08" diff:update-attr="value:17.67, 32.49, 34.09, 23.44, 34.24, 29.02, 39.08"/>
      <result_data type="QC_traffic_light" value="GREEN"/>
      <result_data diff:insert="" type="predicted_serotype" diff:add-attr="type;value" value="('Schwarzengrund (Achtman)', 168), ('Schwarzengrund (PHE)', 83), ('Blockley (Achtman)', 1), ('Uppsala (Achtman)', 1), ('Oslo (Achtman)', 1), ('Schwarzengru (Achtman)', 1), ('Iv Rough:Z4,Z32:- (Achtman)', 1)"/>
      <result_data type="predicted_serotype" value="('Schwarzengrund (PHE)', 13)" diff:delete=""/>
</result>
<gastro_prelim_st reason="not novel" success="false">
      <type st="96"/>
</gastro_prelim_st>

What I want is: if "predicted_serotype" is found in a line and the next line also contains "predicted_serotype", then print it.

Any help is appreciated.

Answer 1

All I did was copy your XML content into a txt file and then read it back as a string.

path = "path/tmp.txt"
# content will hold the whole file as one string
with open(path, 'r') as f:
    content = f.read()

# diff_list is a list of lines
diff_list = content.split("\n")
for n, line in enumerate(diff_list):
    print(n)  # index of the current line
    # check n + 1 first so that the last line does not raise an IndexError
    if n + 1 < len(diff_list) and "predicted_serotype" in line and "predicted_serotype" in diff_list[n + 1]:
        print(line)

Basically, diff_list is a list, so you can perform all kinds of indexing operations on it.

Also, as others mentioned in the comments, make sure n + 1 does not go out of range.

Update, as suggested by @bruno desthuilliers:

for line, next_line in zip(diff_list, diff_list[1:]):
    if "predicted_serotype" in line and "predicted_serotype" in next_line:
        print(line)

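This avoids the index error, because zip stops at the shorter of the two sequences and therefore never looks past the last line.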
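
The pairwise comparison can also be wrapped in a small helper so it is easy to reuse. This is only a minimal sketch: the function name consecutive_matches and the shortened sample lines are assumptions made for illustration and are not part of the original answer.

```python
def consecutive_matches(lines, needle):
    """Return each line that contains needle and whose next line also contains it."""
    return [
        line
        for line, next_line in zip(lines, lines[1:])
        if needle in line and needle in next_line
    ]


# Hypothetical sample, shortened from the diff in the question
sample = [
    '<result_data type="QC_traffic_light" value="GREEN"/>',
    '<result_data diff:insert="" type="predicted_serotype" value="A"/>',
    '<result_data type="predicted_serotype" value="B" diff:delete=""/>',
    '</result>',
]

matches = consecutive_matches(sample, "predicted_serotype")
for line in matches:
    print(line)
```

Because zip pairs each line with its successor, an empty list or a single-line list simply yields no matches rather than raising an error.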

Answer 2

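Although my answer does not address the literal question, given the context of the question I would suggest using a regular expression, as follows.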

regx.findall

Here the regular expression matches two lines that contain the string "predicted_serotype", but regx.findall returns only the capture group inside the parentheses.
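
The pattern used in this answer did not survive in the text, so the following is only a sketch of how such a regex might look, not the answerer's actual expression. The pattern itself and the shortened sample string are assumptions. It captures a line containing "predicted_serotype" only when the following line contains it too.

```python
import re

# Assumed pattern: capture a whole line containing the keyword, but only if
# the next line (checked with a lookahead) contains the keyword as well.
regx = re.compile(
    r'^(.*predicted_serotype.*)\n(?=.*predicted_serotype)',
    re.MULTILINE,
)

# Hypothetical sample, shortened from the diff in the question
diff = "\n".join([
    '<result_data type="QC_traffic_light" value="GREEN"/>',
    '<result_data diff:insert="" type="predicted_serotype" value="A"/>',
    '<result_data type="predicted_serotype" value="B" diff:delete=""/>',
    '</result>',
])

# With exactly one capture group, findall returns a list of that group's text
found = regx.findall(diff)
for line in found:
    print(line)
```

The lookahead does not consume the second line, so a run of three or more matching lines is reported pairwise, the same as with the zip approach. This sketch assumes "\n" line endings; text with "\r\n" endings would need to be normalised first.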
