Getting IDs from PubMed

Date: 2018-05-10 19:09:15

Tags: python regex performance pubmed

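I am currently working on finding direct links between PubMed/MEDLINE citations and clinical trial registries. Specifically, given a single PMID, I want to find every ID it cites from any clinical trial registry (for example, see ACTRN12616000470493, identified for PMID 29593018).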

Currently, I only search for links to ClinicalTrials.gov (ID: NCT followed by 8 digits, e.g. NCT01435343), using the following regular expression:

attributes = {'mdTitle': 'High-dose versus standard-dose amoxicillin/clavulanate for clinically-diagnosed acute bacterial sinusitis: A randomized clinical trial.', 'mdAbstract': 'BACKGROUND: The recommended treatment for acute bacterial sinusitis in adults, amoxicillin with clavulanate, provides only modest benefit. OBJECTIVE: To see if a higher dose of amoxicillin will lead to more rapid improvement. DESIGN, SETTING, AND PARTICIPANTS: Double-blind randomized trial in which, from November 2014 through February 2017, we enrolled 315 adult outpatients diagnosed with acute sinusitis in accordance with Infectious Disease Society of America guidelines. INTERVENTIONS: Standard-dose (SD) immediate-release (IR) amoxicillin/clavulanate 875 /125 mg (n = 159) vs. high-dose (HD) (n = 156). The original HD formulation, 2000 mg of extended-release (ER) amoxicillin with 125 mg of IR clavulanate twice a day, became unavailable half way through the study. The IRB then approved a revised protocol after patient 180 to provide 1750 mg of IR amoxicillin twice a day in the HD formulation and to compare Time Period 1 (ER) with Time Period 2 (IR). MAIN MEASURE: The primary outcome was the percentage in each group reporting a major improvement-defined as a global assessment of sinusitis symptoms as "a lot better" or "no symptoms"-after 3 days of treatment. KEY RESULTS: Major improvement after 3 days was reported during Period 1 by 38.8% of ER HD versus 37.9% of SD patients (P = 0.91) and during Period 2 by 52.4% of IR HD versus 34.4% of SD patients, an effect size of 18% (95% CI 0.75 to 35%, P = 0.04). No significant differences in efficacy were seen at Day 10. The major side effect, severe diarrhea at Day 3, was reported during Period 1 by 7.4% of HD and 5.7% of SD patients (P = 0.66) and during Period 2 by 15.8% of HD and 4.8% of SD patients (P = 0.048). CONCLUSIONS: Adults with clinically diagnosed acute bacterial sinusitis were more likely to improve rapidly when treated with IR HD than with SD but not when treated with ER HD. They were also more likely to suffer severe diarrhea. Further study is needed to confirm these findings. TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT02340000.', 'mdMesh': '', 'mdPMID': '29738561', 'mdPublicationType': ['Journal Article'], 'mdAuthor': ['Matho A', 'Mulqueen M', 'Tanino M', 'Quidort A', 'Cheung J', 'Pollard J', 'Rodriguez J', 'Swamy S', 'Tayler B', 'Garrison G', 'Ata A', 'Sorum P'], 'mdDataPublished': '2018', 'mdPMC': '', 'mdSI': ['ClinicalTrials.gov/NCT02340000'], 'mdAID': ['10.1371/journal.pone.0196734 [doi]', 'PONE-D-17-43190 [pii]'], 'mdDOI': ['10.1371/journal.pone.0196734 [doi]', 'PONE-D-17-43190 [pii]'], 'mdSO': 'PLoS One. 2018 May 8;13(5):e0196734. doi: 10.1371/journal.pone.0196734. eCollection 2018.', 'mdLanguage': ['English']}

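import re
# Flatten the record into one "key=value" string, then test each space-separated token.
dictString = ', '.join("{!s}={!r}".format(key, val) for (key, val) in attributes.items())
for each in dictString.split(' '):
    # re.match anchors at the start of the token: NCT followed by 8 digits.
    if re.match(r'NCT\d{8}', each):
        print(each.strip('.\','))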

However, PubMed/MEDLINE also contains 40 other clinical trial registration IDs, and I would like to get those as well. How can I do this more efficiently than writing 40+ regex statements?

Note: to clarify, I need to identify each ID and the registry body it belongs to (i.e. ClinicalTrials.gov for NCT01435343, or Australian New Zealand Clinical Trials Registry for ACTRN12616000470493).

1 answer:

Answer 0 (score: 1)

I haven't looked at enough records to know whether the same pattern always holds, but if they always put "TRIAL REGISTRATION NUMBER:" inside a particular HTML tag, you could parse the actual HTML document for the tag containing that term and then take the text of the paragraph in the tag that follows. BeautifulSoup makes this relatively simple.
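As a rough illustration of the same label-based idea, here is a minimal sketch that works on the plain abstract text rather than on HTML, so it does not use BeautifulSoup and makes no claim about which tags PubMed uses. The helper name `registration_text` and the exact label variants it accepts are assumptions for illustration only.

```python
import re

# Assumption: the label reads "TRIAL REGISTRATION", optionally followed by
# " NUMBER" and/or "S", then a colon, and the IDs run to the end of the text.
LABEL = re.compile(
    r'TRIAL REGISTRATION(?: NUMBER)?S?\s*:\s*(.*)',
    re.IGNORECASE | re.DOTALL,
)

def registration_text(abstract):
    """Hypothetical helper: return the text after the label, or '' if absent."""
    match = LABEL.search(abstract)
    return match.group(1).strip() if match else ''

sample = ('Further study is needed to confirm these findings. '
          'TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT02340000.')
print(registration_text(sample))
```

Records that mention a trial ID without this label would be missed, so this is only worth using if the label turns out to be consistent.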

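But again, you have only shown one example, so I don't know whether it always follows this pattern. From there, the IDs appear to be semicolon-separated, which makes them easy to split.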
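One way to avoid dozens of separate regex statements, sketched here under stated assumptions, is to keep a single table of registry patterns and compile it into one alternation with a named group per registry. The table below covers only the two ID formats that appear in the question (NCT plus 8 digits, ACTRN plus 14 digits); the names `REGISTRIES` and `find_trial_ids` are hypothetical, and patterns for other registries would have to be verified before being added.

```python
import re

# Assumed prefix table: group name -> (registry name, ID pattern).
# Only the two formats shown in the question are included.
REGISTRIES = {
    'NCT': ('ClinicalTrials.gov', r'NCT\d{8}'),
    'ACTRN': ('Australian New Zealand Clinical Trials Registry', r'ACTRN\d{14}'),
}

# Build one alternation so the text is scanned once, not once per registry.
COMBINED = re.compile(
    '|'.join('(?P<{}>\\b{}\\b)'.format(key, pattern)
             for key, (_, pattern) in REGISTRIES.items())
)

def find_trial_ids(text):
    """Return (registry name, ID) pairs from semicolon-separated text."""
    found = []
    for part in text.split(';'):
        for match in COMBINED.finditer(part):
            # lastgroup is the name of the alternative that matched.
            found.append((REGISTRIES[match.lastgroup][0], match.group()))
    return found

print(find_trial_ids('ClinicalTrials.gov Identifier: NCT02340000; ACTRN12616000470493.'))
```

The sample record's 'mdSI' entry ('ClinicalTrials.gov/NCT02340000') already pairs a registry name with an ID, so where that field is populated, splitting it on '/' may avoid pattern matching altogether.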