NullPointerException when I run it as a JAR file, but it works fine in Eclipse

Date: 2016-03-25 14:34:12

Tags: java hadoop nullpointerexception

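I need to download abstracts from the PubMed database and store them in HDFS. When I run this code in Eclipse, I get the expected output. But when I export the project as a JAR and run it on Hadoop, it throws a NullPointerException.

Here is the code: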

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DwnloadAbstracts
{
    public static void saveUrlToFile(File saveFile,String diseaseName){



        try {
            int retstart=0;
            int total=20;
            int retmax=10;
            //FileWriter fw=new FileWriter(saveFile);
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path outFile = new Path(saveFile.toString());
            if (fs.exists(outFile))
            {
                fs.delete(outFile, false); // delete(Path) alone is deprecated; false = non-recursive
            }
            FSDataOutputStream out = fs.create(outFile);
            for(retstart=0;retstart<total;retstart+=retmax)
            {
            URL url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="+diseaseName+"&retstart="+retstart+"&retmax="+retmax+"&usehistory=y");
            URLConnection conn1 = url.openConnection();
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            try {
                DocumentBuilder builder = domFactory.newDocumentBuilder();
                Document dDoc = builder.parse(conn1.getInputStream());                
                XPath xPath = XPathFactory.newInstance().newXPath();
                /*Node node1 = (Node) xPath.compile("/eSearchResult/WebEnv").evaluate(dDoc, XPathConstants.NODE);
                String webEnv=node1.getFirstChild().getNodeValue();
                Node node2 = (Node) xPath.compile("/eSearchResult/QueryKey").evaluate(dDoc, XPathConstants.NODE);
                String query_key=node2.getFirstChild().getNodeValue();
                System.out.println(webEnv +"\n"+ query_key);
                URL url2=new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&query_key="+query_key+"&WebEnv="+webEnv+"&retmode=xml");*/

                NodeList nodeList=(NodeList) xPath.compile("/eSearchResult/IdList/Id").evaluate(dDoc, XPathConstants.NODESET);
                String idList=nodeList.item(0).getFirstChild().getNodeValue();
                for (int i = 1; i < nodeList.getLength(); i++) {
                    idList=idList+","+nodeList.item(i).getFirstChild().getNodeValue(); 
                }
                URL url2=new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id="+idList+"&retmode=xml");     

                URLConnection conn2 = url2.openConnection();
                DocumentBuilderFactory domFactory2 = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder2 = domFactory2.newDocumentBuilder();
                Document dDoc2 = builder2.parse(conn2.getInputStream());
                NodeList abstractList=(NodeList) xPath.compile("/PubmedArticleSet/PubmedArticle/MedlineCitation/Article/Abstract/AbstractText").evaluate(dDoc2, XPathConstants.NODESET);
                NodeList pmIdList=(NodeList)xPath.compile("/PubmedArticleSet/PubmedArticle/MedlineCitation/PMID").evaluate(dDoc2, XPathConstants.NODESET);
                for (int i = 0; i < pmIdList.getLength(); i++) {
                     //fw.append("PMID:");
                     out.write((pmIdList.item(i).getFirstChild().getNodeValue()+"\t").getBytes());
                     //fw.append("Abstract:");
                     out.write((abstractList.item(i).getFirstChild().getNodeValue()+"\n").getBytes());

                }

                /* TransformerFactory tfactory = TransformerFactory.newInstance();
                Transformer xform = tfactory.newTransformer();
                xform.transform(new DOMSource(dDoc2), new StreamResult(saveFile));
                */
                //System.out.println("Done");
            } catch (Exception e) {
                e.printStackTrace();
            }
            }
            out.close();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }

        catch (IOException e) {
            e.printStackTrace();
        }


}
    public static void main(String[] args) throws Exception{
        DwnloadAbstracts.saveUrlToFile(new File(args[1]), args[0]);
    }

}
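
The code above concatenates the search term straight into the esearch URL, which is fine for a single word such as malaria but would produce a malformed query for a term containing spaces or other reserved characters. Below is a minimal sketch of the same URL construction with the term encoded first. The class name EsearchUrls and the method esearchUrl are hypothetical names for illustration; the base URL and query parameters are the ones used in the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EsearchUrls {
    // Same endpoint as in the question's code.
    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    // Builds one page of the search URL; the term is URL-encoded so spaces and '&' are safe.
    static String esearchUrl(String term, int retstart, int retmax)
            throws UnsupportedEncodingException {
        return BASE + "?db=pubmed&term=" + URLEncoder.encode(term, "UTF-8")
            + "&retstart=" + retstart + "&retmax=" + retmax + "&usehistory=y";
    }

    public static void main(String[] args) throws Exception {
        // Same paging scheme as the question: 20 results in pages of 10.
        int total = 20;
        int retmax = 10;
        for (int retstart = 0; retstart < total; retstart += retmax) {
            System.out.println(esearchUrl("lung cancer", retstart, retmax));
        }
    }
}
```

Nothing here contacts the network; it only prints the two URLs that the paging loop would request.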

Output file created when running in Eclipse:

27011331    Pneumonia is the leading infectious cause of mortality in children under five worldwide. Community-level interventions, such as integrated community case management, have great potential to reduce the burden of pneumonia, as well as other diseases, especially in remote populations. However, there are still questions as to whether community health workers (CHW) are able to accurately assess symptoms of pneumonia and prescribe appropriate treatment. This research addresses limitations of previous studies using innovative methodology to assess the accuracy of respiratory rate measurement by CHWs and provides new evidence on the quality of care given for children with symptoms of pneumonia. It is one of few that assesses CHW performance in their usual setting, with independent re-examination by experts, following a considerable period of time post-training of CHWs.
27011010    In this cross-sectional mixed methods study, 1,497 CHW consultations, conducted by 90 CHWs in two districts of Luapula province, Zambia, were directly observed, with measurement of respiratory rate for children with suspected pneumonia recorded by video. Using the video footage, a retrospective reference standard assessment of respiratory rate was conducted by experts. Counts taken by CHWs were compared against the reference standard and appropriateness of the treatment prescribed by CHWs was assessed. To supplement observational findings, three focus group discussions and nine in depth interviews with CHWs were conducted.
27010926    The findings support existing literature that CHWs are capable of measuring respiratory rates and providing appropriate treatment, with 81% and 78% agreement, respectively, between CHWs and experts. Accuracy in diagnosis could be strengthened through further training and the development of improved diagnostic tools appropriate for resource-poor settings.
27010649    Plasmodium vivax is the most widely distributed human parasite and the main cause of human malaria outside the African continent. However, the knowledge about the genetic variability of P. vivax is limited when compared to the information available for P. falciparum. We present the results of a study aimed at characterizing the genetic structure of P. vivax populations obtained from pregnant women from different malaria endemic settings. Between June 2008 and October 2011 nearly 2000 pregnant women were recruited during routine antenatal care at each site and followed up until delivery. A capillary blood sample from the study participants was collected for genotyping at different time points. Seven P. vivax microsatellite markers were used for genotypic characterization on a total of 229 P. vivax isolates obtained from Brazil, Colombia, India and Papua New Guinea. In each population, the number of alleles per locus, the expected heterozygosity and the levels of multilocus linkage disequilibrium were assessed. The extent of genetic differentiation among populations was also estimated. Six microsatellite loci on 137 P. falciparum isolates from three countries were screened for comparison. The mean value of expected heterozygosity per country ranged from 0.839 to 0.874 for P. vivax and from 0.578 to 0.758 for P. falciparum. P. vivax populations were more diverse than those of P. falciparum. In some of the studied countries, the diversity of P. vivax population was very high compared to the respective level of endemicity. The level of inter-population differentiation was moderate to high in all P. vivax and P. falciparum populations studied.
27010542    The development of new efficient therapeutics for the treatment of malaria and cancer is an important endeavor. Over the past 15 years, much attention has been paid to the synthesis of dimeric structures, which combine two units of artemisinin, as lead compounds of interest. A wide variety of atemisinin-derived dimers containing different linkers demonstrate improved properties compared to their parent compounds (e.g. circumventing multidrug resistance), making the dimerization concept highly compelling for development of efficient antimalarial and anti cancer drugs. The present Perspective highlights recent developments on different types of artemisinin-derived dimers and their structural and functional features. Particular emphasis is put on the respective in vitro and in vivo studies, exploring the role of the length and nature of linkers on the activities of the dimers, and considering the future prospects of the dimerization concept for drug discovery.
27009943    During the recent past, development of DDT resistance and reduction to pyrethroid susceptibility among the malaria vectors has posed a serious challenge in many Southeast Asian countries including India. Current study presents the insecticide susceptibility and knock-down data of field collected Anopheles annularis sensu lato and An. vagus mosquito species from endemic areas of Assam in northeast India. Anopheles annularis s.l. and An. vagus adult females were collected from four randomly selected sentinel sites in Orang primary health centre (OPHC) and Balipara primary health centre (BPHC) areas, and used for testing susceptibility to DDT, malathion, deltamethrin and lambda-cyhalothrin. After insecticide susceptibility tests, mosquitoes were subjected to VectorTest™ assay kits to detect the presence of malaria sporozoite in the mosquitoes. An. annularis s.l. was completely susceptible to deltamethrin, lambda-cyhalothrin and malathion in both the study areas. An. vagus was highly susceptible to deltamethrin in both the areas, but exhibited reduced susceptibility to lambda-cyhalothrin in BPHC. Both the species were resistant to DDT and showed very high KDT50 and KDT99 values for DDT. Probit model used to calculate the KDT50 and KDT99 values did not display normal distribution of percent knock-down with time for malathion in both the mosquito species in OPHC (p<0.05) and An. vagus in BPHC (χ2 = 25.3; p = 0.0), and also for deltamethrin to An. vagus in BPHC area (χ2 = 15.4; p = 0.004). Minimum infection rate (MIR) of Plasmodium sporozoite for An. vagus was 0.56 in OPHC and 0.13 in BPHC, while for An. annularis MIR was found to be 0.22 in OPHC. Resistance management strategies should be identified to delay the expansion of resistance. Testing of field caught Anopheles vectors from different endemic areas for the presence of malaria sporozoite may be useful to ensure their role in malaria transmission.
27009571    Primaquine is the only drug consistently effective against mature gametocytes of Plasmodium falciparum. The transmission blocking dose of primaquine previously recommended was 0.75mg/kg (adult dose 45mg) but its deployment was limited because of concerns over haemolytic effects in patients with glucose-6-phosphate dehydrogenase (G6PD) deficiency. G6PD deficiency is an inherited X-linked enzymatic defect that affects an estimated 400 million people around the world with high frequencies (15-20%) in populations living in malarious areas. To reduce transmission in low transmission settings and facilitate elimination of P. falciparum, the World Health Organization now recommends adding a single dose of 0.25mg/kg (adult dose 15mg) to Artemisinin-based Combination Therapies (ACTs) without G6PD testing. Direct evidence of the safety of this low dose is lacking. Adverse events and haemoglobin variations after this treatment were assessed in both G6PD normal and deficient subjects in the context of targeted malaria elimination in a malaria endemic area on the North-Western Myanmar-Thailand border where prevalence of G6PD deficiency (Mahidol variant) approximates 15%.
27009093    The tolerability and safety of primaquine (single dose 0.25 mg base/kg) combined with dihydroartemisinin-piperaquine (DHA-PPQ) given three times at monthly intervals was assessed in 819 subjects. Haemoglobin concentrations were estimated over the six months preceding the ACT + primaquine rounds of mass drug administration. G6PD deficiency was assessed with a phenotypic test and genotyping was performed in male subjects with deficient phenotypes and in all females. Fractional haemoglobin changes in relation to G6PD phenotype and genotype and primaquine round were assessed using linear mixed-effects models. No adverse events related to primaquine were reported during the trial. Mean fractional haemoglobin changes after each primaquine treatment in G6PD deficient subjects (-5.0%, -4.2% and -4.7%) were greater than in G6PD normal subjects (0.3%, -0.8 and -1.7%) but were clinically insignificant. Fractional drops in haemoglobin concentration larger than 25% following single dose primaquine were observed in 1.8% of the population but were asymptomatic.
27008882    The single low dose (0.25mg/kg) of primaquine is clinically well tolerated and can be used safely without prior G6PD testing in populations with high prevalence of G6PD deficiency. The present evidence supports a broader use of low dose primaquine without G6PD testing for the treatment and elimination of falciparum malaria.
27008340    ClinicalTrials.gov NCT01872702.
27007559    In Iran, both Plasmodium vivax and P. falciparum malaria have been detected, but P. vivax is the predominant species. Point mutations in dihydrofolate reductase (dhfr) gene in both Plasmodia are the major mechanisms of pyrimethamine resistance. From April 2007 to June 2009, a total of 134 blood samples in two endemic areas of southern Iran were collected from patients infected with P. vivax and P. falciparum. The isolates were analyzed for P. vivax dihydrofolate reductase (pvdhfr) and P. falciparum dihydrofolate reductase (pfdhfr) point mutations using various PCR-based methods. The majority of the isolates (72.9%) had wild type amino acids at five codons of pvdhfr. Amongst mutant isolates, the most common pvdhfr alleles were double mutant in 58 and 117 amino acids (58R-117N). Triple mutation in 57, 58, and 117 amino acids (57L/58R/117N) was identified for the first time in the pvdhfr gene of Iranian P. vivax isolates. All the P. falciparumsamples analyzed (n = 16) possessed a double mutant pfdhfrallele (59R/108N) and retained a wild-type mutation at position 51. This may be attributed to the fact that the falciparum malaria patients were treated using sulfadoxine-pyrimethamine (SP) in Iran. The presence of mutant haplotypes in P. vivax is worrying, but has not yet reached an alarming threshold regarding drugs such as SP. The results of this study reinforce the importance of performing a molecular surveillance by means of a continuous chemoresistance assessment.
27007512    Baculovirus vector (BV) is able to transduce foreign genes into mammalian cells efficiently and safely by incorporating a mammalian promoter protein. In this study, we tailored the surface proteins expressed by malaria sporozoites to enhance hepatocyte transduction. Sporozoites infect hepatocytes within minutes of initial entry into the blood circulation. Infectivity and hepatocyte-specific selectivity are mediated by the interplay between hepatocytes and sporozoite surface proteins. The circumsporozoite protein (CSP) and the thrombospondin related anonymous protein (TRAP) bind to the heparin sulfate proteoglycan on the hepatocyte surface, and these contribute to sporozoite infection and hepatocyte selectivity.
27006963    BVs displaying an ectodomain consisting of three different CSP variants (full-length, N-terminal, and C-terminal) or TRAP on the virus envelope were constructed, and the resulting in vitro hepatocyte transduction efficiency was evaluated.
27006665    We demonstrated improved hepatocyte transduction efficiency in BVs expressing CSP or TRAP ectodomains as compared to BVs without malaria surface proteins. In addition, gene transduction efficiencies for BVs displaying CSP or TRAP are higher than those expressing the preS1 antigen of the Hepatitis B virus.
27006284    BVs expressing CSP or TRAP in the ectodomain could represent a promising hepatocyte-specific gene delivery methodology.
27006074    Plasmodium knowlesi has been identified in the last decade as a fifth species causing malaria in areas of South East Asia. Due to its short erythrocytic cycle, rapid development of high parasitemia and severe manifestations are frequently observed. Therefore, prompt diagnosis of infection is essential to prevent complications, but the low sensitivity of rapid diagnostic tests for P knowlesi pose a diagnostic challenge in acute settings. In this study, we report the case of a German traveler to Thailand, who was treated for P knowlesi malaria after returning to Germany. Rapid antigen test for malaria was negative on presentation. Diagnosis of a nonfalciparum malaria was made based on microscopy, and species definition was determined using polymerase chain reaction technique.
27005280    Most prescribers and patients in Ghana now opt for the relatively expensive artemether/lumefantrine rather than artesunate-amodiaquine due to undesirable side effects in the treatment of uncomplicated malaria. The study sought to determine the existence of substandard and/or counterfeit artemether-lumefantrine tablets and suspension as well as artemether injection on the market in Cape Coast. Six brands of artemether-lumefantrine tablets, two brands of artemether-lumefantrine suspensions, and two brands of artemether injections were purchased from pharmacies in Cape Coast for the study. The mechanical properties of the tablets were evaluated. The samples were then analyzed for the content of active ingredients using High Performance Liquid Chromatography with a variable wavelength detector. None of the samples was found to be counterfeit. However, the artemether content of the samples was variable (93.22%-104.70% of stated content by manufacturer). The lumefantrine content of the artemether/lumefantrine samples was also variable (98.70%-111.87%). Seven of the artemether-lumefantrine brands passed whilst one failed the International Pharmacopoeia content requirements. All brands of artemether injections sampled met the International Pharmacopoeia content requirement. The presence of a substandard artemether-lumefantrine suspension in the market should alert regulatory bodies to be more vigilant and totally flush out counterfeit and substandard drugs from the Ghanaian market.
27004586    Calcium (Ca(2+))-mediated signaling is a conserved mechanism in eukaryotes, including the human malaria parasite, Plasmodium falciparum. Due to its small size (<10 μm) measurement of intracellular Ca(2+) in Plasmodium is technically challenging, and thus Ca(2+) regulation in this human pathogen is not well understood. Here we analyze Ca(2+) homeostasis via a new approach using transgenic P. falciparum expressing the Ca(2+) sensor yellow cameleon (YC)-Nano. We found that cytosolic Ca(2+) concentration is maintained at low levels only during the intraerythrocytic trophozoite stage (30 nM), and is increased in the other blood stages (>300 nM). We determined that the mammalian SERCA inhibitor thapsigargin and antimalarial dihydroartemisinin did not perturb SERCA activity. The change of the cytosolic Ca(2+) level in P. falciparum was additionally detectable by flow cytometry. Thus, we propose that the developed YC-Nano-based system is useful to study Ca(2+) signaling in P. falciparum and is applicable for drug screening.
27004583    The growing threat of insecticide resistance in mosquitoes and drug resistance in the Plasmodium parasites increases the importance of ensuring appropriate malaria case management and enabling positive health-seeking behaviour. Treatment-seeking behaviours are poorly characterized in malaria-endemic regions that have been the focus of intensive control and elimination campaigns. This study uses a comprehensive approach to shed light on the determinants of malaria treatment-seeking behaviours from different perspectives.
27004580    The authors conducted cross-sectional surveys from 832 households, fifteen health centers, and 135 retailers across three sites in the Emuhaya and Kakamega districts of the western Kenyan highlands. Participants were recruited via random sampling and data were collected with the use of a structured questionnaire about malaria treatment-seeking behaviour. All households, healthcare facilities, and retailers were mapped using a handheld GPS and a GIS algorithm was used to calculate "walk distance" based on the Tobler rule; an estimate of this distance was used to calculate the travel time used in the analyses.
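
In the sample output above, several consecutive rows read like the background, methods and conclusion sections of a single study (for example the three rows starting at 27011331), which suggests that the i-th PMID is being paired with the i-th AbstractText node overall rather than with its own article's abstract. A minimal sketch of one way to pair them per article follows. The class name PairAbstracts is hypothetical, and SAMPLE is a hand-written fragment in the shape of the efetch response, not real PubMed data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class PairAbstracts {
    // Made-up sample: article 1 has two abstract sections, article 2 has none.
    static final String SAMPLE =
        "<PubmedArticleSet>"
      + "<PubmedArticle><MedlineCitation><PMID>1</PMID><Article><Abstract>"
      + "<AbstractText>Background.</AbstractText><AbstractText>Methods.</AbstractText>"
      + "</Abstract></Article></MedlineCitation></PubmedArticle>"
      + "<PubmedArticle><MedlineCitation><PMID>2</PMID><Article/></MedlineCitation></PubmedArticle>"
      + "</PubmedArticleSet>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns one "PMID<TAB>abstract" row per article, joining all of its AbstractText sections.
    static List<String> rows(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList citations = (NodeList) xPath.evaluate(
            "/PubmedArticleSet/PubmedArticle/MedlineCitation", doc, XPathConstants.NODESET);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < citations.getLength(); i++) {
            Node citation = citations.item(i);
            // Relative paths are evaluated against this article only.
            String pmid = xPath.evaluate("PMID", citation);
            NodeList parts = (NodeList) xPath.evaluate(
                "Article/Abstract/AbstractText", citation, XPathConstants.NODESET);
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < parts.getLength(); j++) {
                if (j > 0) {
                    text.append(' ');
                }
                text.append(parts.item(j).getTextContent());
            }
            rows.add(pmid + "\t" + text);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String row : rows(parse(SAMPLE))) {
            System.out.println(row);
        }
    }
}
```

With this shape, an article without an abstract yields a row with an empty second column instead of shifting every later abstract up by one.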

I run the JAR on Hadoop as follows:

hadoopuser@master:/usr/local/hadoop/bin$ ./hadoop jar /home/hadoopuser/Desktop/DownloadAbstracts.jar DwnloadAbstracts malaria /malariaAbstracts
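
With a command like the one above, the intent is that main receives malaria as args[0] and /malariaAbstracts as args[1]. One thing worth ruling out: hadoop jar takes the main class from the command line only when the JAR's manifest does not already name one, so if the JAR was exported with a Main-Class entry, the class name itself would arrive as args[0] and become the search term. Whether that applies here is an assumption, since the post does not show the manifest. The sketch below, with the hypothetical class name ArgCheck, simply rejects anything other than exactly two arguments so that such a shift is reported instead of silently searched for.

```java
import java.util.Arrays;

public class ArgCheck {
    // Returns {term, outputPath}, or throws if the argument count is not exactly two.
    static String[] parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "expected <term> <outputPath>, got " + Arrays.toString(args));
        }
        return new String[] { args[0], args[1] };
    }

    public static void main(String[] args) {
        try {
            String[] parsed = parse(args);
            System.out.println("term=" + parsed[0] + " outputPath=" + parsed[1]);
        } catch (IllegalArgumentException e) {
            // Report the problem rather than continuing with shifted arguments.
            System.err.println("Usage error: " + e.getMessage());
        }
    }
}
```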

The terminal output is:

16/03/25 19:05:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
java.lang.NullPointerException
    at DwnloadAbstracts.saveUrlToFile(DwnloadAbstracts.java:56)
    at DwnloadAbstracts.main(DwnloadAbstracts.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
java.lang.NullPointerException
    at DwnloadAbstracts.saveUrlToFile(DwnloadAbstracts.java:56)
    at DwnloadAbstracts.main(DwnloadAbstracts.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
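
The trace can be cross-checked against the posted source, assuming that source is the file that was compiled and counting the first import as line 1. Line 97 is then the saveUrlToFile call in main, and line 56 is the statement that reads nodeList.item(0).getFirstChild().getNodeValue(). NodeList.item returns null for an index past the end of the list, so an esearch response with an empty IdList would throw exactly there, once per loop iteration, which would also account for the trace appearing twice. The sketch below shows a version of the ID-joining step that tolerates an empty result; the class name IdListJoin and the inline XML strings are hypothetical and hand-written for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class IdListJoin {
    // Returns the Id values joined by commas, or an empty string when the search matched nothing.
    static String joinIds(String esearchXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(esearchXml.getBytes(StandardCharsets.UTF_8)));
        NodeList ids = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("/eSearchResult/IdList/Id", doc, XPathConstants.NODESET);
        StringBuilder joined = new StringBuilder();
        // The loop body never runs for an empty list, so item(i) is never null here.
        for (int i = 0; i < ids.getLength(); i++) {
            if (i > 0) {
                joined.append(',');
            }
            joined.append(ids.item(i).getTextContent());
        }
        return joined.toString();
    }

    public static void main(String[] args) throws Exception {
        String twoIds = "<eSearchResult><IdList><Id>1</Id><Id>2</Id></IdList></eSearchResult>";
        String noIds = "<eSearchResult><IdList/></eSearchResult>";
        System.out.println("two ids: [" + joinIds(twoIds) + "]");
        System.out.println("no ids:  [" + joinIds(noIds) + "]");
    }
}
```

A caller can then skip the efetch request whenever the returned string is empty, rather than dereferencing item(0).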

1 answer:

Answer 0 (score: 0)

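Your code runs fine locally in Eclipse because the file is being read from the local disk. The NullPointerException appears because, when the code runs under Hadoop, the file cannot be read from the local disk.

When you run a JAR file with hadoop jar, any files you provide as input should reside on HDFS. Any output you write will also be placed on HDFS.

Assuming the file malaria is in your local directory, you can put it on HDFS.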

hadoop fs -mkdir -p /tmp/input/diseases
hadoop fs -put malaria /tmp/input/diseases/

# wherever you want to store the output
hadoop fs -mkdir -p /tmp/output 

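When you put files on HDFS, make sure you have permission to read and write to that directory. For this example, you can use /tmp.

If you like, you can also put many different disease files into that directory.

You should then be able to run the following.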

hadoop jar DownloadAbstracts.jar DwnloadAbstracts /tmp/input/diseases /tmp/output/diseaseAbstracts

If you only want to target a single file, change the path accordingly.

The output will be placed on HDFS at /tmp/output/diseaseAbstracts.