readPDF (tm package) in R

Time: 2013-07-10 06:32:48

Tags: r cygwin tm

I am trying to read some online PDF documents in R. I used the readPDF function. My script looks like this:

safex <- readPDF(PdftotextOptions='-layout')(elem=list(uri='C:/Users/FCG/Desktop/NoteF7000.pdf'),language='en',id='id1')

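R shows a message saying that running the command had status 309. I tried different pdftotext options, but I get the same message every time, and the text file that is created has no content.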

Can anyone read this PDF?

1 answer:

Answer 0: (score: 3)

readPDF has bugs and is probably not worth the trouble (see this well-documented struggle).

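Assuming that...

  1. You have installed xpdf (see here for details).

  2. Your PATH is all in order (see here for details on how to do that), and you have restarted your computer.

  3. Then you are better off avoiding readPDF and using this workaround instead: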

    system(paste('"C:/Program Files/xpdf/pdftotext.exe"', 
                 '"C:/Users/FCG/Desktop/NoteF7000.pdf"'), wait=FALSE)
    

    Then read the resulting text file into R, like this...

    require(tm)
    mycorpus <- Corpus(URISource("C:/Users/FCG/Desktop/NoteF7001.txt"))
    

    And confirm that everything went well:

    inspect(mycorpus)
    
    A corpus with 1 text document
    
    The metadata consists of 2 tag-value pairs and a data frame
    Available tags are:
      create_date creator 
    Available variables in the data frame are:
      MetaID 
    
    [[1]]
    Market Notice
    Number: Date F7001 08 May 2013
    
    New IDX SSF (EWJG) The following new IDX SSF contract will be added to the list and will be available for trade today.
    
    Summary Contract Specifications Contract Code Underlying Instrument Bloomberg Code ISIN Code EWJG EWJG IShares MSCI Japan Index Fund (US) EWJ US EQUITY US4642868487 1 (R1 per point)
    
    Contract Size / Nominal
    
    Expiry Dates & Times
    
    10am New York Time; 14 Jun 2013 / 16 Sep 2013
    
    Underlying Currency Quotations Minimum Price Movement (ZAR) Underlying Reference Price
    
    USD/ZAR Bloomberg Code (USDZAR Currency) Price per underlying share to two decimals. R0.01 (0.01 in the share price)
    
    4pm underlying spot level as captured by the JSE.
    
    Currency Reference Price
    
    The same method as the one utilized for the expiry of standard currency futures on standard quarterly SAFEX expiry dates.
    
    JSE Limited Registration Number: 2005/022939/06 One Exchange Square, Gwen Lane, Sandown, South Africa. Private Bag X991174, Sandton, 2146, South Africa. Telephone: +27 11 520 7000, Facsimile: +27 11 520 8584, www.jse.co.za
    
    Executive Director: NF Newton-King (CEO), A Takoordeen (CFO) Non-Executive Directors: HJ Borkum (Chairman), AD Botha, MR Johnston, DM Lawrence, A Mazwai, Dr. MA Matooane , NP Mnxasana, NS Nematswerani, N Nyembezi-Heita, N Payne Alternate Directors: JH Burke, LV Parsons
    
    Member of the World Federation of Exchanges
    
    Company Secretary: GC Clarke
    Settlement Method
    
    Cash Settled
    
    -
    
    Clearing House Fees -
    
    On-screen IDX Futures Trading: o 1 BP for Taker (Aggressor) o Zero Booking Fees for Maker (Passive) o No Cap o Floor of 0.01 Reported IDX Futures Trades o 1.75 BP for both buyer and seller o No Cap o Floor of 0.01
    
    Initial Margin Class Spread Margin V.S.R. Expiry Date
    
    R 10.00 R 5.00 3.5 14/06/2013, 16/09/2013
    
    The above instrument has been designated as "Foreign" by the South African Reserve Bank
    
    Should you have any queries regarding IDX Single Stock Futures, please contact the IDX team on 011 520-7399 or idx@jse.co.za
    
    Graham Smale Director: Bonds and Financial Derivatives Tel: +27 11 520 7831 Fax:+27 11 520 8831 E-mail: grahams@jse.co.za
    
    Distributed by the Company Secretariat +27 11 520 7346
    
    Page 2 of 2
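
The workaround in step 3 can also be wrapped in a small function. The following is only a sketch under some assumptions: `txt_path_for` and `pdf_to_text` are hypothetical helper names (they are not part of tm or xpdf), `pdftotext` is assumed to be installed, and `-layout` is simply the option from the question. It uses `system2()`, which waits for the command to finish by default, so the text file should exist before R tries to read it, and it passes the output path explicitly instead of relying on the default name that pdftotext chooses.

```r
# Hypothetical helper: derive the .txt path that sits next to a given PDF.
txt_path_for <- function(pdf) {
  sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)
}

# Hypothetical helper: run pdftotext on one PDF and return the path of the
# text file it produced. 'exe' defaults to whatever pdftotext is on the PATH.
pdf_to_text <- function(pdf, exe = Sys.which("pdftotext"), options = "-layout") {
  if (!nzchar(exe)) {
    stop("pdftotext not found on the PATH; install xpdf or pass 'exe' explicitly")
  }
  if (!file.exists(pdf)) {
    stop("PDF not found: ", pdf)
  }
  txt <- txt_path_for(pdf)
  # Command form: pdftotext [options] PDF-file [text-file]
  # system2() blocks until the command has finished and returns its exit status.
  status <- system2(exe, args = c(options, shQuote(pdf), shQuote(txt)))
  if (status != 0 || !file.exists(txt)) {
    stop("pdftotext failed with status ", status)
  }
  txt
}

# Example usage (paths taken from the question; adjust to your machine):
# txt_file <- pdf_to_text("C:/Users/FCG/Desktop/NoteF7000.pdf",
#                         exe = "C:/Program Files/xpdf/pdftotext.exe")
# text <- readLines(txt_file, warn = FALSE)
```

Checking the exit status and the existence of the output file makes a failure visible right away, rather than surfacing later as an empty text file.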