Trim blank elements from the edges of a vector, but turn blanks that stand for a value into NA

Time: 2019-01-16 03:16:15

Tags: r string tidyr data-cleaning

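I am working on a data-cleaning problem that comes from web scraping. This is an extreme case, but here is an example of the initial table:

 [1] ""                                                      ""                                                     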
 [3] ""                                                      ""                                                     
 [5] ""                                                      "Fund Management"                                      
 [7] "Fund SponsorMassachusetts Financial Services"          "Portfolio ManagersGeoffrey L. Schechter (30 Dec 2004)"
 [9] ""                                                      ""                                                     
[11] "Basics"                                                ""                                                     
[13] "Category:"                                             "Tax-Free Income-High Yield"                           
[15] "Ticker:"                                               "MFM       "                                           
[17] "NAV Ticker:"                                           "XMFMX"                                                
[19] "Average Daily Volume (shares):"                        ""                                                     
[21] "Average Daily Volume (USD):"                           "M"                                                    
[23] "Inception Date:"                                       "11/25/1986"                                           
[25] "Inception Share Price:"                                "$10.00"                                               
[27] "Inception NAV:"                                        "$9.40"                                                
[29] "Tender Offer:"                                         "No"                                                   
[31] "Term:"                                                 "No"                                                   
[33] "Fiscal Year End:"                                      "October 31"                                           
[35] "Third Party Links & Reports"                           ""                                                     
[37] "SEC Filings"                                           "Intraday Pricing"                                     
[39] "Fund Sponsor Website "                                 ""                                                     
[41] ""                                                      ""                                                     
[43] ""                                                      ""                                                     
[45] ""                                                      ""        

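The problem is that I want to trim the blanks at the edges of the table (i.e. elements 1-4 and 40-45), BUT if a blank element comes right after a label element (one ending in ":"), that blank should become NA rather than be removed. I am trying to prepare this vector dynamically, so that it does not need constant monitoring if the format of the web page changes slightly. Thanks.

See the dput: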

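c("", "", "", "", "", "Fund Management", "Fund SponsorMassachusetts Financial Services",
"Portfolio ManagersGeoffrey L. Schechter (30 Dec 2004)", "",
"", "Basics", "", "Category:", "Tax-Free Income-High Yield",
"Ticker:", "MFM       ", "NAV Ticker:", "XMFMX", "Average Daily Volume (shares):",
"", "Average Daily Volume (USD):", "M", "Inception Date:", "11/25/1986",
"Inception Share Price:", "$10.00", "Inception NAV:", "$9.40",
"Tender Offer:", "No", "Term:", "No", "Fiscal Year End:", "October 31",
"Third Party Links & Reports", "", "SEC Filings", "Intraday Pricing",
"Fund Sponsor Website ", "", "", "", "", "", "", "")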

Note, however, that the vector is a character vector, not a list.
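
To make the desired behaviour concrete, here is a minimal base-R sketch, not a definitive solution. The helper name `trim_blank_edges` is hypothetical, and it rests on two assumptions that the post does not spell out: a "label" is any element whose trimmed text ends in ":", and interior blanks that do not follow a label are left untouched.

```r
# Hypothetical helper (not from the original post).
# Drops blank elements at both ends of a character vector and replaces a
# blank that directly follows a "label:" element with NA.
trim_blank_edges <- function(x) {
  stopifnot(is.character(x))

  # TRUE for NA, "" and whitespace-only elements
  blank <- is.na(x) | !nzchar(trimws(x))
  if (all(blank)) return(character(0))

  # Assumption: a label is an element whose trimmed text ends with ":"
  is_label <- grepl(":$", trimws(x))
  after_label <- c(FALSE, head(is_label, -1))

  # A blank right after a label stands for a missing value
  missing_value <- blank & after_label
  x[missing_value] <- NA_character_

  # Keep everything between the first and last meaningful element
  meaningful <- which(!blank | missing_value)
  x[min(meaningful):max(meaningful)]
}

# Small example in the shape of the scraped table
x <- c("", " ", "Basics", "", "Ticker:", "MFM   ", "Volume:", "", "Term:", "No", "", "")
trim_blank_edges(x)
```

Because the edges are located from the data rather than from fixed positions, the same call should keep working if the page gains or loses a few blank cells at either end. Values such as "MFM   " keep their padding; wrap the result in `trimws()` if that is unwanted.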

0 answers:

No answers yet