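I am dealing with a data-cleaning problem that results from web scraping. This is an extreme case, but here is an example of the initial table: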
[1] "" ""
[3] "" ""
[5] "" "Fund Management"
[7] "Fund SponsorMassachusetts Financial Services" "Portfolio ManagersGeoffrey L. Schechter (30 Dec 2004)"
[9] "" ""
[11] "Basics" ""
[13] "Category:" "Tax-Free Income-High Yield"
[15] "Ticker:" "MFM "
[17] "NAV Ticker:" "XMFMX"
[19] "Average Daily Volume (shares):" ""
[21] "Average Daily Volume (USD):" "M"
[23] "Inception Date:" "11/25/1986"
[25] "Inception Share Price:" "$10.00"
[27] "Inception NAV:" "$9.40"
[29] "Tender Offer:" "No"
[31] "Term:" "No"
[33] "Fiscal Year End:" "October 31"
[35] "Third Party Links & Reports" ""
[37] "SEC Filings" "Intraday Pricing"
[39] "Fund Sponsor Website " ""
[41] "" ""
[43] "" ""
[45] "" ""
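The problem is that I want to trim the blank elements at the borders of the table (i.e. elements 1-4 and 40-45), BUT keep a blank element when it follows an element that ends with ":". I am trying to prepare this vector dynamically so that it does not need constant monitoring if the webpage format changes slightly. Thanks.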
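Here is the dput output:
c("", "", "", "", "", "Fund Management", "Fund SponsorMassachusetts Financial Services",
"Portfolio ManagersGeoffrey L. Schechter (30 Dec 2004)", "",
"", "Basics", "", "Category:", "Tax-Free Income-High Yield",
"Ticker:", "MFM ", "NAV Ticker:", "XMFMX", "Average Daily Volume (shares):",
"", "Average Daily Volume (USD):", "M", "Inception Date:", "11/25/1986",
"Inception Share Price:", "$10.00", "Inception NAV:", "$9.40",
"Tender Offer:", "No", "Term:", "No", "Fiscal Year End:", "October 31",
"Third Party Links & Reports", "", "SEC Filings", "Intraday Pricing",
"Fund Sponsor Website ", "", "", "", "", "", "", "")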
Note that the vector is a character vector, not a list.
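To make the goal concrete, here is a minimal sketch of the behaviour I am after, written in base R. The helper name `trim_border_blanks` and the shortened sample vector are made up for illustration. The sketch assumes that "blank" means an empty or whitespace-only string, that only the leading and trailing runs of blanks should be removed, and that a blank directly after a final label ending in ":" is that label's empty value and should stay.

```r
# Hypothetical helper (not part of any package): drop blank elements at the
# start and end of a character vector while keeping interior blanks.
trim_border_blanks <- function(x) {
  # Positions of elements that contain something other than whitespace
  nonblank <- which(nzchar(trimws(x)))
  if (length(nonblank) == 0L) return(character(0))

  first <- min(nonblank)
  last <- max(nonblank)

  # Assumption: if the last non-blank element is a label ending in ":",
  # the blank right after it is its (empty) value, so keep that one too.
  if (grepl(":\\s*$", x[last]) && last < length(x)) {
    last <- last + 1L
  }

  x[first:last]
}

# Shortened stand-in for the scraped vector
x <- c("", "", "Fund Management", "", "Basics", "",
       "Average Daily Volume (shares):", "", "Fiscal Year End:", "October 31",
       "Fund Sponsor Website ", "", "", "")

trimmed <- trim_border_blanks(x)
print(trimmed)
```

Because the positions are computed from the content rather than hard-coded, the same call should keep working if the page adds or removes a few blank cells at either end. Interior blanks, such as the empty value after "Average Daily Volume (shares):", are left untouched.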