Downloading data to a specific folder

Time: 2019-04-02 20:54:13

Tags: r

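I have a data frame containing a number of urls. I am writing some code to tell R to go to each url and download it. However, I want to stay organized, so I would like to save the downloads into folders according to the year they were collected. For that, I have a column in the data called filing_date_year.

So if the url was collected in year 2003, I want to save the document in a folder named 2003. If the year is 2010, I want to save the document in a folder named 2010.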

########################################################################

I have the following code:

library(purrr)
walk2(data_information_documents_toget$href.y, data_information_documents_toget$CIKAccNumFileDate_web_extension,
      function(x, y) {
        download.file(x, destfile = paste0("c:/USER/directory/",year_to_filter, "/", y), quiet = FALSE)
      })

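This takes each document from the data frame named data_information_documents_toget, where the url is stored in href.y. I want to download this url and save it under the unique ID name CIKAccNumFileDate_web_extension.

I am trying to add the condition year_to_filter, which would essentially be an index saying that if the url comes from a row with year 2003, the file is saved in the 2003 folder, and so on.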

Sample data:

data_information_documents_toget <- structure(list(href.y = c("https://www.sec.gov/Archives/edgar/data/1578845/000156459019003111/agn-10k_20181231.htm", "https://www.sec.gov/Archives/edgar/data/81033/000093041308001260/c52299_10k.htm", "https://www.sec.gov/Archives/edgar/data/704051/000070405115000045/lm_10kx3312015.htm", "https://www.sec.gov/Archives/edgar/data/5133/000119312513209085/d460905d10k.htm", "https://www.sec.gov/Archives/edgar/data/915912/000095012310019013/w77522e10vk.htm", "https://www.sec.gov/Archives/edgar/data/823768/000095012311015242/h76657e10vk.htm", "https://www.sec.gov/Archives/edgar/data/12978/000104746905006771/a2153651z10-k.htm", "https://www.sec.gov/Archives/edgar/data/12659/000095013707009521/c16312e10vk.htm", "https://www.sec.gov/Archives/edgar/data/941548/000095012904001055/h13049e10vk.htm", "https://www.sec.gov/Archives/edgar/data/1800/000104746913001180/a2212523z10-k.htm", "https://www.sec.gov/Archives/edgar/data/1004155/000100415506000097/form10ka.htm", "https://www.sec.gov/Archives/edgar/data/5272/000000527215000002/maindocument001.htm", "https://www.sec.gov/Archives/edgar/data/1308161/000156459018021493/fox-10k_20180630.htm", "https://www.sec.gov/Archives/edgar/data/915389/000091538917000014/emn2016123110k.htm", "https://www.sec.gov/Archives/edgar/data/1326380/000132638015000078/form10k-fy14.htm", "https://www.sec.gov/Archives/edgar/data/85408/000095012907001047/h43875e10vk.htm", "https://www.sec.gov/Archives/edgar/data/1224608/000122460816000053/cno1231201510-k.htm", "https://www.sec.gov/Archives/edgar/data/836106/000089161804000704/f95884e10vk.htm", "https://www.sec.gov/Archives/edgar/data/1040971/000110465905011116/a05-4733_110k.htm", "https://www.sec.gov/Archives/edgar/data/909832/000119312505223245/d10k.htm", "https://www.sec.gov/Archives/edgar/data/723254/000110465906053974/a06-16851_110k.htm", "https://www.sec.gov/Archives/edgar/data/1037038/000103703815000006/rl-20150328x10k.htm", 
"https://www.sec.gov/Archives/edgar/data/1113169/000095013308000389/w47962e10vk.htm", "https://www.sec.gov/Archives/edgar/data/808450/000119312509257118/d10k.htm", "https://www.sec.gov/Archives/edgar/data/909832/000119312511271844/d203874d10k.htm", "https://www.sec.gov/Archives/edgar/data/319201/000144530511002394/klac10k2011.htm", "https://www.sec.gov/Archives/edgar/data/915912/000091591218000004/a201710-k.htm", "https://www.sec.gov/Archives/edgar/data/95304/000095010903001224/d10k.htm", "https://www.sec.gov/Archives/edgar/data/3153/000009212211000013/g24641xxe10vk.htm", "https://www.sec.gov/Archives/edgar/data/12659/000095013706004022/c03876e10vkza.htm", "https://www.sec.gov/Archives/edgar/data/63541/000119312506027038/d10k.htm", "https://www.sec.gov/Archives/edgar/data/1585689/000158568914000006/a2013hwh10-k.htm", "https://www.sec.gov/Archives/edgar/data/1099800/000104746908001956/a2183020z10-k.htm", "https://www.sec.gov/Archives/edgar/data/49196/000095015208001408/l29571ae10vk.htm", "https://www.sec.gov/Archives/edgar/data/1101215/000110121519000048/ads-20181231x10k.htm", "https://www.sec.gov/Archives/edgar/data/1310067/000119312510055594/d10k.htm", "https://www.sec.gov/Archives/edgar/data/1174922/000119312512195995/d340198d10ka.htm", "https://www.sec.gov/Archives/edgar/data/69970/000095015208004633/l32075ae10vkza.htm", "https://www.sec.gov/Archives/edgar/data/5272/000104746914001096/a2218248z10-k.htm", "https://www.sec.gov/Archives/edgar/data/1058090/000105809016000058/cmg-20151231x10k.htm", "https://www.sec.gov/Archives/edgar/data/885639/000088563913000004/kohls_10kx2012.htm", "https://www.sec.gov/Archives/edgar/data/354964/000035496413000002/hbio12311210-k.htm", "https://www.sec.gov/Archives/edgar/data/1075531/000110465911010302/a11-2103_110k.htm", "https://www.sec.gov/Archives/edgar/data/54480/000119312511028728/d10k.htm", "https://www.sec.gov/Archives/edgar/data/1004434/000104746903011288/a2106221z10-k.htm", 
"https://www.sec.gov/Archives/edgar/data/1526520/000119312514045532/d654086d10k.htm", "https://www.sec.gov/Archives/edgar/data/1310067/000131006715000009/shld201410k.htm", "https://www.sec.gov/Archives/edgar/data/4962/000119312513070554/d486442d10k.htm", "https://www.sec.gov/Archives/edgar/data/354950/000104746907002295/a2176777z10-k.htm", "https://www.sec.gov/Archives/edgar/data/823768/000119312516467957/d83265d10k.htm", "https://www.sec.gov/Archives/edgar/data/50104/000095013409004250/d66470e10vk.htm", "https://www.sec.gov/Archives/edgar/data/1437107/000095013309000442/w72867e10vk.htm", "https://www.sec.gov/Archives/edgar/data/791519/000104746905004527/a2152243z10-k.htm", "https://www.sec.gov/Archives/edgar/data/1136893/000089256908000207/a38312e10vk.htm", "https://www.sec.gov/Archives/edgar/data/1141391/000119312511320907/d258542d10ka.htm", "https://www.sec.gov/Archives/edgar/data/1365135/000136513518000013/wu-12312017x10k.htm", "https://www.sec.gov/Archives/edgar/data/60667/000006066706000141/lowesform10ka02032006.htm", "https://www.sec.gov/Archives/edgar/data/1090727/000119312512081067/d274494d10k.htm", "https://www.sec.gov/Archives/edgar/data/80424/000095015205007351/l15436ae10vk.htm", "https://www.sec.gov/Archives/edgar/data/108772/000010877218000012/xrx-123117x10xk.htm", "https://www.sec.gov/Archives/edgar/data/1075531/000110465904007430/a04-3266_110k.htm", "https://www.sec.gov/Archives/edgar/data/318154/000031815417000004/amgn-12312016x10k.htm", "https://www.sec.gov/Archives/edgar/data/1442145/000095012311019814/y89886e10vk.htm", "https://www.sec.gov/Archives/edgar/data/5513/000000551318000016/unm12312017-10xk.htm", "https://www.sec.gov/Archives/edgar/data/1437107/000143710714000016/disca-2013123110k.htm", "https://www.sec.gov/Archives/edgar/data/1466258/000146625819000073/ir-10kx12312018.htm", "https://www.sec.gov/Archives/edgar/data/50104/000005010417000056/tso201610-k.htm", "https://www.sec.gov/Archives/edgar/data/1166691/000119312506036698/d10k.htm", 
"https://www.sec.gov/Archives/edgar/data/1141982/000095012311016589/h78025e10vk.htm", "https://www.sec.gov/Archives/edgar/data/37785/000003778517000011/fmc201610k.htm", "https://www.sec.gov/Archives/edgar/data/1040971/000104746909005369/a2192961z10-ka.htm", "https://www.sec.gov/Archives/edgar/data/39911/000119312509066067/d10k.htm", "https://www.sec.gov/Archives/edgar/data/1045810/000104581018000010/nvda-2018x10k.htm", "https://www.sec.gov/Archives/edgar/data/1370946/000137094617000006/oc-20161231x10k.htm", "https://www.sec.gov/Archives/edgar/data/936340/000095012405001542/k91838e10vk.htm", "https://www.sec.gov/Archives/edgar/data/316709/000031670916000067/schw-20151231x10k.htm", "https://www.sec.gov/Archives/edgar/data/25445/000144530514000574/cr-20131231x10k.htm", "https://www.sec.gov/Archives/edgar/data/1336917/000133691718000009/ua-20171231x10k.htm", "https://www.sec.gov/Archives/edgar/data/6281/000095013507007253/b67578ade10vk.htm", "https://www.sec.gov/Archives/edgar/data/879169/000110465907015059/a07-5374_110k.htm", "https://www.sec.gov/Archives/edgar/data/1039684/000103968412000027/form_10-k.htm", "https://www.sec.gov/Archives/edgar/data/31235/000003123511000025/ek2010_10k.htm", "https://www.sec.gov/Archives/edgar/data/1004434/000104746909002123/a2190957z10-k.htm", "https://www.sec.gov/Archives/edgar/data/818479/000081847909000034/q40810k.htm", "https://www.sec.gov/Archives/edgar/data/1121788/000161577419002739/s116041_10k.htm", "https://www.sec.gov/Archives/edgar/data/766704/000095015209002082/l35635ae10vk.htm", "https://www.sec.gov/Archives/edgar/data/29534/000104746913003283/a2213303z10-k.htm", "https://www.sec.gov/Archives/edgar/data/865436/000086543614000161/wfm10k2014.htm", "https://www.sec.gov/Archives/edgar/data/5272/000110465912013132/a11-32502_410ka.htm", "https://www.sec.gov/Archives/edgar/data/931336/000095013403009830/d06474a1e10vkza.htm", "https://www.sec.gov/Archives/edgar/data/1037646/000095012311014519/l41517e10vk.htm", 
"https://www.sec.gov/Archives/edgar/data/1020569/000110465906017231/a06-2602_110k.htm", "https://www.sec.gov/Archives/edgar/data/1496048/000149604817000018/ggp12311610k.htm", "https://www.sec.gov/Archives/edgar/data/1169055/000162828018002128/noblecorpplc-201710xk.htm", "https://www.sec.gov/Archives/edgar/data/920760/000162828018000562/len-20171130x10k.htm", "https://www.sec.gov/Archives/edgar/data/28917/000002891718000159/dds-02032018x10k.htm", "https://www.sec.gov/Archives/edgar/data/875320/000087532019000006/a201810k-main.htm", "https://www.sec.gov/Archives/edgar/data/1359841/000135984117000040/hbi-20161231x10k.htm", "https://www.sec.gov/Archives/edgar/data/20520/000002052015000011/ftr-20141231x10k.htm", "https://www.sec.gov/Archives/edgar/data/1495569/000119312511040013/d10k.htm" ), CIKAccNumFileDate_web_extension = c("0000054480_0001564590-19-003111_2019-02-15.htm", "0000788784_0000930413-08-001260_2008-02-28.htm", "0001000180_0000704051-15-000045_2015-05-22.htm", "0001094093_0001193125-13-209085_2013-05-09.htm", "0000314808_0000950123-10-019013_2010-03-01.htm", "0000029534_0000950123-11-015242_2011-02-17.htm", "0001585689_0001047469-05-006771_2005-03-16.htm", "0000028917_0000950137-07-009521_2007-06-29.htm", "0000721683_0000950129-04-001055_2004-03-08.htm", "0000001800_0001047469-13-001180_2013-02-15.htm", "0001141982_0001004155-06-000097_2006-06-01.htm", "0001115222_0000005272-15-000002_2015-02-20.htm", "0001272547_0001564590-18-021493_2018-08-13.htm", "0001166691_0000915389-17-000014_2017-02-27.htm", "0001053507_0001326380-15-000078_2015-03-30.htm", "0000095521_0000950129-07-001047_2007-02-28.htm", "0000785161_0001224608-16-000053_2016-02-19.htm", "0000819692_0000891618-04-000704_2004-03-12.htm", "0000006201_0001104659-05-011116_2005-03-15.htm", "0000860730_0001193125-05-223245_2005-11-10.htm", "0000020520_0001104659-06-053974_2006-08-11.htm", "0000915912_0001037038-15-000006_2015-05-15.htm", "0000006281_0000950133-08-000389_2008-02-07.htm", 
"0000063541_0001193125-09-257118_2009-12-21.htm", "0000860730_0001193125-11-271844_2011-10-14.htm", "0001400891_0001445305-11-002394_2011-08-05.htm", "0000314808_0000915912-18-000004_2018-02-23.htm", "0000040704_0000950109-03-001224_2003-03-07.htm", "0000092122_0000092122-11-000013_2011-02-25.htm", "0000028917_0000950137-06-004022_2006-03-31.htm", "0000026780_0001193125-06-027038_2006-02-10.htm", "0001598014_0001585689-14-000006_2014-02-27.htm", "0001385187_0001047469-08-001956_2008-02-29.htm", "0000812074_0000950152-08-001408_2008-02-26.htm", "0000851968_0001101215-19-000048_2019-02-26.htm", "0001310067_0001193125-10-055594_2010-03-12.htm", "0000818479_0001193125-12-195995_2012-04-30.htm", "0000883980_0000950152-08-004633_2008-06-16.htm", "0001115222_0001047469-14-001096_2014-02-20.htm", "0001364742_0001058090-16-000058_2016-02-05.htm", "0001007456_0000885639-13-000004_2013-03-22.htm", "0000006201_0000354964-13-000002_2013-03-04.htm", "0001274494_0001104659-11-010302_2011-02-25.htm", "0000018926_0001193125-11-028728_2011-02-09.htm", "0001168054_0001047469-03-011288_2003-03-31.htm", "0000935703_0001193125-14-045532_2014-02-11.htm", "0001310067_0001310067-15-000009_2015-03-17.htm", "0001122304_0001193125-13-070554_2013-02-22.htm", "0000714154_0001047469-07-002295_2007-03-29.htm", "0000029534_0001193125-16-467957_2016-02-18.htm", "0001571949_0000950134-09-004250_2009-03-02.htm", "0000046765_0000950133-09-000442_2009-02-26.htm", "0000875570_0001047469-05-004527_2005-02-24.htm", "0000816284_0000892569-08-000207_2008-02-29.htm", "0001430602_0001193125-11-320907_2011-11-23.htm", "0001156375_0001365135-18-000013_2018-02-22.htm", "0001037949_0000060667-06-000141_2006-09-29.htm", "0000352510_0001193125-12-081067_2012-02-27.htm", "0000080424_0000950152-05-007351_2005-08-29.htm", "0000108772_0000108772-18-000012_2018-02-23.htm", "0001274494_0001104659-04-007430_2004-03-15.htm", "0000043362_0000318154-17-000004_2017-02-14.htm", "0001166691_0000950123-11-019814_2011-02-28.htm", 
"0000091576_0000005513-18-000016_2018-02-21.htm", "0000916076_0001437107-14-000016_2014-02-20.htm", "0000896159_0001466258-19-000073_2019-02-12.htm", "0001571949_0000050104-17-000056_2017-02-21.htm", "0001275283_0001193125-06-036698_2006-02-22.htm", "0001466258_0000950123-11-016589_2011-02-22.htm", "0001087423_0000037785-17-000011_2017-02-28.htm", "0000006201_0001047469-09-005369_2009-05-11.htm", "0000053117_0001193125-09-066067_2009-03-27.htm", "0000792985_0001045810-18-000010_2018-02-28.htm", "0001370946_0001370946-17-000006_2017-02-08.htm", "0000936340_0000950124-05-001542_2005-03-15.htm", "0000721371_0000316709-16-000067_2016-02-24.htm", "0000107681_0001445305-14-000574_2014-02-25.htm", "0000850209_0001336917-18-000009_2018-02-28.htm", "0000764622_0000950135-07-007253_2007-11-30.htm", "0001681459_0001104659-07-015059_2007-02-28.htm", "0001039684_0001039684-12-000027_2012-02-21.htm", "0000934612_0000031235-11-000025_2011-02-25.htm", "0001168054_0001047469-09-002123_2009-03-02.htm", "0001378946_0000818479-09-000034_2009-02-20.htm", "0000029534_0001615774-19-002739_2019-02-20.htm", "0001020569_0000950152-09-002082_2009-03-02.htm", "0001593538_0001047469-13-003283_2013-03-25.htm", "0001339947_0000865436-14-000161_2014-11-21.htm", "0001115222_0001104659-12-013132_2012-02-27.htm", "0001652044_0000950134-03-009830_2003-07-03.htm", "0001659166_0000950123-11-014519_2011-02-16.htm", "0000812074_0001104659-06-017231_2006-03-16.htm", "0001393612_0001496048-17-000018_2017-02-22.htm", "0000711065_0001628280-18-002128_2018-02-23.htm", "0000820027_0001628280-18-000562_2018-01-25.htm", "0001613103_0000028917-18-000159_2018-03-30.htm", "0001037868_0000875320-19-000006_2019-02-13.htm", "0001101239_0001359841-17-000040_2017-02-03.htm", "0001017008_0000020520-15-000011_2015-02-25.htm", "0001702780_0001193125-11-040013_2011-02-18.htm"), name = c("KANSAS CITY SOUTHERN", "PUBLIC SERVICE ENTERPRISE GROUP INC", "SANDISK CORP", "PROGRESS ENERGY INC", "Ensco plc", "DOLLAR GENERAL CORP", 
"Hilton Worldwide Holdings Inc.", "DILLARD'S, INC.", "TOTAL SYSTEM SERVICES INC", "ABBOTT LABORATORIES", "Cooper Industries plc", "DUN & BRADSTREET CORP/NW", "FREESCALE SEMICONDUCTOR INC", "COMCAST CORP", "AMERICAN TOWER CORP /MA/", "SUPERVALU INC", "Encompass Health Corp", "CHARTER ONE FINANCIAL INC", "American Airlines Group Inc.", "HCA Healthcare, Inc.", "FRONTIER COMMUNICATIONS CORP", "AVALONBAY COMMUNITIES INC", "ANALOG DEVICES INC", "MAYTAG CORP", "HCA Healthcare, Inc.", "iHeartMedia, Inc.", "Ensco plc", "GENERAL MILLS INC", "SOUTHERN CO", "DILLARD'S, INC.", "DANA INC", "IHS Markit Ltd.", "Covidien plc", "OWENS ILLINOIS INC /DE/", "MOHAWK INDUSTRIES INC", "SEARS HOLDINGS CORP", "DENTSPLY SIRONA Inc.", "FIRST DATA CORP", "DUN & BRADSTREET CORP/NW", "BlackRock Inc.", "ELECTRONIC DATA SYSTEMS CORP /DE/", "American Airlines Group Inc.", "FIRST SOLAR, INC.", "CENTURYLINK, INC", "CIMAREX ENERGY CO", "DOLLAR TREE INC", "SEARS HOLDINGS CORP", "AETNA INC /PA/", "COMPAQ COMPUTER CORP", "DOLLAR GENERAL CORP", "Intercontinental Exchange, Inc.", "Helmerich & Payne, Inc.", "PEOPLESOFT INC", "CELGENE CORP /DE/", "Scripps Networks Interactive, Inc.", "CME GROUP INC.", "QWEST COMMUNICATIONS INTERNATIONAL INC", "NORTH FORK BANCORPORATION INC", "PROCTER & GAMBLE Co", "XEROX CORP", "FIRST SOLAR, INC.", "GREAT LAKES CHEMICAL CORP", "COMCAST CORP", "KEYCORP /NEW/", "MARTIN MARIETTA MATERIALS INC", "Chubb Ltd", "Intercontinental Exchange, Inc.", "REYNOLDS AMERICAN INC", "Ingersoll-Rand plc", "RED HAT INC", "American Airlines Group Inc.", "FORT JAMES CORP", "HEALTH MANAGEMENT ASSOCIATES, INC", "Owens Corning", "DTE ENERGY CO", "CARDINAL HEALTH INC", "WINN DIXIE STORES INC", "FOOT LOCKER, INC.", "PINNACLE WEST CAPITAL CORP", "TechnipFMC plc", "ONEOK INC /NEW/", "BURLINGTON NORTHERN SANTA FE, LLC", "CIMAREX ENERGY CO", "People's United Financial, Inc.", "DOLLAR GENERAL CORP", "IRON MOUNTAIN INC", "NAVIENT CORP", "Viacom Inc.", "DUN & BRADSTREET CORP/NW", "Alphabet Inc.", "Fortive Corp", 
"OWENS ILLINOIS INC /DE/", "Discover Financial Services", "APPLIED MICRO CIRCUITS CORP", "AMERIPRISE FINANCIAL INC", "Medtronic plc", "AMETEK INC/", "EQUINIX INC", "UNIVISION COMMUNICATIONS INC", "Altice USA, Inc."), filing_date_year = c(2019L, 2008L, 2015L, 2013L, 2010L, 2011L, 2005L, 2007L, 2004L, 2013L, 2006L, 2015L, 2018L, 2017L, 2015L, 2007L, 2016L, 2004L, 2005L, 2005L, 2006L, 2015L, 2008L, 2009L, 2011L, 2011L, 2018L, 2003L, 2011L, 2006L, 2006L, 2014L, 2008L, 2008L, 2019L, 2010L, 2012L, 2008L, 2014L, 2016L, 2013L, 2013L, 2011L, 2011L, 2003L, 2014L, 2015L, 2013L, 2007L, 2016L, 2009L, 2009L, 2005L, 2008L, 2011L, 2018L, 2006L, 2012L, 2005L, 2018L, 2004L, 2017L, 2011L, 2018L, 2014L, 2019L, 2017L, 2006L, 2011L, 2017L, 2009L, 2009L, 2018L, 2017L, 2005L, 2016L, 2014L, 2018L, 2007L, 2007L, 2012L, 2011L, 2009L, 2009L, 2019L, 2009L, 2013L, 2014L, 2012L, 2003L, 2011L, 2006L, 2017L, 2018L, 2018L, 2018L, 2019L, 2017L, 2015L, 2011L)), row.names = c(NA, -100L), class = "data.frame")

Edit:

If the data is named data_information_documents_toget and the directory is set as in the code above (c:/USER/directory/), then it will start downloading the data.

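This downloads the files into a single folder, but I would like the files placed in multiple folders by year.

1 Answer:

Answer 0: (score: 1)

You can first create all the year directories and then download the files.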
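The two steps described here could look roughly like the sketch below, in base R. It is only an illustration, not the answerer's original code: the helper names create_year_dirs and download_by_year and the downloader argument are made up for this example, and the column names are taken from the question's sample data.

```r
# Hypothetical helper: create one sub-folder per distinct year under base_dir.
create_year_dirs <- function(base_dir, years) {
  year_dirs <- file.path(base_dir, as.character(unique(years)))
  for (d in year_dirs) {
    # recursive = TRUE also creates base_dir if it is missing;
    # showWarnings = FALSE keeps existing folders from producing warnings
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(year_dirs)
}

# Hypothetical helper: download each url into the folder named after its year.
# `downloader` defaults to download.file; it is a parameter only so the
# function can be tried out without network access.
download_by_year <- function(df, base_dir, downloader = utils::download.file) {
  create_year_dirs(base_dir, df$filing_date_year)
  # One destination per row: <base_dir>/<year>/<unique file name>
  dest <- file.path(base_dir,
                    as.character(df$filing_date_year),
                    df$CIKAccNumFileDate_web_extension)
  for (i in seq_along(dest)) {
    # tryCatch so that one failing url does not stop the remaining downloads
    tryCatch(
      downloader(df$href.y[i], destfile = dest[i], mode = "wb", quiet = FALSE),
      error = function(e) {
        message("Failed: ", df$href.y[i], " (", conditionMessage(e), ")")
      }
    )
  }
  invisible(dest)
}

# Usage, assuming the base directory from the question:
# download_by_year(data_information_documents_toget, "c:/USER/directory")
```

Because the year is read from each row, no separate year_to_filter value is needed; the folder is chosen per file.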
