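Python: loop through the URLs in a .csv and save each one using another column

Time: 2017-08-30 02:49:13

Tags: python python-2.7

I'm new to Python. I've read a lot and watched plenty of videos, but I can't get this to work and I'm getting frustrated.

I have a list of links like the following: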

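    "KGS ID","Latitude","Longitude","Location","Operator","Lease","API","Elevation","Elev_Ref","Depth_start","Depth_stop","URL"
    "1002880800","37.2354869","-100.4607509","T32S R29W, Sec. 27,   SW SW NE","Stanolind Oil and Gas Co.","William L. Rickers 1","15-119-00164","2705"," KB","2790","7652","http://www.kgs.ku.edu/WellLogs/32S29W/1043696830.zip"
    "1002880821","37.1234622","-100.1158111","T34S R26W, Sec. 2,   NW NW NE","SKELLY OIL CO","GRACE MCKINNEY 'A' 1","15-119-00181","2290"," KB","4000","5900","http://www.kgs.ku.edu/WellLogs/34S26W/1043696831.zip"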

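I'm trying to get Python to go to the "URL" and save the download in a folder named after "Location", with the file name "API".las.

e.g. ......"Location"/Section/"API".las     C://.../T32S R29W/Sec. 27/15-119-00164.las

The file has hundreds of rows, each with a link to download. I'd also like to add a sleep between requests so I don't bombard the server.

What are some different ways to do this? I've tried pandas and a few other approaches... any ideas?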

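3 Answers:

Answer 0: (score: 0)

Method 1:

Suppose your file has 1000 rows.

Create a master list that holds the data in this form ->
[row1, row2, row3, etc.]

Once that is done, loop through the master list. On each iteration you get one row as a string. Split it into a list, take the last column (the URL), i.e. row[-1], and append it to an empty list named result_url.

After all rows have been processed, save the list to a file. You can then use the os module to create a directory and move the file there.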
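Method 1 can be sketched roughly as follows. This is a minimal sketch rather than the answerer's own code: it assumes Python 3, uses `csv.reader` instead of a manual string split so that commas inside quoted fields are handled, and assumes the URL is the last column. The function name `collect_urls` and the trimmed two-column sample are made up for illustration.

```python
import csv
import io


def collect_urls(csv_file):
    """Return the last column (assumed to be the URL) of every data row."""
    rows = list(csv.reader(csv_file))   # the "master list": [row1, row2, ...]
    result_url = []
    for row in rows[1:]:                # rows[0] is assumed to be the header
        if row:                         # skip blank lines
            result_url.append(row[-1])
    return result_url


# Hypothetical two-column sample standing in for the real file.
sample = io.StringIO(
    '"API","URL"\n'
    '"15-119-00164","http://www.kgs.ku.edu/WellLogs/32S29W/1043696830.zip"\n'
    '"15-119-00181","http://www.kgs.ku.edu/WellLogs/34S26W/1043696831.zip"\n'
)
urls = collect_urls(sample)
```

With a real file, `collect_urls(open('data.csv', newline=''))` would be the equivalent call, assuming the file is named `data.csv`.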

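Method 2:

If the file is too large, read it row by row inside a try block and process the data as you go (with the csv module you get each row as a list; take the URL from it and write it to the file API.las each time).

Once your program tries to read row 1001, which does not exist, it moves to the except block, where you can either `pass` or print a message to be notified.
In method 2 you never hold all the data in a data structure; only one row is stored at a time during execution, so it is faster.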

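    import csv, os

    # Create the output directory if it does not exist yet
    # (os.mkdir returns None and raises if the directory already exists).
    if not os.path.isdir('Locations'):
        os.mkdir('Locations')

    with open('data.csv', 'r') as csvfile, open('./Locations/API.las', 'w') as fme:
        spamreader = csv.reader(csvfile, delimiter=',')
        print(next(spamreader))  # read and show the header row
        while True:
            try:
                row = next(spamreader)
            except StopIteration:
                # No rows left: this is the normal end of the file.
                print("Program has run. Check output.")
                break
            get_url = row[-1]  # the URL is the last column
            fme.write(get_url + '\n')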

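This code handles the CSV part of what you described (collecting the URLs into API.las) in less time.

Answer 1: (score: 0)

You have to do something like this: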

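import urllib  # Python 2.7; on Python 3 use urllib.request.urlopen instead

# links and file_names are assumed to be equal-length lists built from the CSV
for link, file_name in zip(links, file_names):
    u = urllib.urlopen(link)
    udata = u.read()
    u.close()
    # write in binary mode so non-text downloads (e.g. .zip) are not corrupted
    with open(file_name + ".las", "wb") as f:
        f.write(udata)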

If the contents of the file are not what you want, you may need to look at a scraping library such as BeautifulSoup to parse them.
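The question also asks for a pause between requests and for files to be saved into per-location folders, which the loop above does not cover. Here is a minimal sketch under stated assumptions: it targets Python 3 (`urllib.request.urlopen` replaces Python 2's `urllib.urlopen`), and the names `fetch`, `download_all`, `jobs`, `fetcher`, and `delay_seconds` are hypothetical, not from the original answer.

```python
import os
import time
import urllib.request


def fetch(url):
    """Download url and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(jobs, delay_seconds=1.0, fetcher=fetch, sleep=time.sleep):
    """Save each (url, destination_path) pair in jobs, pausing between requests."""
    saved = []
    for i, (url, dest) in enumerate(jobs):
        if i > 0:
            sleep(delay_seconds)  # be gentle with the server
        folder = os.path.dirname(dest)
        if folder:
            os.makedirs(folder, exist_ok=True)  # create nested folders as needed
        with open(dest, "wb") as f:  # binary mode keeps .zip/.las bytes intact
            f.write(fetcher(url))
        saved.append(dest)
    return saved
```

Calling `download_all([(url, path), ...], delay_seconds=2)` would then fetch each file in turn, waiting two seconds between requests. Passing `fetcher` and `sleep` as parameters makes the loop easy to try out without touching the network.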

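Answer 2: (score: 0)

This may be a bit dirty, but it is a first pass at solving the problem. It depends entirely on every value in the CSV being wrapped in double quotes. If that is not the case, this solution will need a lot of tweaking.

Code: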

import os

csv = """
"KGS ID","Latitude","Longitude","Location","Operator","Lease","API","Elevation","Elev_Ref","Depth_start","Depth_stop","URL"
"1002880800","37.2354869","-100.4607509","T32S R29W, Sec. 27,   SW SW NE","Stanolind Oil and Gas Co.","William L. Rickers 1","15-119-00164","2705"," KB","2790","7652","http://www.kgs.ku.edu/WellLogs/32S29W/1043696830.zip"
"1002880821","37.1234622","-100.1158111","T34S R26W, Sec. 2,   NW NW NE","SKELLY OIL CO","GRACE MCKINNEY 'A' 1","15-119-00181","2290"," KB","4000","5900","http://www.kgs.ku.edu/WellLogs/34S26W/1043696831.zip"
""".strip() # trim excess space at top and bottom

root_dir = '/tmp/so_test'

lines = csv.split('\n') # break CSV on newlines
header = lines[0].strip('"').split('","') # grab first line and consider it the header

lines_d = [] # we're about to perform the core actions, and we're going to store it in this variable
for l in lines[1:]: # we want all lines except the top line, which is a header
    line_broken = l.strip('"').split('","') # strip off leading and trailing double-quote
    line_assoc = zip(header, line_broken) # pair each header name with the value at the same position
    line_dict = dict(line_assoc) # turn this into a dict
    lines_d.append(line_dict)

    section_parts = [s.strip() for s in line_dict['Location'].split(',')] # break the Location value into the pieces we need

    file_out = os.path.join(root_dir, '%s%s%s%sAPI.las'%(section_parts[0], os.path.sep, section_parts[1], os.path.sep)) # format output filename the way I think is requested

    # stuff to show what's actually put in the files
    print file_out, ':'
    print '    ', '"%s"'%('","'.join(header),)
    print '    ', '"%s"'%('","'.join(line_dict[h] for h in header))

Output:

 ~/so_test $ python so_test.py 
/tmp/so_test/T32S R29W/Sec. 27/API.las :
     "KGS ID","Latitude","Longitude","Location","Operator","Lease","API","Elevation","Elev_Ref","Depth_start","Depth_stop","URL"
     "1002880800","37.2354869","-100.4607509","T32S R29W, Sec. 27,   SW SW NE","Stanolind Oil and Gas Co.","William L. Rickers 1","15-119-00164","2705"," KB","2790","7652","http://www.kgs.ku.edu/WellLogs/32S29W/1043696830.zip"
/tmp/so_test/T34S R26W/Sec. 2/API.las :
     "KGS ID","Latitude","Longitude","Location","Operator","Lease","API","Elevation","Elev_Ref","Depth_start","Depth_stop","URL"
     "1002880821","37.1234622","-100.1158111","T34S R26W, Sec. 2,   NW NW NE","SKELLY OIL CO","GRACE MCKINNEY 'A' 1","15-119-00181","2290"," KB","4000","5900","http://www.kgs.ku.edu/WellLogs/34S26W/1043696831.zip"
 ~/so_test $
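If the values are not all double-quoted, one alternative to manual splitting is the standard `csv.DictReader`, which accepts quoted and unquoted fields alike. The sketch below assumes Python 3 and assumes the requested file name is the row's API value; `output_path` is a hypothetical helper, and the sample is trimmed to three of the columns shown above.

```python
import csv
import io
import os


def output_path(root_dir, row):
    """Build root_dir/<township range>/<section>/<API>.las from one CSV row."""
    parts = [p.strip() for p in row["Location"].split(",")]
    return os.path.join(root_dir, parts[0], parts[1], row["API"] + ".las")


# Trimmed sample: only Location is quoted, because it contains commas.
sample = io.StringIO(
    'Location,API,URL\n'
    '"T32S R29W, Sec. 27,   SW SW NE",15-119-00164,'
    'http://www.kgs.ku.edu/WellLogs/32S29W/1043696830.zip\n'
)
for row in csv.DictReader(sample):
    # Show where each URL would be saved; no download happens here.
    print(output_path("/tmp/so_test", row), "<-", row["URL"])
```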