Calling two pandas columns in a function?

Time: 2017-08-15 07:39:32

Tags: python python-3.x function pandas lambda

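I have a bunch of URLs that I am trying to write to files. The URLs are stored in a pandas dataframe.

The dataframe has two columns: url and id. I am trying to request each URL from the url column and write the response to a file named after the matching id.

This is what I have so far: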

def get_link(url): 
    file_name = os.path.join('/mypath/foo/bar', df.id)
    try: 
        r = requests.get(url)
    except Exception as e:
        print("Failed to get " + url)
    else:
        with open(file_name, 'w') as f: 
            f.write(r.text)

df.url.apply(lambda l: get_link(l))

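But when I call the function it obviously fails, because os.path.join expects a string and not a Series. So I get the error: join() argument must be str or bytes, not 'Series'

Any ideas on how I can use df.id and df.url at the same time?

Thanks, /R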

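2 Answers:

Answer 0: (score: 1)

I think you need apply with axis=1 to process the dataframe row by row, then read each row's values with x.url and x.id. Each row is passed to the function as a Series indexed by the column names, here url and id.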

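import os
import requests

def get_link(x):
    print(x)  # debug: x is one row of the dataframe, passed as a Series
    # str() because os.path.join only accepts str or bytes, and id may be an integer
    file_name = os.path.join('/mypath/foo/bar', str(x.id))
    try:
        r = requests.get(x.url)
    except requests.exceptions.RequestException:
        print("Failed to get " + x.url)
    else:
        with open(file_name, 'w') as f:
            f.write(r.text)

df.apply(get_link, axis=1)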

Sample:

df = pd.DataFrame({'url':['url1','url2'],
                   'id':[1,2]})

print (df)
   id   url
0   1  url1
1   2  url2

def get_link(x):
    print (x) 
    print ('url is: {}'.format(x.url))
    print ('id is: {}'.format(x.id))

df.apply(get_link, axis=1)

id        1
url    url1
Name: 0, dtype: object
url is: url1
id is: 1
id        2
url    url2
Name: 1, dtype: object
url is: url2
id is: 2
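As a minimal sketch of the same row-wise idea without any network access, the snippet below only builds the target file names. The helper build_file_name and its base_dir parameter are hypothetical names introduced here for illustration, and the directory is just the placeholder path from the question.

```python
import os
import pandas as pd

# Same sample data as above.
df = pd.DataFrame({'url': ['url1', 'url2'], 'id': [1, 2]})

def build_file_name(row, base_dir='/mypath/foo/bar'):
    # With axis=1, row is a Series holding a single row,
    # so row['id'] is a scalar rather than a whole column.
    # str() is needed because os.path.join rejects integers.
    return os.path.join(base_dir, str(row['id']))

# apply returns a Series with one file name per row.
file_names = df.apply(build_file_name, axis=1)
print(file_names.tolist())
```

Indexing with row['id'] instead of row.id is a small design choice: it keeps working even if a column name happens to collide with an existing Series attribute.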

Answer 1: (score: 1)

You can also extend your function to take an id_ parameter in addition to url.

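import os
import requests
# Note: this ConnectionError is the requests exception, not the Python builtin
from requests.exceptions import ConnectionError, MissingSchema

def get_link(url, id_):
    # str() because os.path.join only accepts str or bytes, and id_ may be an integer
    file_name = os.path.join('/mypath/foo/bar', str(id_))
    try:
        r = requests.get(url)
    except (ConnectionError, MissingSchema) as e:  # a tuple is required to catch several exceptions
        print("Failed to get " + url)
    else:
        with open(file_name, 'w') as f:
            f.write(r.text)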

Then simply iterate over the dataframe and call your function for each row.

for idx, row in df.iterrows():
    get_link(url=row.url, id_=row.id)
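As a possible variation on the loop (not part of the original answer), itertuples yields lightweight named tuples instead of a Series per row, which is usually faster than iterrows. The sketch below is self-contained and avoids the network: save_text is a hypothetical stand-in for get_link that writes a made-up string rather than a downloaded page, and the output goes to a temporary directory instead of the path from the question.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'url': ['url1', 'url2'], 'id': [1, 2]})

def save_text(text, id_, base_dir):
    # Hypothetical helper: writes text to a file named after id_.
    file_name = os.path.join(base_dir, str(id_))
    with open(file_name, 'w') as f:
        f.write(text)
    return file_name

# Temporary directory used only for this illustration.
out_dir = tempfile.mkdtemp()

# index=False drops the index, so each row exposes just .url and .id.
written = [
    save_text('content of ' + row.url, row.id, out_dir)
    for row in df.itertuples(index=False)
]
print(written)
```

In real use, the string passed to save_text would be replaced by the response text returned by requests.get for that row's URL.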