Check a DataFrame column to see whether a bool is True/False, and geocode only the values that are False

Time: 2018-06-12 19:31:32

Tags: python pandas boolean geocoder

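I am using the [geocoder Python API library][1]. I have a pandas DataFrame column of True/False booleans that records whether I have already geocoded that particular address. Is there a way to modify my existing code so that it geocodes an address only if it has not been geocoded yet?

Right now all it does is print True and then geocode everything, regardless of the boolean value. Please help!

Here is another way of putting it:

I have a DataFrame of tweets. I have flagged each tweet as True if it has already been geocoded, or False if it has not. What I want to do is check whether the column is True and, if so, print the row. Otherwise, if the row is False, send it to my for loop to be geocoded. I will edit the original post to include the input.

Here is my existing code: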

for d in tweets2['Exist']:
    if d is True:
        print d
    elif d.any() is False:
        coord = []
        for index, row in tweets2.iterrows():
            print(row['location_x'])
            time.sleep(1.01)
            g = geocoder.osm(row['location_x'])
            geo = g.latlng
            print(geo)
            coord.append(geo)
    else:
        pass 

Here is a sample of the JSON file used as input:

{
"data": [
    {
        "user_id": 3299796214, 
        "features": {
            "screen_name": "SaveOurSparrows", 
            "text": "Details confirmed for inquiry into #INEOS #Derbyshire #Fracking site! \n\nAnti Fracking, #keepitintheground #wesaidno\u2026", 
            "location": "West Pennine Moors AONB SSSI", 
            "tweets": 3, 
            "geo_type": "User location", 
            "primary_geo": "West Pennine Moors AONB SSSI", 
            "id": 3299796214, 
            "name": "SaveOurSparrows",
            "Exist": "True"
        }
    }, 
    {
        "user_id": 3302831409, 
        "features": {
            "screen_name": "ProjectLower", 
            "text": "Cutting down on energy costs is the dream for many #smallbusinesses, but to put ideas into practice isn\u2019t always ea\u2026", 
            "location": "Manchester", 
            "tweets": 1, 
            "geo_type": "User location", 
            "primary_geo": "Manchester", 
            "id": 3302831409, 
            "name": "Project Lower",
            "Exist": "False"
        }
    }, 
    {
        "user_id": 2205129714, 
        "features": {
            "screen_name": "AmbCanHaiti", 
            "text": "Petit-d\u00e9jeuner causerie le mercredi 28 mars 2018 \u00e0 l'h\u00f4tel Montana sur l'\u00e9nergie #micror\u00e9seaux #microgrids\u2026", 
            "location": "Haiti", 
            "tweets": 1, 
            "geo_type": "User location", 
            "primary_geo": "Haiti", 
            "id": 2205129714, 
            "name": "Canada en Ha\u00efti",
            "Exist": "False"
        }
    }
 ]

}

1 Answer:

Answer 0 (score: 0)

The simplest approach is to iterate over your dataset and add a coords key wherever it is missing:

for data in your_data_set['data']:
    # Geocode only when 'coords' is missing; setdefault() would run the lookup every time
    if 'coords' not in data:
        data['coords'] = geocoder.osm(data['location_x']).latlng

Then convert it to a DataFrame.
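As a rough sketch of the loop described above, assuming records shaped like the JSON sample in the question (address under `features` / `location`): `fill_missing_coords` and `fake_geocode` are hypothetical names introduced here, and the coordinates are made-up placeholders so the example runs without network access.

```python
def fill_missing_coords(records, geocode):
    """Add a 'coords' entry only to records that do not have one yet."""
    for record in records:
        if "coords" not in record:
            # Only records without coordinates trigger a lookup
            record["coords"] = geocode(record["features"]["location"])
    return records


def fake_geocode(location):
    """Offline stand-in for a real lookup; values are illustrative placeholders."""
    lookup = {"Manchester": [53.48, -2.24]}
    return lookup.get(location)


records = [
    # Already geocoded (placeholder coordinates): must be left untouched
    {"user_id": 3299796214,
     "features": {"location": "West Pennine Moors AONB SSSI"},
     "coords": [53.6, -2.5]},
    # Not geocoded yet: should receive a 'coords' entry
    {"user_id": 3302831409, "features": {"location": "Manchester"}},
]

fill_missing_coords(records, fake_geocode)
```

With the real library, the `geocode` argument could be something like `lambda loc: geocoder.osm(loc).latlng`, and the resulting list could then be passed to `pandas.DataFrame(records)`.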

If you already have it as a DataFrame:

mask = df['coords'] == False  # rows that have not been geocoded yet
df.loc[mask, 'coords'] = df.loc[mask, 'location_x'].apply(lambda loc: geocoder.osm(loc).latlng)
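Applied to the question's own `Exist` column, the same idea might look like the sketch below. It assumes the flag arrives as the strings "True"/"False", as in the JSON sample, which would be one reason an `is True` check never matches. `fake_geocode` is a hypothetical offline stand-in for `geocoder.osm(...).latlng`, and its coordinates are illustrative placeholders.

```python
import pandas as pd


def fake_geocode(location):
    """Offline stand-in for a real lookup; values are illustrative placeholders."""
    lookup = {"Manchester": [53.48, -2.24], "Haiti": [18.97, -72.29]}
    return lookup.get(location)


tweets2 = pd.DataFrame({
    "location_x": ["West Pennine Moors AONB SSSI", "Manchester", "Haiti"],
    "Exist": ["True", "False", "False"],  # strings, as in the JSON sample
})

# Turn the string flag into a real boolean Series
already_done = tweets2["Exist"].astype(str).str.lower() == "true"
todo = ~already_done

# Geocode only the rows still to do; rows not in the result become NaN
tweets2["coords"] = tweets2.loc[todo, "location_x"].apply(fake_geocode)

print(tweets2[already_done])  # rows that were already geocoded
```

Building the mask once and applying the lookup only to the selected rows avoids the nested loop in the question, which re-geocodes the whole frame for every row.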