Question

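I have been trying to extract Flickr users' locations (not latitude and longitude, but the person's country) from their user_ids. I created a dataframe (Here's the dataframe) that contains the photo ID, the owner, and a few other columns. My approach was to iterate over the owner column of the dataframe and issue a flickr.people.getInfo() query for each owner. Here is my attempt: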

for index, row in df.iterrows():
     A=np.array(df["owner"])
for i in range(len(A)):
    B=flickr.people.getInfo(user_id=A[i])

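Unfortunately, this yields only one result. On closer inspection, I found that it belongs to the last user in the dataframe. My dataframe has 250 observations, and I don't know how to extract the others. Any help is appreciated.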

Answer 1

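It looks like you forgot to store the results while iterating over the dataframe. I haven't used the API, but I think this snippet should do it.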

import json
from io import StringIO

result_list = []
for idx, owner in df['owner'].items():
    info = flickr.people.getInfo(user_id=owner)
    # You may have to set the orient parameter.
    # Options are: 'split', 'records', 'index'. Default is 'index'.
    result_list.append(pd.read_json(StringIO(json.dumps(info)), typ='series', orient='index'))

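The results are stored in result_list, one entry per owner. (An earlier version of this answer used a dictionary with the user ID as the key; I switched to a list because it is more convenient.)

Edit:

Since the response is JSON, you can use the pd.read_json function to parse each result, as the loop above does.

After that, you can concatenate the resulting pandas Series together as follows:

df = pd.concat(result_list, axis=1).transpose()

I added transpose() because you probably want the ID as the index. After that, you should be able to sort by the "location" column. Hope that helps.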
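
For the original goal of getting each owner's location, the same store-every-result idea can be sketched as below. This is only a sketch: fake_get_info is a hypothetical stand-in for flickr.people.getInfo so that the example runs offline, the owner IDs and places are made up, and the nested 'person' / 'location' / '_content' layout is an assumption about the parsed JSON response that should be checked against a real reply.

```python
import pandas as pd

# Hypothetical stand-in for flickr.people.getInfo(user_id=...).
# The response layout below is an assumption, not the confirmed API format.
def fake_get_info(user_id):
    known = {"u1": "Paris, France", "u2": "Lima, Peru"}
    person = {"id": user_id}
    if user_id in known:
        person["location"] = {"_content": known[user_id]}
    return {"person": person, "stat": "ok"}

def extract_location(response):
    """Return the location string, or None when the user has not set one."""
    return response.get("person", {}).get("location", {}).get("_content")

df = pd.DataFrame({"photo_id": [10, 11, 12], "owner": ["u1", "u2", "u3"]})

# Append one result per owner instead of overwriting a single variable.
locations = []
for owner in df["owner"]:
    locations.append(extract_location(fake_get_info(owner)))

df["location"] = locations
```

Users who have not filled in a location end up as None here, so they can be filtered out before any attempt to pull the country out of the location string.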

Answer 2

The canonical way to do this is to use apply. It will be more efficient.

import pandas as pd
import numpy as np

np.random.seed(0)

# A function to simulate the call to the API
def get_user_info(id):
    return np.random.randint(id, id + 10)

# Some test data
df = pd.DataFrame({'id': [0,1,2], 'name': ['Pierre', 'Paul', 'Jacques']})

# Here the call is made for each ID
df['info'] = df['id'].apply(get_user_info)

#    id     name  info
# 0   0   Pierre     5
# 1   1     Paul     1
# 2   2  Jacques     5

Note that another way to write the same thing is

df['info'] = df['id'].map(lambda x: get_user_info(x))
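
Both apply and map call the function once per row. If the same owner appears on several rows, which can happen when each row is a photo, one possible refinement is to query each distinct owner once and map the results back onto the rows. A sketch, where get_user_info is again a stand-in for the real API call and the owner values are made up:

```python
import pandas as pd

calls = []  # records each simulated API call, for illustration only

# Hypothetical stand-in for the real API call.
def get_user_info(user_id):
    calls.append(user_id)
    return f"info-for-{user_id}"

df = pd.DataFrame({"owner": ["a", "b", "a", "c", "b"]})

# Query each distinct owner once, then map the results back onto every row.
info_by_owner = {owner: get_user_info(owner) for owner in df["owner"].unique()}
df["info"] = df["owner"].map(info_by_owner)
```

With five rows but three distinct owners, the stand-in is called three times rather than five.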

How do I pass an array of user_ids to flickr.people.getInfo()?
