Iterating over data in a loop

Time: 2019-02-19 15:49:23

Tags: python-3.x pandas

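I am trying to create a function that iterates over a dataset and uses terms from the dataset to query an API. I have isolated the problem to this function. I need it to call the API with a given zip code at least twice before moving on to the next zip code within the same region. Once data has been pulled for at least 20 zip codes in a region, I need it to move on to the next region and start the process over. However, I can't figure out how to translate that logic into Python. Any help you can provide would be greatly appreciated.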

import pandas as pd

def get_zip(data):
    df = pd.read_csv(data, converters={'zip': lambda x: '{0:0>5}'.format(x)})
    dfs = pd.DataFrame(df[['zip', 'region']])
    regions = dfs['region'].unique().tolist()

    i = 1
    while i < len(regions):
        # print(regions[i])
        zlen = len(df.zip[df.region == '%s' % regions[i]])
        print(zlen)
        print(i)
        if i in range(min(zlen, 20)):
            zipcode = df.zip[df.region == '%s' % regions[i]].iloc[i]
            i += 1
            return zipcode
        else:
            zipcode = df.zip[df.region == '%s' % regions[i]].iloc[i]
            return zipcode

get_zip('metro_data.csv')

metro_data.csv is structured as follows:

      zip                                  region
0    29831  Augusta-Richmond County, GA-SC Metro Area
1    29129  Augusta-Richmond County, GA-SC Metro Area
2    30808  Augusta-Richmond County, GA-SC Metro Area
3    29809  Augusta-Richmond County, GA-SC Metro Area
4    29137  Augusta-Richmond County, GA-SC Metro Area
5    29851  Augusta-Richmond County, GA-SC Metro Area
6    30816  Augusta-Richmond County, GA-SC Metro Area
7    30805  Augusta-Richmond County, GA-SC Metro Area
8    29105  Augusta-Richmond County, GA-SC Metro Area
9    30426  Augusta-Richmond County, GA-SC Metro Area
10   29856  Augusta-Richmond County, GA-SC Metro Area
11   29834  Augusta-Richmond County, GA-SC Metro Area
12   29828  Augusta-Richmond County, GA-SC Metro Area
13   30812  Augusta-Richmond County, GA-SC Metro Area
800  31721                      Albany, GA Metro Area
801  39842                      Albany, GA Metro Area
802  31763                      Albany, GA Metro Area
803  31791                      Albany, GA Metro Area
804  39870                      Albany, GA Metro Area
805  31787                      Albany, GA Metro Area
806  31781                      Albany, GA Metro Area
813  27801                 Rocky Mount, NC Metro Area
814  27804                 Rocky Mount, NC Metro Area
815  27886                 Rocky Mount, NC Metro Area
816  27803                 Rocky Mount, NC Metro Area
817  27856                 Rocky Mount, NC Metro Area
818  27891                 Rocky Mount, NC Metro Area
819  27882                 Rocky Mount, NC Metro Area
820  27809                 Rocky Mount, NC Metro Area
821  27864                 Rocky Mount, NC Metro Area
822  27557                 Rocky Mount, NC Metro Area

2 answers:

Answer 0: (score: 0)

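Change every return to yield, so that the function becomes a generator and resumes where it left off instead of exiting on the first zip code: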

if i in range(min(zlen, 20)):
    zipcode = df.zip[df.region == '%s' % regions[i]].iloc[i]
    i += 1
    yield zipcode
else:
    zipcode = df.zip[df.region == '%s' % regions[i]].iloc[i]
    yield zipcode

Usage:

data = get_zip("metro_data.csv")
firstiter = next(data)
seconditer = next(data)
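Switching to yield keeps the original indexing logic, which reuses i both as the region index and as the row position. As a minimal sketch of the order described in the question (each zip code emitted twice, at most 20 zip codes per region, then the next region), a generator could look like the following. The names iter_zips, calls_per_zip and max_zips_per_region are hypothetical, and the inline CSV only stands in for metro_data.csv:

```python
import io
import pandas as pd

def iter_zips(csv_source, calls_per_zip=2, max_zips_per_region=20):
    """Yield (region, zip) pairs, repeating each zip before moving on."""
    # Read zip as text so leading zeros survive, then left-pad to 5 digits.
    df = pd.read_csv(csv_source, dtype={'zip': str})
    df['zip'] = df['zip'].str.zfill(5)
    # sort=False keeps the regions in the order they first appear in the file.
    for region, group in df.groupby('region', sort=False):
        # Take at most max_zips_per_region distinct zip codes from this region.
        for zipcode in group['zip'].unique()[:max_zips_per_region]:
            # Emit the same zip code once per planned API call.
            for _ in range(calls_per_zip):
                yield region, zipcode

# Tiny inline sample standing in for metro_data.csv (assumed layout).
sample_csv = '''zip,region
29831,"Augusta-Richmond County, GA-SC Metro Area"
29129,"Augusta-Richmond County, GA-SC Metro Area"
30808,"Augusta-Richmond County, GA-SC Metro Area"
31721,"Albany, GA Metro Area"
'''

# Limit of 2 zip codes per region here, only to keep the demo short.
pairs = list(iter_zips(io.StringIO(sample_csv), max_zips_per_region=2))
for region, zipcode in pairs:
    print(region, zipcode)
```

With the real file you would pass the path instead of the StringIO object and leave max_zips_per_region at 20.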

Answer 1: (score: 0)

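You wrote that you want the zip codes that appear at least twice in the same region. These (more precisely, the df rows containing those zip codes) can be obtained as follows: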

df2 = df.groupby(['region', 'zip']).filter(lambda gr: len(gr) > 1)

For demonstration purposes I changed your source data and duplicated some of the zip codes, so the result (for my test data) is:

       zip                                     region
ind                                                  
0    29831  Augusta-Richmond County, GA-SC Metro Area
1    29831  Augusta-Richmond County, GA-SC Metro Area
2    30808  Augusta-Richmond County, GA-SC Metro Area
3    30808  Augusta-Richmond County, GA-SC Metro Area
4    29137  Augusta-Richmond County, GA-SC Metro Area
5    29137  Augusta-Richmond County, GA-SC Metro Area
7    30805  Augusta-Richmond County, GA-SC Metro Area
8    30805  Augusta-Richmond County, GA-SC Metro Area
802  31763                      Albany, GA Metro Area
803  31763                      Albany, GA Metro Area
805  31787                      Albany, GA Metro Area
806  31787                      Albany, GA Metro Area
814  27804                 Rocky Mount, NC Metro Area
815  27804                 Rocky Mount, NC Metro Area

As you can see, there are:

  • 4 zip codes in the Augusta-Richmond region,
  • 2 zip codes in the Albany region,
  • 1 zip code in the Rocky Mount region.

The other zip codes are not repeated.
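The filter step above can be tried on a small inline frame. This is only a sketch: the data is made up for illustration and the region names are shortened.

```python
import pandas as pd

# Made-up sample: some zip codes occur twice within their region, some once.
df = pd.DataFrame({
    'zip': ['29831', '29831', '30808', '31763', '31763', '27804'],
    'region': ['Augusta', 'Augusta', 'Augusta',
               'Albany', 'Albany', 'Rocky Mount'],
})

# Keep only the rows whose (region, zip) group has more than one row.
df2 = df.groupby(['region', 'zip']).filter(lambda gr: len(gr) > 1)

# filter() returns the surviving rows in their original order.
kept = df2['zip'].tolist()
print(df2)
```

Here the single occurrences 30808 and 27804 are dropped, and the duplicated rows keep their original index labels.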

For demonstration purposes, I lowered the limit on the number of zip codes per region to 3 (instead of your 20).

Then, to get the "restricted" list of zip codes, you can write:

df2.groupby(['region']).apply(
    lambda x: pd.Series(x.zip.unique()[:3]))\
    .reset_index(level=1, drop=True).rename('zip')

The result is:

region
Albany, GA Metro Area                        31763
Albany, GA Metro Area                        31787
Augusta-Richmond County, GA-SC Metro Area    29831
Augusta-Richmond County, GA-SC Metro Area    30808
Augusta-Richmond County, GA-SC Metro Area    29137
Rocky Mount, NC Metro Area                   27804
Name: zip, dtype: object

As you can see, the number of zip codes for the Augusta-Richmond region has been reduced to 3.
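The same per-region cap can also be reached without apply. The following is an alternative sketch, not the method used above: it drops repeated (region, zip) rows and then takes the first 3 rows of each region with groupby(...).head(3). The sample frame is made up and uses shortened region names.

```python
import pandas as pd

# Made-up stand-in for df2: every zip code appears twice.
df2 = pd.DataFrame({
    'zip': ['29831', '29831', '30808', '30808', '29137',
            '29137', '30805', '30805', '31763', '31763'],
    'region': ['Augusta'] * 8 + ['Albany'] * 2,
})

limited = (
    df2.drop_duplicates(['region', 'zip'])   # one row per distinct zip in a region
       .groupby('region', sort=False)        # keep regions in file order
       .head(3)                              # at most 3 zip codes per region
)
print(limited)
```

Unlike the apply version, this keeps the result as a DataFrame with the original row labels and does not sort the regions alphabetically.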

So now you have the list of zip codes and can process it however you like, e.g. by calling some API.
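As a rough sketch of that last step, the resulting Series (indexed by region) can be walked with items(), calling the API twice per zip code as the question requires. query_api is a hypothetical placeholder, since the question does not name the actual API, and the Series below is a hand-built stand-in for the result shown above.

```python
import pandas as pd

def query_api(zipcode):
    """Hypothetical stand-in for the real API call; replace with your own."""
    return {'zip': zipcode, 'status': 'ok'}

# Hand-built stand-in for the restricted zip list (region as the index).
zips = pd.Series(
    ['31763', '29831', '30808'],
    index=pd.Index(['Albany', 'Augusta', 'Augusta'], name='region'),
    name='zip',
)

results = []
for region, zipcode in zips.items():
    # The question asks for at least two calls per zip code.
    for attempt in range(2):
        results.append((region, attempt, query_api(zipcode)))

print(len(results))
```

Because the Series is already grouped by region, the loop finishes all zip codes of one region before moving on to the next.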