How to iterate through a dataframe using df.loc and a key column

Time: 2019-10-22 19:55:19

Tags: python python-3.x pandas

I have a dataframe that returns data for each OfficeLocation:

(screenshot of the returned dataframe)

How can I split the dataframe by OfficeLocation and write each piece to its own Excel spreadsheet?

import pandas
import pyodbc

server = 'MyServer'
db = 'MyDB'

myparams = ['2019-01-01', '2019-02-28', None]  # None substitutes NULL in sql
connection = pyodbc.connect('DRIVER={SQL Server};server='+server+';DATABASE='+db+';Trusted_Connection=yes;')
df = pandas.read_sql_query('EXEC PythonTest_Align_RSrptAccountCurrentMunich @EffectiveDateFrom=?,@EffectiveDateTo=?,@ProducerLocationID=?', connection, params=myparams)

# sort the dataframe by office
df.sort_values(by=['OfficeLocation'], axis=0, inplace=True)

# set OfficeLocation as the index without dropping the column
df.set_index(keys=['OfficeLocation'], drop=False, inplace=True)

# get a list of unique offices
offices = df['OfficeLocation'].unique().tolist()

# now we can perform a lookup on a 'view' of the dataframe
SanDiego = df.loc['San Diego']
print(SanDiego)

# how can I iterate through each office and create excel file for each office
df.loc['San Diego'].to_excel(r'\\user\name\Python\SanDiego_Office.xlsx')

So I need 3 Excel spreadsheets with data: SanDiego.xlsx, Vista.xlsx and SanBernardino.xlsx.

2 Answers:

Answer 0 (score: 3)

You can use groupby to split the dataframe by OfficeLocation and write each group out to its own file.
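A minimal sketch, reusing the df and the UNC output path from the question (the exact path and filename pattern are assumptions):

# groupby yields one (office, sub-frame) pair per unique OfficeLocation;
# since the question also set OfficeLocation as the index, reset the index
# first so the column label is unambiguous
for office, group in df.reset_index(drop=True).groupby('OfficeLocation'):
    group.to_excel(rf'\\user\name\Python\{office}_Office.xlsx')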

Answer 1 (score: 1)

How about something simple like this? Given data shaped like yours:

+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
| Policy Number | ProducerLocationId | OfficeLOcation | EffectiveDate | ExpirationDate | TransactionType | BondAmount | GrossPremium |
+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
| 7563299       | 8160               | Aldora         | 31/10/2018    | 28/01/2019     | Cancelled       | -61081     | -2372.303665 |
| 6754151       | 3122               | Aucilla        | 04/05/2019    | 15/06/2019     | New Business    | -80151     | -4135.443318 |
| 3121128       | 3230               | Aulander       | 11/10/2018    | 29/12/2018     | New Business    | -67563     | -28394.83428 |
| 911463        | 4041               | Aullville      | 30/11/2018    | 20/02/2019     | New Business    | -47918     | -17840.05749 |
| 5068380       | 3794               | Ava            | 10/01/2019    | 28/03/2019     | Cancelled       | -41094     | -30523.0655  |
| 2174424       | 1263               | Alcan Border   | 18/04/2019    | 10/07/2019     | Cancelled       | -73661     | -5979.278874 |
| 475464        | 9250               | Audubon        | 15/01/2019    | 17/02/2019     | New Business    | -85217     | -64988.83987 |
| 2076075       | 7405               | Alderton       | 20/08/2019    | 26/09/2019     | New Business    | -32335     | -11144.63342 |
| 3645387       | 9357               | Austwell       | 22/10/2018    | 19/12/2018     | Cancelled       | -5065      | -5013.982643 |
| 3316361       | 1335               | Aurora         | 29/09/2018    | 24/12/2018     | New Business    | -13939     | -6333.580641 |
| 1404387       | 2656               | Auburn Hills   | 04/07/2019    | 19/09/2019     | Cancelled       | -12049     | -385.3522259 |
| 6908433       | 1288               | Alcester       | 30/10/2018    | 18/01/2019     | Cancelled       | -56902     | -27341.06181 |
| 9908879       | 6012               | Alexandria     | 20/06/2019    | 21/08/2019     | Cancelled       | -76226     | -12671.06376 |
| 7850879       | 4606               | Avery          | 10/11/2018    | 21/01/2019     | Cancelled       | -54297     | -40619.42718 |
| 8437707       | 4149               | Auxvasse       | 22/09/2019    | 28/10/2019     | Cancelled       | -59584     | -19800.71077 |
| 4260681       | 1889               | Auburndale     | 06/07/2019    | 22/08/2019     | New Business    | -55035     | -18271.5442  |
| 7234116       | 2636               | Alexander      | 14/07/2019    | 31/08/2019     | New Business    | -59319     | -15711.2827  |
| 3721467       | 3765               | Alexander City | 16/10/2018    | 23/12/2018     | Cancelled       | -98431     | -26743.07459 |
| 6859964       | 7035               | Alburtis       | 04/11/2018    | 26/12/2018     | New Business    | -36917     | -11339.9049  |
| 2994719       | 6997               | Aleneva        | 09/02/2019    | 13/04/2019     | New Business    | -55739     | -46323.01608 |
| 7542794       | 8968               | Aullville      | 25/09/2018    | 09/11/2018     | Cancelled       | -44488     | -4554.278674 |
| 1340649       | 7003               | Augusta        | 30/11/2018    | 17/02/2019     | New Business    | -78405     | -71910.93325 |
| 8078558       | 7185               | Alderpoint     | 10/06/2019    | 22/07/2019     | New Business    | -37928     | -29289.29545 |
| 8198811       | 8963               | Alden          | 05/07/2019    | 15/08/2019     | Cancelled       | -97648     | -79946.41222 |
| 2510522       | 5714               | Avella         | 03/09/2019    | 02/11/2019     | New Business    | -16452     | -11230.93829 |
+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
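A minimal sketch of that idea, assuming the OfficeLOcation column name from the sample above (loop_save_unique in the edit below wraps this same loop in a function):

for loc in df["OfficeLOcation"].unique():
    # keep only this office's rows and write them to their own workbook
    df[df["OfficeLOcation"] == loc].to_excel(loc + ".xlsx")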

Edit

I generated 50,000 rows of data similar to yours and created two functions, one using my version above and the other using the groupby method.

def loop_save_unique(df):
    # filter the dataframe once per unique office and save each slice
    for loc in df["OfficeLOcation"].unique():
        save_df = df[df["OfficeLOcation"] == loc]
        save_df.to_excel("output\\test1\\" + loc + ".xlsx")

def loop_save_groupby(df):
    # let groupby produce the per-office slices in a single pass
    for location, d in df.groupby('OfficeLOcation'):
        d.to_excel(f'output\\test2\\{location}.xlsx')



%timeit loop_save_unique(df)
12.1 s ± 556 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit loop_save_groupby(df)
11.1 s ± 183 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In case anyone is wondering, they both perform similarly, but the groupby method comes out on top with a smaller variance and a runtime about a second faster.
