Using a for loop to "read_pickle" and "to_pickle" many data files

Time: 2016-03-06 17:42:15

Tags: python pandas

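I'm using Linux and IPython Notebook. I have a directory of pickled data files ('/home/jayaramdas/anaconda3/pdf/senate_bills'), each containing date, bill_id and sponsor_id (more than one bill per sponsor). I also have a pickled data file (at '/home/jayaramdas/anaconda3/pdf/sbcommittee_id_pdf') with a column of all sponsor IDs, sbsponsor_id_pdf. I need to go into the directory '/home/.../senate_bills', open each pickle file, create a separate file that collects all the bill_ids for each sponsor_id listed in the sbsponsor_id_pdf file, and then pickle that file, naming it after the sponsor_id and a two-digit number.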

My code so far:

import pandas as pd
import os
import os.path
path = '/home/jayaramdas/anaconda3/pdf/senate_bills'
path1 = '/home/jayaramdas/anaconda3/pdf'
dirs = os.listdir(path)
for dir in dirs:
with open(path + "/" + dir) as f:

    df = pd.read_pickle(f)
    with open(path + "/" + "/sbcommittee_id_pdf", "r") as f:
        data = json.load(f)

        for sponsor in data['sponsor_id']:

            pdf = df[df['sponsor_id'] == sponsor]

            pdf.to_pickle('sponsor' + '_08bills.pdf')

            print (pdf)

I get the following error:

TypeError                                 Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
 44         try:
---> 45             with open(path, 'rb') as fh:
 46                 return pkl.load(fh)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
 50             try:
---> 51                 with open(path, 'rb') as fh:
 52                     return pc.load(fh, encoding=encoding, compat=False)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in read_pickle(path)
 59     try:
---> 60         return try_read(path)
 61     except:

/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
 55             except:
---> 56                 with open(path, 'rb') as fh:
 57                     return pc.load(fh, encoding=encoding, compat=True)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
 44         try:
---> 45             with open(path, 'rb') as fh:
 46                 return pkl.load(fh)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
 50             try:
---> 51                 with open(path, 'rb') as fh:
 52                     return pc.load(fh, encoding=encoding, compat=False)

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-61-40e7738e1c05> in <module>()
  8     with open(path + "/" + dir) as f:
  9 
---> 10         df = pd.read_pickle(f)
 11         with open(path + "/" + "/sbcommittee_id_pdf", "r") as f:
 12             data = json.load(f)

/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in read_pickle(path)
 61     except:
 62         if PY3:
---> 63             return try_read(path, encoding='latin1')
 64         raise

/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
 54             # compat pickle
 55             except:
---> 56                 with open(path, 'rb') as fh:
 57                     return pc.load(fh, encoding=encoding, compat=True)
 58 

TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>

1 answer:

Answer 0 (score: 1)

Hope this helps. It isn't clear to me where the JSON file is located or how it relates to path.

In general, you want to use os.path.join(a, b) so that your code runs on multiple platforms (e.g. Mac and PC).
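As a small illustration of the os.path.join point, the sketch below builds the paths from the question with it. The base directory and file name are copied from the question and the traceback; the variable names are only illustrative.

```python
import os

# Base directory as given in the question; adjust to your own layout.
base = '/home/jayaramdas/anaconda3/pdf'

# os.path.join inserts the platform's separator between components.
senate_bill_dir = os.path.join(base, 'senate_bills')
example_file = os.path.join(senate_bill_dir, 's113_sb_pdf')
print(example_file)

# Later components should not start with a separator: os.path.join
# discards everything before an absolute component.
```

By contrast, the question's path + "/" + "/sbcommittee_id_pdf" produces a doubled slash and looks inside senate_bills rather than its parent directory.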

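Note that your sample code is missing a level of indentation after for dir in dirs: (dir is the name of a built-in function rather than a reserved word, but it is still best not to shadow it).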
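One way to check the status of a name such as dir is with the standard keyword and builtins modules. The sketch below does that and then loops with a neutral variable name; the file names in the list are made up for illustration.

```python
import builtins
import keyword

# `dir` is not a keyword, so `for dir in dirs:` is legal Python...
is_keyword = keyword.iskeyword('dir')
# ...but it is a built-in function that the loop variable would shadow.
is_builtin = callable(builtins.dir)
print(is_keyword, is_builtin)

# A neutral loop variable avoids the shadowing (hypothetical file names).
file_names = ['s113_sb_pdf', 's114_sb_pdf']
seen = []
for file_name in file_names:
    seen.append(file_name)
```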

You also use the variable f twice. Try f1 and f2, or something more descriptive.

import os
import pandas as pd

path = '/home/jayaramdas/anaconda3/pdf'
senate_bill_dir = os.path.join(path, 'senate_bills')

# Pass a path string to read_pickle rather than an already opened file.
data = pd.read_pickle(os.path.join(path, 'sbcommittee_id_pdf.p'))
data.columns = ['sponsor_id']
for my_file in os.listdir(senate_bill_dir):
    df = pd.read_pickle(os.path.join(senate_bill_dir, my_file))
    for sponsor in data['sponsor_id'].unique():
        # Rows of this file that belong to the current sponsor.
        pdf = df[df['sponsor_id'] == sponsor]
        if len(pdf):  # Only save if there are records.
            pdf.to_pickle(str(sponsor) + '_08bills.p')
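
Two things are worth noting about the loop above. First, it hands read_pickle a path string. The traceback in the question shows pandas itself calling open(path, 'rb') on whatever it receives, which is why an already opened text-mode file was rejected. Second, the output name depends only on the sponsor, so if a sponsor appears in more than one input file, a later file would overwrite the earlier output. If the goal is one file per sponsor holding all of its bills, one possible approach is to concatenate everything first and split afterwards. The following is a minimal sketch under that assumption, using made-up data in a temporary directory; the file names, column values, output directory and the '_bills.p' suffix are all hypothetical.

```python
import os
import tempfile

import pandas as pd

# Temporary stand-ins for the real directories (hypothetical layout).
work_dir = tempfile.mkdtemp()
bills_dir = os.path.join(work_dir, 'senate_bills')
out_dir = os.path.join(work_dir, 'by_sponsor')
os.makedirs(bills_dir)
os.makedirs(out_dir)

# Two made-up input pickles with the columns named in the question.
pd.DataFrame({
    'date': ['2013-01-03', '2013-01-04'],
    'bill_id': ['s1-113', 's2-113'],
    'sponsor_id': [100, 200],
}).to_pickle(os.path.join(bills_dir, 'file_a'))
pd.DataFrame({
    'date': ['2015-01-06', '2015-01-07'],
    'bill_id': ['s5-114', 's6-114'],
    'sponsor_id': [100, 999],
}).to_pickle(os.path.join(bills_dir, 'file_b'))

# Made-up sponsor list; 300 has no bills and 999 is not listed.
sponsors = pd.DataFrame({'sponsor_id': [100, 200, 300]})

# Read every file by path and stack the rows into one frame.
frames = [
    pd.read_pickle(os.path.join(bills_dir, name))
    for name in sorted(os.listdir(bills_dir))
]
all_bills = pd.concat(frames, ignore_index=True)

# Keep only listed sponsors, then write one pickle per sponsor.
wanted = all_bills[all_bills['sponsor_id'].isin(sponsors['sponsor_id'])]
written = {}
for sponsor, group in wanted.groupby('sponsor_id'):
    out_path = os.path.join(out_dir, '{}_bills.p'.format(sponsor))
    group.to_pickle(out_path)
    written[sponsor] = out_path
```

Grouping after the concatenation means each sponsor's file is written once, and sponsors without any bills produce no file at all.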