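I am using Linux and IPython Notebook. I have a directory of pickled data files ('/home/jayaramdas/anaconda3/pdf/senate_bills'), each containing date, bill_id, and sponsor_id (more than one bill per sponsor), and a pickled data file (at '/home/jayaramdas/anaconda3/pdf/sbcommittee_id_pdf') with a column of all the sponsor IDs, sbsponsor_id_pdf. I need to go into the directory '/home/.../senate_bills', open each pickle file, and create a separate file that collects all the bill_ids for each sponsor_id in the sbsponsor_id_pdf file, then pickle that file and name it after the sponsor_id plus a two-digit number.
My code so far is: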
import pandas as pd
import os
import os.path
path = '/home/jayaramdas/anaconda3/pdf/senate_bills'
path1 = '/home/jayaramdas/anaconda3/pdf'
dirs = os.listdir(path)
for dir in dirs:
with open(path + "/" + dir) as f:
df = pd.read_pickle(f)
with open(path + "/" + "/sbcommittee_id_pdf", "r") as f:
data = json.load(f)
for sponsor in data['sponsor_id']:
pdf = df[df['sponsor_id'] == sponsor]
pdf.to_pickle('sponsor' + '_08bills.pdf')
print (pdf)
I get the following error:
TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
44 try:
---> 45 with open(path, 'rb') as fh:
46 return pkl.load(fh)
TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
50 try:
---> 51 with open(path, 'rb') as fh:
52 return pc.load(fh, encoding=encoding, compat=False)
TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in read_pickle(path)
59 try:
---> 60 return try_read(path)
61 except:
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
55 except:
---> 56 with open(path, 'rb') as fh:
57 return pc.load(fh, encoding=encoding, compat=True)
TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
44 try:
---> 45 with open(path, 'rb') as fh:
46 return pkl.load(fh)
TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
50 try:
---> 51 with open(path, 'rb') as fh:
52 return pc.load(fh, encoding=encoding, compat=False)
TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-61-40e7738e1c05> in <module>()
8 with open(path + "/" + dir) as f:
9
---> 10 df = pd.read_pickle(f)
11 with open(path + "/" + "/sbcommittee_id_pdf", "r") as f:
12 data = json.load(f)
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in read_pickle(path)
61 except:
62 if PY3:
---> 63 return try_read(path, encoding='latin1')
64 raise
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/io/pickle.py in try_read(path, encoding)
54 # compat pickle
55 except:
---> 56 with open(path, 'rb') as fh:
57 return pc.load(fh, encoding=encoding, compat=True)
58
TypeError: invalid file: <_io.TextIOWrapper name='/home/jayaramdas/anaconda3/pdf/senate_bills/s113_sb_pdf' mode='r' encoding='UTF-8'>
Answer 0 (score: 1)
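Hope this helps. It's not clear to me where the JSON file is located or how it relates to path.
The TypeError occurs because pd.read_pickle is being handed a file object that was already opened in text mode; as the traceback shows, it calls open(path, 'rb') itself, so it should be given the path string instead.
In general, you want to use os.path.join(a, b) so that your code runs on multiple platforms (e.g. Mac and PC).
Note that your sample code is missing a level of indentation after for dir in dirs: (and dir is the name of a Python built-in function, not a reserved word, so it is best not to reuse it as a variable name).
You also use the variable f twice. Try f1 and f2, or something more descriptive.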
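import os
import pandas as pd

path = '/home/jayaramdas/anaconda3/pdf'
senate_bill_dir = os.path.join(path, 'senate_bills')
# Pass the path itself to read_pickle rather than an open file handle.
data = pd.read_pickle(os.path.join(path, 'sbcommittee_id_pdf.p'))
data.columns = ['sponsor_id']
for my_file in os.listdir(senate_bill_dir):
    df = pd.read_pickle(os.path.join(senate_bill_dir, my_file))
    for sponsor in data['sponsor_id'].unique():
        pdf = df[df['sponsor_id'] == sponsor]
        if len(pdf):  # Only save if there are records.
            # Saved to the current working directory; a sponsor that appears in
            # several input files is overwritten by the last file processed.
            pdf.to_pickle(str(sponsor) + '_08bills.p')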
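If the goal is one file per sponsor that gathers bills from every input file, one possible variation is to combine the input pickles first and then split by sponsor. The following is only a sketch under that assumption: the function name split_bills_by_sponsor, the out_dir argument, and the sample rows are hypothetical and not part of the original question, and the demo runs in temporary directories rather than the real paths.

```python
import os
import tempfile

import pandas as pd


def split_bills_by_sponsor(bills_dir, sponsor_ids, out_dir, suffix="_08bills.p"):
    """Write one pickle per sponsor with that sponsor's rows from all files in bills_dir."""
    # Give read_pickle path strings; do not open the files in text mode first.
    # (pd.concat raises ValueError if bills_dir contains no files.)
    frames = [
        pd.read_pickle(os.path.join(bills_dir, name))
        for name in sorted(os.listdir(bills_dir))
    ]
    all_bills = pd.concat(frames, ignore_index=True)

    # Keep only the sponsors we were asked about, then write one file per sponsor.
    wanted = all_bills[all_bills["sponsor_id"].isin(sponsor_ids)]
    written = {}
    for sponsor, group in wanted.groupby("sponsor_id"):
        out_path = os.path.join(out_dir, str(sponsor) + suffix)
        group.to_pickle(out_path)
        written[sponsor] = out_path
    return written


# Demo with made-up rows; none of this data comes from the real files.
with tempfile.TemporaryDirectory() as tmp:
    bills_dir = os.path.join(tmp, "senate_bills")
    out_dir = os.path.join(tmp, "by_sponsor")
    os.makedirs(bills_dir)
    os.makedirs(out_dir)

    pd.DataFrame(
        {"date": ["2013-01-03", "2013-01-04"],
         "bill_id": ["s1", "s2"],
         "sponsor_id": ["A1", "B2"]}
    ).to_pickle(os.path.join(bills_dir, "s113_sb_pdf"))
    pd.DataFrame(
        {"date": ["2015-01-06"], "bill_id": ["s3"], "sponsor_id": ["A1"]}
    ).to_pickle(os.path.join(bills_dir, "s114_sb_pdf"))

    written = split_bills_by_sponsor(bills_dir, ["A1", "B2", "C3"], out_dir)
    # Read the results back before the temporary directory is removed.
    result = {s: pd.read_pickle(p)["bill_id"].tolist() for s, p in written.items()}

print(result)
```

Sponsors with no matching rows (C3 in the demo) simply produce no file, which mirrors the if len(pdf) check in the loop-based version.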