Python: loop through .dat files to structure unstructured data

Date: 2018-05-01 21:00:37

Tags: python pandas

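I am trying to read the .dat files in a directory with the code below, which works fine on structured data (e.g. delimited text and/or CSV). I want to read 3 .dat files into pandas, append them together, and add each file's name as the last column of the DataFrame.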

Here is a snippet of a .dat file. It is only an example to show what the data looks like (the first few lines are junk I want to skip):

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
{"Basketball_Player_Name" : "Michael Jordan", "Bulls" : {"school" :[ { 'North Carolina' : {"id":1},
"sneaker_brand" : {'brandOfSneaker" : {'Nike': JustDoIt}, "Championships" :[[ 6]]}, "Statistics" :"bunchofStats", "Address" : "Chicago,IL",     "Export_Data" : "xls", "Version: 2}  and then more junk
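The number of junk lines may vary from file to file, so a fixed skip count may not be reliable. One option, sketched below, is to ignore line boundaries entirely and slice the text from the first { to the last }. This assumes each file holds a single record and that the junk around it contains no braces; extract_record is a hypothetical helper name.

```python
def extract_record(raw):
    """Return the text from the first '{' through the last '}' in `raw`.

    Assumes one record per file and no braces in the surrounding junk.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no {...} record found")
    return raw[start:end + 1]


# The sample from the question, with the junk lines shortened.
raw = """xxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx
{"Basketball_Player_Name" : "Michael Jordan", "Bulls" : {"school" :[ { 'North Carolina' : {"id":1},
"sneaker_brand" : {'brandOfSneaker" : {'Nike': JustDoIt}, "Championships" :[[ 6]]}, "Statistics" :"bunchofStats", "Address" : "Chicago,IL",     "Export_Data" : "xls", "Version: 2}  and then more junk
"""

record = extract_record(raw)
print(record)
```

Using rfind rather than brace matching is deliberate: the braces in this sample are not balanced, so a matcher that counts them would never find the end of the record.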

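I want to extract the following column headers into a pandas DataFrame: Basketball_Player_Name, Statistics, Address, Export_Data

and then store the matching values under those columns:
Basketball_Player_Name: Michael Jordan
Statistics: bunchofStats
Address: Chicago,IL
Export_Data: xls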
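As shown, the record is not valid JSON (single quotes, the bare JustDoIt, the mismatched quote in 'brandOfSneaker"), so json.loads would raise on it. A sketch that sidesteps parsing the whole structure is to pull out only the wanted keys with a regular expression, assuming each wanted value is a double-quoted string with no embedded quotes. The helper name extract_fields is hypothetical.

```python
import re

# Keys whose string values should become columns (from the question).
FIELDS = ["Basketball_Player_Name", "Statistics", "Address", "Export_Data"]


def extract_fields(record, fields=FIELDS):
    """Map each key to the quoted string that follows `"key" :`, or None if absent."""
    out = {}
    for key in fields:
        # "key", optional whitespace, a colon, optional whitespace, then "value"
        match = re.search(rf'"{re.escape(key)}"\s*:\s*"([^"]*)"', record)
        out[key] = match.group(1) if match else None
    return out


# The two data lines from the sample above, verbatim.
sample = """{"Basketball_Player_Name" : "Michael Jordan", "Bulls" : {"school" :[ { 'North Carolina' : {"id":1},
"sneaker_brand" : {'brandOfSneaker" : {'Nike': JustDoIt}, "Championships" :[[ 6]]}, "Statistics" :"bunchofStats", "Address" : "Chicago,IL",     "Export_Data" : "xls", "Version: 2}  and then more junk"""

print(extract_fields(sample))
# {'Basketball_Player_Name': 'Michael Jordan', 'Statistics': 'bunchofStats', 'Address': 'Chicago,IL', 'Export_Data': 'xls'}
```

The pattern stops at the next double quote, so a value containing an escaped quote would be cut short; if the real files can contain such values, the pattern would need adjusting.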

Thanks for your help!

import os
import pandas as pd
import glob

### read through a directory
path = r'C:\Users\d\Desktop\data'  # use your path
allFiles = glob.glob(os.path.join(path, '*.dat'))


# read each .dat file and collect one DataFrame per file in a list
list_total = []
for file_ in allFiles:
    df = pd.read_csv(file_, sep=r'\s+')  # split columns on runs of whitespace
    df['filename'] = os.path.basename(file_)  # tag every row with its source file
    list_total.append(df)


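frame = pd.concat(list_total, ignore_index=True)

# Name the columns with a list: one name per column, including 'filename' at the end
# (this assumes every file parsed into exactly four data columns)
frame.columns = ['names', 'stats', 'address', 'data_format', 'filename']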


#Print the frame
print(frame)

# save the combined data to a CSV file
frame.to_csv('C:/Users/d/Desktop/data/output.csv')
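Because read_csv expects one row per line with consistent delimiters, it cannot split records like the sample into the wanted columns. A minimal sketch of a different approach, assuming one record per file and double-quoted string values: parse each file into a dict with a regex, add the file name, and build the DataFrame from the list of dicts. The helper names parse_dat and load_dat_dir are hypothetical; the demo writes an abbreviated sample to a temporary directory instead of the real path. The junk lines need no separate handling here, since the regex only looks for the quoted keys.

```python
import glob
import os
import re
import tempfile

import pandas as pd

# Keys to pull from each record (taken from the sample in the question).
FIELDS = ["Basketball_Player_Name", "Statistics", "Address", "Export_Data"]
# Short column names used in the question's code, as a rename dictionary.
RENAMES = dict(zip(FIELDS, ["names", "stats", "address", "data_format"]))


def parse_dat(text):
    """Return {key: value} for every key in FIELDS; missing keys map to None."""
    row = {}
    for key in FIELDS:
        match = re.search(rf'"{re.escape(key)}"\s*:\s*"([^"]*)"', text)
        row[key] = match.group(1) if match else None
    return row


def load_dat_dir(path):
    """Parse every *.dat file in `path` into one row, with the file name last."""
    rows = []
    for file_ in sorted(glob.glob(os.path.join(path, "*.dat"))):
        # errors="replace" keeps undecodable bytes in the junk from raising
        with open(file_, encoding="utf-8", errors="replace") as fh:
            row = parse_dat(fh.read())
        row["filename"] = os.path.basename(file_)
        rows.append(row)
    frame = pd.DataFrame(rows, columns=FIELDS + ["filename"])
    return frame.rename(columns=RENAMES)


# Demo: write two copies of an abbreviated sample record to a temp directory.
record = (
    "xxxxxxxxxx\nxxxxxxxxxx\n"
    '{"Basketball_Player_Name" : "Michael Jordan", "Statistics" :"bunchofStats", '
    '"Address" : "Chicago,IL", "Export_Data" : "xls", "Version: 2}  and then more junk\n'
)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.dat", "b.dat"):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
            fh.write(record)
    frame = load_dat_dir(tmp)

print(frame)
```

Against the real directory this would be frame = load_dat_dir(path), and saving works as before; passing index=False to to_csv leaves the row index out of the file.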
