I am iteratively reading a log file, parsing/extracting data from each line, and I want to append the results to a DataFrame.
df = pd.DataFrame([], columns=['item','price','qty','sold'])
with open("mylogfile") as fh:
    for line in fh:
        data = extract_data(line)
        df.append(data) ## ?
def extract_data(line):
    # parse and get values as a list
    return list_values
Update: I get the following error: ValueError: Shape of passed values is (0, 0), indices imply (4, 0)
Also, my log file contains data in the following format:
item,2,price,4.5,qty,17,sold,11
item,12,price,14.5,qty,7,sold,4
item,2,price,4.5,qty,13,sold,2
Edit 2: The actual file looks like this, and I am only interested in the 'item' rows:
item,2,price,4.5,qty,17,sold,11
a,12,b,14,c,18,d,15,e16
item,12,price,14.5,qty,7,sold,4
x,4,y,1,z,81
a,12,b,14,c,18,d,15,e16
a,14,b,11,c,8,d,51,e26
item,2,price,4.5,qty,13,sold,2
x,14,y,11,z,8
Answer 0 (score: 2)
Here is a multi-step approach:
In [210]:
import io
import pandas as pd
# read in as csv, set header to None (t is a string holding the sample log text)
df = pd.read_csv(io.StringIO(t), header=None)
df
Out[210]:
      0   1      2     3    4   5     6   7
0  item   2  price   4.5  qty  17  sold  11
1  item  12  price  14.5  qty   7  sold   4
2  item   2  price   4.5  qty  13  sold   2
In [213]:
# extract the header names from the first row
col_names = df.iloc[0][0::2]
print(col_names)
# extract the data columns we will use later to filter the df
col_list = df.columns[1::2]
col_list
0     item
2    price
4      qty
6     sold
Name: 0, dtype: object
Out[213]:
Int64Index([1, 3, 5, 7], dtype='int64')
In [214]:
# now filter the df to the columns that actually have your data
df = df[col_list]
# assign the column names
df.columns = col_names
df
Out[214]:
0  item  price  qty  sold
0     2    4.5   17    11
1    12   14.5    7     4
2     2    4.5   13     2
So I would use read_csv to read the file in as a CSV. Don't copy my code verbatim: replace io.StringIO(t) with the path to your text file.
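As a rough illustration of that substitution, here is a minimal sketch that passes a file path to read_csv. The temporary file and the name log_path are assumptions made only so the snippet runs on its own; in practice the path would point at the real log file.

```python
import os
import tempfile

import pandas as pd

# The three sample lines from the question, written to a temporary file so the
# sketch is self-contained.
sample = (
    "item,2,price,4.5,qty,17,sold,11\n"
    "item,12,price,14.5,qty,7,sold,4\n"
    "item,2,price,4.5,qty,13,sold,2\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as fh:
    fh.write(sample)
    log_path = fh.name  # hypothetical stand-in for "mylogfile"

try:
    # Same call as in the answer, with a path in place of io.StringIO(t).
    df = pd.read_csv(log_path, header=None)
finally:
    os.remove(log_path)  # clean up the temporary file

print(df.shape)
```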
Update
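A better approach is to read a single line first, extract the header names and the columns of interest, and then read the whole file again, selecting only those columns and passing in the column names: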
In [217]:
df = pd.read_csv(io.StringIO(t), header=None, nrows=1)
df
Out[217]:
      0  1      2    3    4   5     6   7
0  item  2  price  4.5  qty  17  sold  11
In [218]:
col_names = df.iloc[0][0::2]
print(col_names)
col_list = df.columns[1::2]
col_list
0     item
2    price
4      qty
6     sold
Name: 0, dtype: object
Out[218]:
Int64Index([1, 3, 5, 7], dtype='int64')
In [219]:
df = pd.read_csv(io.StringIO(t), usecols=col_list, names=col_names)
df
Out[219]:
   item  price  qty  sold
0     2    4.5   17    11
1    12   14.5    7     4
2     2    4.5   13     2
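For the mixed file shown in Edit 2 of the question, the calls above would probably need a pre-filtering step, because the non-'item' lines have a different number of fields. Below is a minimal sketch, assuming every line of interest starts with "item," and that all such lines share the same label/value layout. The helper name read_item_rows and its prefix parameter are hypothetical and not part of pandas.

```python
import io

import pandas as pd

# Sample text copied from Edit 2 of the question.
raw = """item,2,price,4.5,qty,17,sold,11
a,12,b,14,c,18,d,15,e16
item,12,price,14.5,qty,7,sold,4
x,4,y,1,z,81
a,12,b,14,c,18,d,15,e16
a,14,b,11,c,8,d,51,e26
item,2,price,4.5,qty,13,sold,2
x,14,y,11,z,8
"""


def read_item_rows(lines, prefix="item,"):
    """Keep only the lines starting with `prefix` and parse them with read_csv."""
    kept = [line for line in lines if line.startswith(prefix)]
    if not kept:
        return pd.DataFrame()
    # Labels sit at even positions of the first kept line, values at odd ones.
    fields = kept[0].strip().split(",")
    col_names = fields[0::2]
    col_list = list(range(1, len(fields), 2))
    df = pd.read_csv(io.StringIO("".join(kept)), header=None, usecols=col_list)
    df.columns = col_names
    return df


# Any iterable of lines works here, including an open file handle.
df = read_item_rows(io.StringIO(raw))
print(df)
```

With a real file, the same helper could be called as read_item_rows(fh) inside a with open(...) block, which keeps the line-by-line reading from the question while building the DataFrame in a single step.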