How do I append a list as a row to a pandas.DataFrame()?

Asked: 2015-01-29 09:54:30

Tags: python numpy pandas data-analysis

I am reading a log file line by line, parsing each line to extract data, and I want to append each result to a DataFrame.

df = pd.DataFrame([], columns=['item','price','qty','sold'])
with open("mylogfile") as fh:
    for line in fh:
        data = extract_data(line)
        df.append(data) ## ?


def extract_data(line):
   # parse and get values as a list
   return list_values

Update: I get the following error: ValueError: Shape of passed values is (0, 0), indices imply (4, 0)

Also, my log file contains data in the following format:
item,2,price,4.5,qty,17,sold,11
item,12,price,14.5,qty,7,sold,4
item,2,price,4.5,qty,13,sold,2

Edit 2: The actual file looks like this, and I am only interested in the 'item' rows:

item,2,price,4.5,qty,17,sold,11
a,12,b,14,c,18,d,15,e16
item,12,price,14.5,qty,7,sold,4
x,4,y,1,z,81
a,12,b,14,c,18,d,15,e16
a,14,b,11,c,8,d,51,e26
item,2,price,4.5,qty,13,sold,2
x,14,y,11,z,8

1 answer:

Answer 0 (score: 2)

Here is a multi-step approach:

In [210]:
# read in as csv, set header to None
# (assumes `import io`, `import pandas as pd`, and `t` holding the log text)
df = pd.read_csv(io.StringIO(t), header=None)
df

Out[210]:
      0   1      2     3    4   5     6   7
0  item   2  price   4.5  qty  17  sold  11
1  item  12  price  14.5  qty   7  sold   4
2  item   2  price   4.5  qty  13  sold   2

In [213]:
# extract the header names from the first row
col_names = df.iloc[0][0::2]
print(col_names)
# extract the data columns we will use later to filter the df
col_list = df.columns[1::2]
col_list
0     item
2    price
4      qty
6     sold
Name: 0, dtype: object

Out[213]:
Int64Index([1, 3, 5, 7], dtype='int64')

In [214]:
# now filter the df to the columns that actually have your data
df = df[col_list]
# assign the column names
df.columns = col_names
df

Out[214]:
0  item  price  qty  sold
0     2    4.5   17    11
1    12   14.5    7     4
2     2    4.5   13     2

So I would read the file in as a CSV using read_csv. Don't copy my code verbatim: replace io.StringIO(t) with the path to your text file.
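The transcript above relies on `io`, `pd`, and a string `t` that are never defined in the answer. A minimal self-contained sketch of the same steps follows, assuming `t` holds the three sample lines from the question; the names `raw` and `t` are illustrative, not part of the original answer.

```python
import io

import pandas as pd

# Sample text standing in for the log file (assumption: same layout as the question).
t = """item,2,price,4.5,qty,17,sold,11
item,12,price,14.5,qty,7,sold,4
item,2,price,4.5,qty,13,sold,2"""

# Read every comma-separated field as an unnamed column.
raw = pd.read_csv(io.StringIO(t), header=None)

# Even positions hold the labels, odd positions hold the values.
col_names = raw.iloc[0, 0::2].tolist()
df = raw.iloc[:, 1::2].copy()
df.columns = col_names

print(df)
```

For a real file, pass its path to `pd.read_csv` instead of `io.StringIO(t)`.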

Update:

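A better approach is to read just a single row, extract the header names and the columns of interest from it, and then read the whole file again, selecting only those columns and passing in the column names: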

In [217]:
# read only the first row to discover the layout
df = pd.read_csv(io.StringIO(t), header=None, nrows=1)
df

Out[217]:
      0  1      2    3    4   5     6   7
0  item  2  price  4.5  qty  17  sold  11

In [218]:
# labels sit at the even positions of the first row
col_names = df.iloc[0][0::2]
print(col_names)
# values sit in the odd-numbered columns
col_list = df.columns[1::2]
col_list
0     item
2    price
4      qty
6     sold
Name: 0, dtype: object

Out[218]:
Int64Index([1, 3, 5, 7], dtype='int64')

In [219]:
# read the whole file again, keeping only the value columns and naming them
df = pd.read_csv(io.StringIO(t), usecols=col_list, names=col_names)
df

Out[219]:
   item  price  qty  sold
0     2    4.5   17    11
1    12   14.5    7     4
2     2    4.5   13     2
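Both transcripts assume every line is an 'item' line, whereas Edit 2 of the question shows a file that mixes in other line types. One possible way to handle that, and to answer the original question about appending lists, is to collect the parsed rows in a plain Python list and build the DataFrame once at the end. Note that `DataFrame.append` returned a new frame rather than modifying `df` in place, and it was removed in pandas 2.0. The sketch below is an assumption-laden illustration: the body of `extract_data`, the helper `load_items`, and the choice of `int`/`float` types are hypothetical, not taken from the question or the answer.

```python
import pandas as pd


def extract_data(line):
    """Return the values of an 'item' line as a list, or None for other lines."""
    fields = line.strip().split(",")
    # Assumption: a valid line has exactly 8 fields and starts with 'item'.
    if len(fields) != 8 or fields[0] != "item":
        return None
    # Values sit at the odd positions: item, price, qty, sold.
    return [int(fields[1]), float(fields[3]), int(fields[5]), int(fields[7])]


def load_items(lines):
    """Collect parsed rows in a list and build the DataFrame once."""
    rows = []
    for line in lines:
        data = extract_data(line)
        if data is not None:
            rows.append(data)
    return pd.DataFrame(rows, columns=["item", "price", "qty", "sold"])


# Sample lines modelled on Edit 2 of the question.
sample = """item,2,price,4.5,qty,17,sold,11
a,12,b,14,c,18,d,15,e16
item,12,price,14.5,qty,7,sold,4
x,4,y,1,z,81
item,2,price,4.5,qty,13,sold,2
x,14,y,11,z,8""".splitlines()

df = load_items(sample)
print(df)
```

With a real file, an open file handle can be passed directly, for example `with open("mylogfile") as fh: df = load_items(fh)`. Building the frame once avoids copying the whole DataFrame on every row, which is what repeated appends would do.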