Question

Here is a sample of the data file:

 =====
 name          aaa
 place         paaa
 date          Thu Oct 1 12:02:03 2015
 load_status   198
 add_name      naaa
 [---blank line---]
 =====
 name          bbb
 place         pbbb
 date          Thu Oct 3 21:20:36 2015
 load_status   2000.327
 add_name      nbbb
 [---blank line---]

A single file may contain hundreds of records.

I would like to end up with a pandas object like this:

   name | place | date                    | load_status | add_name
   ---------------------------------------------------------------
   aaa  | paaa  | Thu Oct 1 12:02:03 2015 | 198         | naaa
   bbb  | pbbb  | Thu Oct 3 21:20:36 2015 | 2000.327    | nbbb

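Every record has the same set of fields, so all records have a "name", a "place", and so on.

I could transpose the file with bash + grep + awk and then read it as CSV, but that is impractical for users who only have Python and Windows. Transposing the file in Python and then reading it as CSV seems like overkill, since I would expect pandas to be able to handle this case.

I looked at Series + dtypes and read_table, but could not get them to work for me.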

Answer 1

Here is a simple loop in plain Python. You will have to do some cleanup afterwards and add some checks, but this should get you started.

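import pandas as pd

input_fn = 'data.txt'  # placeholder path; point this at your data file

records = []
this_record = {}
with open(input_fn, 'r') as f:
    for line in f:
        if line.strip() == '':
            # A blank line ends the current record; skip empty records.
            if this_record:
                records.append(this_record)
            this_record = {}
            continue
        elif line.lstrip().startswith('='):
            # Separator line between records; nothing to store.
            continue
        # The first token is the field name; the rest of the line is the value.
        fields = line.split()
        this_record[fields[0]] = ' '.join(fields[1:])

# Keep the last record if the file does not end with a blank line.
if this_record:
    records.append(this_record)

df = pd.DataFrame.from_records(records)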
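
The cleanup and checks mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: the sample from the question is embedded as a string (SAMPLE) so the snippet runs on its own, parse_records is a hypothetical helper name, and the date format string is guessed from the two sample rows. As far as I know, the weekday name is parsed but not cross-checked against the calendar date.

```python
import io
import pandas as pd

# Sample from the question, with real blank lines between records.
# For a real file, pass an open file object instead of io.StringIO(SAMPLE).
SAMPLE = """\
=====
name          aaa
place         paaa
date          Thu Oct 1 12:02:03 2015
load_status   198
add_name      naaa

=====
name          bbb
place         pbbb
date          Thu Oct 3 21:20:36 2015
load_status   2000.327
add_name      nbbb

"""


def parse_records(lines):
    """Collect key/value lines into one dict per record (hypothetical helper)."""
    records, current = [], {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('='):
            # Blank lines and separator lines both close the current record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first run of whitespace: field name, then value.
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ''
    if current:
        records.append(current)
    return records


records = parse_records(io.StringIO(SAMPLE))

# Check: every record should carry the same set of fields as the first one.
expected = set(records[0])
bad = [i for i, rec in enumerate(records) if set(rec) != expected]
if bad:
    raise ValueError(f"records with unexpected fields: {bad}")

df = pd.DataFrame.from_records(records)

# Cleanup: everything was read as text, so convert the typed columns.
df['load_status'] = pd.to_numeric(df['load_status'])
# Assumed format, based on the two sample rows only.
df['date'] = pd.to_datetime(df['date'], format='%a %b %d %H:%M:%S %Y')

print(df.dtypes)
```

Treating the separator line as a record boundary as well means the parser does not depend on the blank line being present after every record.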

Importing a series-like data file into pandas
