How can I use Python to read a file that contains multiple columns and DataFrames in different rows?

Time: 2019-09-09 07:31:52

Tags: python-3.x pandas

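Hello, I have to read a file that is neither CSV nor Excel, so it is hard to read it as a DataFrame with pandas. But I have to read and manipulate it. So far, pandas has helped me with reading files, plotting graphs, importing/exporting other files or databases, and so on. Now I would like it to help me again with reading this file.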

I realized that I have to set `widths` for the whitespace between the columns and `skiprows` for the specific rows.

I have a small script that reads my file, but it seems hard to handle.
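To illustrate how `widths` and a callable `skiprows` work together in `pd.read_fwf`, here is a minimal sketch on an in-memory sample. The sample text, the widths `[6, 6]`, and the helper name `keep_rows` are assumptions made for illustration; the real file layout may differ.

```python
import io
import pandas as pd

# Hypothetical fixed-width sample; the real file layout may differ.
sample = (
    "[JUNCTIONS]\n"
    "ID    Elev  \n"
    "J1    10.5  \n"
    "J2    12.0  \n"
    "\n"
    "[TANKS]\n"
    "ID    Level \n"
    "T1    3.0   \n"
)

def keep_rows(start, stop):
    """Return a skiprows callable that skips every row except start < index < stop."""
    return lambda index: not (start < index < stop)

# Rows 1..3 are kept: the first kept row becomes the header, the rest are data.
df = pd.read_fwf(io.StringIO(sample), widths=[6, 6], skiprows=keep_rows(0, 4))
print(df)
```

Because the callable receives the line index of the whole file, the same file can be read once per section simply by changing `start` and `stop`.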

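I made a dictionary with the data titles as keys; each entry holds the section's start/stop rows and the whitespace widths of its columns.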
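The start/stop part of such a dictionary could probably be derived from the file itself instead of being typed out per title. The following is only a sketch: the function name `section_bounds` is hypothetical, it assumes every title sits alone on its line in the form `[NAME]`, and it keeps the convention of `stop` being the line just before the next title. The column widths would still have to be supplied separately.

```python
def section_bounds(lines):
    """Map each "[NAME]" title to its own line index and the line before the next title."""
    headers = [
        (i, line.strip())
        for i, line in enumerate(lines)
        if line.strip().startswith("[") and line.strip().endswith("]")
    ]
    bounds = {}
    # Pair every title with the one that follows it; the last title gets no entry.
    for (start, name), (next_start, _) in zip(headers, headers[1:]):
        bounds[name] = {"start": start, "stop": next_start - 1}
    return bounds

# Hypothetical miniature file content, for illustration only.
example_lines = ["[JUNCTIONS]", "J1 10", "", "[TANKS]", "", "[END]"]
bounds = section_bounds(example_lines)
print(bounds)
```

With this, adding or reordering titles in the file would not require editing the dictionary by hand.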

Every file contains all of these keys as titles, but there may be no data under a given title. This is the script:

    import pandas as pd

    path = "ornek.inp"
    with open(path) as f:
        content = f.readlines()
    content = [x.strip() for x in content]

    # For each title: line index of the title, line index just before the next title, column widths.
    inp_file = {
        "[JUNCTIONS]": {"start": content.index("[JUNCTIONS]"), "stop": content.index("[RESERVOIRS]") - 1, "widths": [18,12,10,13]},
        "[RESERVOIRS]": {"start": content.index("[RESERVOIRS]"), "stop": content.index("[TANKS]") - 1, "widths": [18,12,13]},
        "[TANKS]": {"start": content.index("[TANKS]"), "stop": content.index("[PIPES]") - 1, "widths": [18,9,13,13,13,12,13,13]},
        "[PIPES]": {"start": content.index("[PIPES]"), "stop": content.index("[PUMPS]") - 1, "widths": [18,9,15,16,15,14,13,13]},
        "[PUMPS]": {"start": content.index("[PUMPS]"), "stop": content.index("[VALVES]") - 1, "widths": [12,13,15,22]},
        "[VALVES]": {"start": content.index("[VALVES]"), "stop": content.index("[TAGS]") - 1, "widths": [18,15,15,12,10,13,9]},
        "[TAGS]": {"start": content.index("[TAGS]")-1, "stop": content.index("[DEMANDS]") - 1, "widths": [12,12,12]},
        "[DEMANDS]": {"start": content.index("[DEMANDS]"), "stop": content.index("[STATUS]") - 1, "widths": [12,12,14,18]},
        "[STATUS]": {"start": content.index("[STATUS]"), "stop": content.index("[PATTERNS]") - 1, "widths": [12,12]},
        "[PATTERNS]": {"start": content.index("[PATTERNS]"), "stop": content.index("[CURVES]") - 1, "widths": [12,17]},
        "[CURVES]": {"start": content.index("[CURVES]"), "stop": content.index("[CONTROLS]") - 1, "widths": [12,13,15]},
        "[CONTROLS]": {"start": content.index("[CONTROLS]"), "stop": content.index("[RULES]") - 1, "widths": []},
        "[RULES]": {"start": content.index("[RULES]"), "stop": content.index("[ENERGY]") - 1, "widths": [12,12,12]},
        "[ENERGY]": {"start": content.index("[ENERGY]")-1, "stop": content.index("[EMITTERS]") - 1, "widths": [12,12,12]},
        "[EMITTERS]": {"start": content.index("[EMITTERS]"), "stop": content.index("[QUALITY]") - 1, "widths": [12,12,12]},
        "[QUALITY]": {"start": content.index("[QUALITY]"), "stop": content.index("[SOURCES]") - 1, "widths": [12,12,12]},
        "[SOURCES]": {"start": content.index("[SOURCES]"), "stop": content.index("[REACTIONS]") - 1, "widths": [12,12,12]},
        "[REACTIONS]": {"start": content.index("[REACTIONS]"), "stop": content.index("[MIXING]") - 1, "widths": [12,12,12]},
        "[MIXING]": {"start": content.index("[MIXING]"), "stop": content.index("[TIMES]") - 1, "widths": [12,12,12]},
        "[TIMES]": {"start": content.index("[TIMES]"), "stop": content.index("[REPORT]") - 1, "widths": [12,12,12]},
        "[REPORT]": {"start": content.index("[REPORT]"), "stop": content.index("[OPTIONS]") - 1, "widths": [12,12,12]},
        "[OPTIONS]": {"start": content.index("[OPTIONS]"), "stop": content.index("[COORDINATES]") - 1, "widths": [12,12,12]},
        "[COORDINATES]": {"start": content.index("[COORDINATES]"), "stop": content.index("[VERTICES]") - 1, "widths": [12,12,12]},
        "[VERTICES]": {"start": content.index("[VERTICES]"), "stop": content.index("[LABELS]") - 1, "widths": [12,12,12]},
        "[LABELS]": {"start": content.index("[LABELS]"), "stop": content.index("[BACKDROP]") - 1, "widths": [12,12,12]},
        "[BACKDROP]": {"start": content.index("[BACKDROP]"), "stop": content.index("[END]") - 1, "widths": [12,12,12]},
    }

    def logic(index, start, stop):
        # skiprows callable: keep only the rows strictly between start and stop
        if start < index < stop:
            return False
        return True

    fwidths = inp_file["[ENERGY]"]["widths"]
    start = inp_file["[ENERGY]"]["start"]
    stop = inp_file["[ENERGY]"]["stop"]
    df = pd.read_fwf(path, widths=fwidths, skiprows=lambda x: logic(x, start, stop))

How can I solve this problem using pandas, or with another Pythonic solution?

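Since a title may have no data under it, one option is to check for that case before calling `pd.read_fwf`. The sketch below is an assumption-laden illustration rather than a tested solution for the real file: the function name `read_section` and the sample text are hypothetical, and it slices the text in memory instead of using `skiprows`, so that no empty input is ever passed to the parser.

```python
import io
import pandas as pd

def read_section(text, start, stop, widths):
    """Read the lines with start < index < stop as fixed-width data.

    Returns an empty DataFrame when the section contains no non-blank lines.
    """
    lines = text.splitlines()[start + 1:stop]
    body = "\n".join(line for line in lines if line.strip())
    if not body:
        return pd.DataFrame()
    return pd.read_fwf(io.StringIO(body), widths=widths)

# Hypothetical sample: "[TAGS]" has no data, "[DEMANDS]" has one row.
sample_text = "[TAGS]\n\n[DEMANDS]\nID    Base  \nJ1    1.5   \n\n[END]\n"
tags = read_section(sample_text, 0, 1, [6, 6])
demands = read_section(sample_text, 2, 5, [6, 6])
print(tags.empty, demands)
```

Returning an empty DataFrame keeps the calling code uniform, since every title then maps to a DataFrame whether or not it has rows.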
0 answers:

No answers yet.