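Hello, I have to read a file that is neither CSV nor Excel, so it is hard to read it into a DataFrame with pandas. But I have to read and manipulate it. So far pandas has helped me with reading files, plotting graphs, importing/exporting other files or databases, and so on. Now I would like it to help me again with reading this file.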
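I realized that I have to pass widths to pd.read_fwf for the whitespace between columns, and skiprows for the specific rows.
I have a small script that can read my file, but it seems hard to work with.
I made a dictionary with the section headers as keys; each value holds the section's start/stop rows and the column widths.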
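All of these files contain every key as a header, but there may be no data under a given header. How can I handle this with pandas?

import pandas as pd

path = "ornek.inp"
with open(path) as f:
    content = f.readlines()
content = [x.strip() for x in content]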
inp_file = {"[JUNCTIONS]": {"start": content.index("[JUNCTIONS]"), "stop": content.index("[RESERVOIRS]") - 1, "widths": [18,12,10,13]},
"[RESERVOIRS]": {"start": content.index("[RESERVOIRS]"), "stop": content.index("[TANKS]") - 1, "widths": [18,12,13]},
"[TANKS]": {"start": content.index("[TANKS]"), "stop": content.index("[PIPES]") - 1, "widths": [18,9,13,13,13,12,13,13]},
"[PIPES]": {"start": content.index("[PIPES]"), "stop": content.index("[PUMPS]") - 1, "widths": [18,9,15,16,15,14,13,13]},
"[PUMPS]": {"start": content.index("[PUMPS]"), "stop": content.index("[VALVES]") - 1, "widths": [12,13,15,22]},
"[VALVES]": {"start": content.index("[VALVES]"), "stop": content.index("[TAGS]") - 1, "widths": [18,15,15,12,10,13,9]},
"[TAGS]": {"start": content.index("[TAGS]")-1, "stop": content.index("[DEMANDS]") - 1, "widths": [12,12,12]},
"[DEMANDS]": {"start": content.index("[DEMANDS]"), "stop": content.index("[STATUS]") - 1, "widths": [12,12,14,18]},
"[STATUS]": {"start": content.index("[STATUS]"), "stop": content.index("[PATTERNS]") - 1, "widths": [12,12]},
"[PATTERNS]": {"start": content.index("[PATTERNS]"), "stop": content.index("[CURVES]") - 1, "widths": [12,17]},
"[CURVES]": {"start": content.index("[CURVES]"), "stop": content.index("[CONTROLS]") - 1, "widths": [12,13,15]},
"[CONTROLS]": {"start": content.index("[CONTROLS]"), "stop": content.index("[RULES]") - 1, "widths": []},
"[RULES]": {"start": content.index("[RULES]"), "stop": content.index("[ENERGY]") - 1, "widths": [12,12,12]},
"[ENERGY]": {"start": content.index("[ENERGY]")-1, "stop": content.index("[EMITTERS]") - 1, "widths": [12,12,12]},
"[EMITTERS]": {"start": content.index("[EMITTERS]"), "stop": content.index("[QUALITY]") - 1, "widths": [12,12,12]},
"[QUALITY]": {"start": content.index("[QUALITY]"), "stop": content.index("[SOURCES]") - 1, "widths": [12,12,12]},
"[SOURCES]": {"start": content.index("[SOURCES]"), "stop": content.index("[REACTIONS]") - 1, "widths": [12,12,12]},
"[REACTIONS]": {"start": content.index("[REACTIONS]"), "stop": content.index("[MIXING]") - 1, "widths": [12,12,12]},
"[MIXING]": {"start": content.index("[MIXING]"), "stop": content.index("[TIMES]") - 1, "widths": [12,12,12]},
"[TIMES]": {"start": content.index("[TIMES]"), "stop": content.index("[REPORT]") - 1, "widths": [12,12,12]},
"[REPORT]": {"start": content.index("[REPORT]"), "stop": content.index("[OPTIONS]") - 1, "widths": [12,12,12]},
"[OPTIONS]": {"start": content.index("[OPTIONS]"), "stop": content.index("[COORDINATES]") - 1, "widths": [12,12,12]},
"[COORDINATES]": {"start": content.index("[COORDINATES]"), "stop": content.index("[VERTICES]") - 1, "widths": [12,12,12]},
"[VERTICES]": {"start": content.index("[VERTICES]"), "stop": content.index("[LABELS]") - 1, "widths": [12,12,12]},
"[LABELS]": {"start": content.index("[LABELS]"), "stop": content.index("[BACKDROP]") - 1, "widths": [12,12,12]},
"[BACKDROP]": {"start": content.index("[BACKDROP]"), "stop": content.index("[END]") - 1, "widths": [12,12,12]}}
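def logic(index, start, stop):
    # skiprows callable: return True to skip a row, False to keep it.
    # Only rows strictly between start and stop are kept.
    if start < index < stop:
        return False
    return True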
fwidths = inp_file["[ENERGY]"]["widths"]
start = inp_file["[ENERGY]"]["start"]
stop = inp_file["[ENERGY]"]["stop"]
df = pd.read_fwf(path, widths=fwidths, skiprows=lambda x: logic(x, start, stop))
Or is there another Pythonic solution to this problem?
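To illustrate the kind of approach being asked about, here is a minimal sketch that finds the section boundaries by scanning for header lines instead of hard-coding start/stop rows and widths. It rests on assumptions that are not confirmed by the file itself: section headers are lines of the form [NAME], columns are separated by whitespace, and anything after a ';' is a comment. The helper names split_sections and read_section are hypothetical, and the sample text is made up for the example.

```python
import re

import pandas as pd


def split_sections(lines):
    """Group non-empty lines under the most recent [HEADER] line (hypothetical helper)."""
    sections = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if re.fullmatch(r"\[[A-Z_]+\]", line):  # assumption: headers look like [NAME]
            current = line
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line)
    return sections


def read_section(sections, name):
    """Return one section as a DataFrame; empty if the header has no data rows."""
    rows = sections.get(name, [])
    # Assumption: text after ';' is a comment; the rest is whitespace-delimited.
    records = [row.split(";", 1)[0].split() for row in rows]
    records = [rec for rec in records if rec]
    # Rows with fewer fields are padded with missing values; values stay strings.
    return pd.DataFrame(records)


# Made-up sample in the same spirit as the file described above.
sample = """[JUNCTIONS]
;ID   Elev   Demand   Pattern
 2    0      0                ;
 3    10     5        pat1    ;

[RESERVOIRS]
;ID   Head   Pattern

[TANKS]
[END]
"""

sections = split_sections(sample.splitlines())
junctions = read_section(sections, "[JUNCTIONS]")
reservoirs = read_section(sections, "[RESERVOIRS]")
```

With this layout, a header that has no data under it simply yields an empty DataFrame, so no per-section start/stop bookkeeping is needed. Columns come back as strings and could be converted afterwards, for example with pd.to_numeric.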