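I have a text table stored as a string, like the one below (the real one is much longer). It could also be written to a file if that helps: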
+--------------+----------+---------+------------+
| Endpoint     | Table    | Request | Is Updated |
+--------------+----------+---------+------------+
| /api/test1   | test1    | True    | True       |
+--------------+----------+---------+------------+
| /api/test2   | test2    | False   | False      |
+--------------+----------+---------+------------+
| /api/test3   | test3    | False   | True       |
+--------------+----------+---------+------------+
I want to convert it into a pandas DataFrame. This is my expected output:
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {'Endpoint': ['/api/test1', '/api/test2', '/api/test3'],
...      'Table': ['test1', 'test2', 'test3'],
...      'Request': [True, False, False],
...      'Is Updated': [True, False, True]},
... )
>>> df
     Endpoint  Table  Request  Is Updated
0  /api/test1  test1     True        True
1  /api/test2  test2    False       False
2  /api/test3  test3    False        True
Thanks.
Answer 0 (score: 5)
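IIUC, use re.sub to strip the table's border characters (-, +, |) with a regular expression, then wrap the result in io.StringIO and read it with pd.read_csv: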
import re
from io import StringIO

import pandas as pd

text = """
+--------------+----------+---------+------------+
| Endpoint     | Table    | Request | Is Updated |
+--------------+----------+---------+------------+
| /api/test1   | test1    | True    | True       |
+--------------+----------+---------+------------+
| /api/test2   | test2    | False   | False      |
+--------------+----------+---------+------------+
| /api/test3   | test3    | False   | True       |
+--------------+----------+---------+------------+
"""

# Remove every '-', '+' and '|' (note: this also strips them from cell values),
# then split columns on runs of 2+ whitespace; raw string so '\s' is not an escape
df = pd.read_csv(StringIO(re.sub(r'[-+|]', '', text)), sep=r'\s{2,}', engine='python')
print(df)
Output:
     Endpoint  Table  Request  Is Updated
0  /api/test1  test1     True        True
1  /api/test2  test2    False       False
2  /api/test3  test3    False        True
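The regex approach removes every -, + and | in the text, including any inside a cell (an endpoint such as /api/test-1 would come out as /api/test1). A minimal sketch of a more conservative alternative, assuming each row sits on one line and no cell contains a literal |: skip the +---+ border rows, split the remaining rows on |, and build the DataFrame directly. The helper name parse_ascii_table is hypothetical, not a pandas API, and the True/False conversion is an assumption about what the columns contain.

```python
import pandas as pd


def parse_ascii_table(lines):
    """Parse a '+---+' / '| a | b |' style ASCII table into a DataFrame.

    Hypothetical helper (not part of pandas). Assumes one row per line and
    no literal '|' inside any cell.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('+'):
            continue  # skip blank lines and +----+ border rows
        # '| a | b |'.split('|') -> ['', ' a ', ' b ', ''], so drop both ends
        rows.append([cell.strip() for cell in line.split('|')[1:-1]])
    header, *body = rows
    df = pd.DataFrame(body, columns=header)
    # Every cell is read as a string; turn columns holding only 'True'/'False' into bools
    for col in df.columns:
        if df[col].isin(['True', 'False']).all():
            df[col] = df[col] == 'True'
    return df


text = """
+--------------+----------+---------+------------+
| Endpoint     | Table    | Request | Is Updated |
+--------------+----------+---------+------------+
| /api/test1   | test1    | True    | True       |
+--------------+----------+---------+------------+
| /api/test2   | test2    | False   | False      |
+--------------+----------+---------+------------+
| /api/test3   | test3    | False   | True       |
+--------------+----------+---------+------------+
"""

df = parse_ascii_table(text.splitlines())
print(df)

# Reading from a file works the same way, since iterating a file yields lines
# ('table.txt' is a placeholder name):
# with open('table.txt') as f:
#     df = parse_ascii_table(f)
```

Because cells are split on | rather than having characters deleted, values like a-b or x+y pass through unchanged; the trade-off is that a | inside a cell would still break the row.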