Python pandas: splitting text into columns

Time: 2017-04-25 12:33:14

Tags: python pandas

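I have data like the sample below, and I don't know how to split it and turn it into a table.

I am reading it with pandas using sep="|", but I don't know how to split on both | and = at the same time.

A sample of the data (a .txt file) looks like this: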

SPK_VOLUME=|DEVICE_STATUS=|WAKE_UP=|SCS_STATUS=|SCS_CLASS=||MUSIC_URL_STATUS=|MUSIC_LOGIN_STATUS=|MUSIC_STREAMING_CONNECT_STATUS=|MUSIC_STREAMING_STATUS=|PLAYER_PLAYING_TIME=|TTS_STATUS=|TTS_CLASS=|ALARM_STATUS=|ALARM_END_REASON=|FOTA_STATUS=|FOTA_FAIL_REASON=
....

I load the data with pandas:

import pandas as pd

log_file = pd.read_csv("./log_file.txt", sep="|")

However, I also want to split on "=" and build a table from the values, like this:

SPK_VOLUME  DEVICE_STATUS  WAKE_UP
5           22221          0
2           42241          2
3           125214         1

Thanks for your help.

1 Answer:

Answer 0 (score: 2):

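Try passing sep=r'\=\|', a regular expression that treats "=|" as the delimiter. This works for me, apart from a stray "|" and a trailing "=" left in two of the names:

In [189]:

import io
import pandas as pd
t="""SPK_VOLUME=|DEVICE_STATUS=|WAKE_UP=|SCS_STATUS=|SCS_CLASS=||MUSIC_URL_STATUS=|MUSIC_LOGIN_STATUS=|MUSIC_STREAMING_CONNECT_STATUS=|MUSIC_STREAMING_STATUS=|PLAYER_PLAYING_TIME=|TTS_STATUS=|TTS_CLASS=|ALARM_STATUS=|ALARM_END_REASON=|FOTA_STATUS=|FOTA_FAIL_REASON="""
# A regex separator requires the Python parsing engine.
df = pd.read_csv(io.StringIO(t), sep=r'\=\|', engine='python')
df.columns.tolist()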

Out[189]:
['SPK_VOLUME',
 'DEVICE_STATUS',
 'WAKE_UP',
 'SCS_STATUS',
 'SCS_CLASS',
 '|MUSIC_URL_STATUS',
 'MUSIC_LOGIN_STATUS',
 'MUSIC_STREAMING_CONNECT_STATUS',
 'MUSIC_STREAMING_STATUS',
 'PLAYER_PLAYING_TIME',
 'TTS_STATUS',
 'TTS_CLASS',
 'ALARM_STATUS',
 'ALARM_END_REASON',
 'FOTA_STATUS',
 'FOTA_FAIL_REASON=']
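
The two leftover names in the output above ('|MUSIC_URL_STATUS' and 'FOTA_FAIL_REASON=') could be tidied with .str.strip. This is only a minimal sketch: the header is shortened for illustration, and it assumes no real column name begins or ends with "|" or "=".

```python
import io

import pandas as pd

# Shortened, illustrative version of the header line from the question.
header = "SCS_STATUS=|SCS_CLASS=||MUSIC_URL_STATUS=|FOTA_FAIL_REASON="

# A multi-character separator is treated as a regex and needs the Python engine.
df = pd.read_csv(io.StringIO(header), sep=r"\=\|", engine="python")

# Strip any leftover '|' or '=' from both ends of every column name.
df.columns = df.columns.str.strip("|=")
cleaned_columns = df.columns.tolist()
print(cleaned_columns)
```

Unlike the "|"-only separator, this does not create an extra empty column for the doubled "||", because the leftover "|" simply becomes part of the next name and is stripped afterwards.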

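Alternatively, read the data with sep='|' as in the question and call .str.rstrip on the .columns attribute as a post-processing step. The empty field produced by the doubled "||" then shows up as an 'Unnamed: 5' column:

In [192]:
df = pd.read_csv(io.StringIO(t), sep='|')
df.columns = df.columns.str.rstrip('=')
df.columns.tolist()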

Out[192]:
['SPK_VOLUME',
 'DEVICE_STATUS',
 'WAKE_UP',
 'SCS_STATUS',
 'SCS_CLASS',
 'Unnamed: 5',
 'MUSIC_URL_STATUS',
 'MUSIC_LOGIN_STATUS',
 'MUSIC_STREAMING_CONNECT_STATUS',
 'MUSIC_STREAMING_STATUS',
 'PLAYER_PLAYING_TIME',
 'TTS_STATUS',
 'TTS_CLASS',
 'ALARM_STATUS',
 'ALARM_END_REASON',
 'FOTA_STATUS',
 'FOTA_FAIL_REASON']
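
Both approaches above only clean up the column names. If each line of the log actually holds KEY=VALUE pairs (an assumption, since the sample in the question shows only keys), one possible sketch is to parse every line into a dictionary and build the DataFrame from those. The names sample_lines, parse_line and table are illustrative, and the values are taken from the desired table in the question.

```python
import pandas as pd

# Hypothetical log lines: the question shows only the keys, so these
# KEY=VALUE rows are assumed, using the values from the desired table.
sample_lines = [
    "SPK_VOLUME=5|DEVICE_STATUS=22221|WAKE_UP=0",
    "SPK_VOLUME=2|DEVICE_STATUS=42241|WAKE_UP=2",
    "SPK_VOLUME=3|DEVICE_STATUS=125214|WAKE_UP=1",
]


def parse_line(line):
    """Turn 'A=1|B=2' into {'A': '1', 'B': '2'}, skipping empty fields."""
    record = {}
    for field in line.strip().split("|"):
        if not field:  # an empty field is produced by a doubled '||'
            continue
        key, _, value = field.partition("=")
        record[key] = value
    return record


# One dictionary per line becomes one row; keys become column names.
table = pd.DataFrame([parse_line(line) for line in sample_lines])

# The parsed values are strings; these sample columns are all numeric,
# so every column can be converted.
table = table.apply(pd.to_numeric)
print(table)
```

For a real file, the list of lines could come from iterating over the opened log file instead of sample_lines. Columns that contain non-numeric text would need to be left out of the pd.to_numeric step.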