什么是合适的分隔符?

时间:2018-10-22 18:32:26

标签: python pandas

我有一个具有以下结构的文本文件:

>hsa:9934 K04299 purinergic receptor P2Y, G protein-coupled
MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPSSKSFIIYL
KNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRY
>hsa:9934 K04299 purinergic receptor P2Y, G protein-coupled
MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPSSKSFIIYL
KNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRY

我需要按以下表格结构加载和转换此文件:

--------------------------------------------------------------
|>hsa:9934 K04299 purinergic receptor P2Y, G protein-coupled |
|MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPSSKSFIIYL|
|KNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRY|
--------------------------------------------------------------
|>hsa:9934 K04299 purinergic receptor P2Y, G protein-coupled |
|MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPSSKSFIIYL|
|KNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRY|
--------------------------------------------------------------

我尝试了以下代码:

dataset = pd.read_csv(path, sep = ">")

但是它没有按我预期的那样工作!

如何获取准确的格式?

1 个答案:

答案 0 :(得分:2)

您可以使用str.split('>'),因此最终得到每个值的数组。 除非“>”可能出现在散列中