
Time: 2019-11-20 15:18:21

Tags: python pandas


import pandas as pd
data = pd.read_csv('Gun violence Shortened version.csv')


   incident_id  date      state           participant_type
0  461105       1/1/2013  Pennsylvania    0::Victim||1::Victim||2::Victim||3::Victim||4:...
1  460726       1/1/2013  California      0::Victim||1::Victim||2::Victim||3::Victim||4:...
2  478855       1/1/2013  Ohio            0::Subject-Suspect||1::Subject-Suspect||2::Vic...
3  478925       1/5/2013  Colorado        0::Victim||1::Victim||2::Victim||3::Subject-Su...
4  478959       1/7/2013  North Carolina  0::Victim||1::Victim||2::Victim||3::Subject-Su...


   incident_id  date      state         participant_type
0  461105       1/1/2013  Pennsylvania  Victim
1  461105       1/1/2013  Pennsylvania  Victim
2  461105       1/1/2013  Pennsylvania  Victim
3  461105       1/1/2013  Pennsylvania  Subject-Suspect *this was the 4:: instance that was cut off earlier*


2 Answers:

Answer 0: (score: 0)

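I would rather prepare the data up front with plain Python data structures and then create the Pandas DataFrame from them. The reason is that Pandas is not primarily designed for operations on individual rows. There are many ways to do it, but it is considered an anti-pattern and is much slower.

The following code uses the csv module from the Python standard library to parse the CSV data into a regular list, adding multiple rows for every CSV row whose last column contains multiple items. The final step simply creates the Pandas DataFrame from the preprocessed list.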

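import pandas as pd
import csv

data = []
with open('Gun violence Shortened version.csv', newline='') as file:
    reader = csv.reader(file, delimiter=',')

    # the first row holds the column names; read it separately so it
    # does not end up in the data as a regular row
    header = next(reader)

    # iterate over all remaining rows in the CSV
    for row in reader:
        # split the content of the last column by the || delimiter into a list
        # if there's no delimiter, it will produce a single-item list
        items = row[3].split('||')

        # append each item from the last column together with other columns
        # as an individual row to the data list, N items will produce N rows
        for item in items:
            # drop the "0::"-style index prefix so only the type remains
            data.append([row[0], row[1], row[2], item.split('::')[-1]])

# only the first four columns are used, so take their names from the header
df = pd.DataFrame(data, columns=header[:4])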


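According to some benchmarks, row-wise operations in Pandas are about 1000 times slower than preparing the data with Python's data structures and creating a DataFrame from them.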
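
To try the idea without the original file, here is a minimal sketch that runs the same row expansion on a small in-memory sample. The sample rows and the helper name expand_participants are assumptions made for illustration; they are not taken from the real data set. The sketch also assumes that each item in the last column looks like "0::Victim" and that only the part after "::" is wanted.

```python
import csv
import io

import pandas as pd

# Hypothetical sample that mimics the column layout shown in the question;
# the real file is not reproduced here.
SAMPLE_CSV = """incident_id,date,state,participant_type
461105,1/1/2013,Pennsylvania,0::Victim||1::Victim||2::Subject-Suspect
478855,1/1/2013,Ohio,0::Subject-Suspect
"""


def expand_participants(csv_file):
    """Return a DataFrame with one row per participant (hypothetical helper)."""
    reader = csv.reader(csv_file)
    header = next(reader)  # column names from the first CSV line
    rows = []
    for row in reader:
        # one output row per "||"-separated item in the fourth column
        for item in row[3].split('||'):
            # "0::Victim" -> "Victim"
            rows.append([row[0], row[1], row[2], item.split('::')[-1]])
    return pd.DataFrame(rows, columns=header[:4])


df = expand_participants(io.StringIO(SAMPLE_CSV))
print(df)
```

Because csv.reader returns strings, incident_id stays a string column here; convert it afterwards (for example with astype(int)) if numeric values are needed.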

Answer 1: (score: 0)

