Question

I'm new to using Pandas dataframes. I have data in a .csv that looks like this:

foo, 1234,
bar, 4567
stuff, 7894
New Entry,,
morestuff,1345

I am reading it into a dataframe using

 df = pd.read_csv

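However, what I really want is a new dataframe (or a way of splitting the current one) whenever I hit a "New Entry" row, obviously without including that row. How can this be done?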

Answer 1

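So, using your sample data concatenated 3 times: after loading (for convenience I named the columns 'a', 'b', 'c'), we find the indices of the rows where you have 'New Entry', then pair those positions up step by step into a list of tuples that mark the beginning and end of each range.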

We can then iterate over this list of tuples, slice the original df, and append each slice to a list:

In [22]:

import io
import pandas as pd

t="""foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t),header=None,names=['a','b','c'] )
df = pd.concat([df]*3, ignore_index=True)
df
Out[22]:
            a     b   c
0         foo  1234 NaN
1         bar  4567 NaN
2       stuff  7894 NaN
3   New Entry   NaN NaN
4   morestuff  1345 NaN
5         foo  1234 NaN
6         bar  4567 NaN
7       stuff  7894 NaN
8   New Entry   NaN NaN
9   morestuff  1345 NaN
10        foo  1234 NaN
11        bar  4567 NaN
12      stuff  7894 NaN
13  New Entry   NaN NaN
14  morestuff  1345 NaN
In [30]:

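# row labels of the 'New Entry' separator rows
idx = df[df['a'] == 'New Entry'].index
# (start, end) pairs: the first block runs from row 0 to the first separator
idx_list = [(0,idx[0])]
# each later block runs from one separator to the next
idx_list = idx_list + list(zip(idx, idx[1:]))
idx_list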


Out[30]:
[(0, 3), (3, 8), (8, 13)]
In [31]:

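df_list = []
for i in idx_list:  
    print(i)
    if i[0] == 0:
        # first block starts at row 0, which is not a separator
        df_list.append(df[i[0]:i[1]])
    else:
        # later blocks start on a separator row, so skip it with +1
        df_list.append(df[i[0]+1:i[1]])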
df_list
(0, 3)
(3, 8)
(8, 13)
Out[31]:
[       a     b   c
 0    foo  1234 NaN
 1    bar  4567 NaN
 2  stuff  7894 NaN,            a     b   c
 4  morestuff  1345 NaN
 5        foo  1234 NaN
 6        bar  4567 NaN
 7      stuff  7894 NaN,             a     b   c
 9   morestuff  1345 NaN
 10        foo  1234 NaN
 11        bar  4567 NaN
 12      stuff  7894 NaN]
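
Note that the ranges in idx_list stop at the last 'New Entry' row, so anything after it (row 14 in the sample) does not appear in df_list. As an alternative sketch, not part of the original answer, a running count of the separator rows can label each block, and groupby then yields every block, including the trailing one. The names is_marker, block_id, and parts are illustrative.

```python
import io
import pandas as pd

t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t), header=None, names=['a', 'b', 'c'])
df = pd.concat([df] * 3, ignore_index=True)

# True on the separator rows
is_marker = df['a'] == 'New Entry'
# running count of separators seen so far: constant within each block
block_id = is_marker.cumsum()
# drop the separator rows, then collect one DataFrame per block
parts = [g for _, g in df[~is_marker].groupby(block_id[~is_marker])]
```

With the sample data this gives four pieces, the last one holding only the final morestuff row. Each piece keeps its original index labels; calling reset_index(drop=True) on a piece would renumber it from 0 if that is preferred.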

Answer 2

1) One approach is to do it on the fly: read the file line by line and check for the NewEntry break as you go.

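2) The other approach, if the dataframe already exists, is to find the NewEntry rows and slice the dataframe into several pieces stored in a dict, dff = {}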

df                                                                 
        col1  col2  
0        foo  1234    
1        bar  4567                
2      stuff  7894                                                        
3   NewEntry   NaN                       
4  morestuff  1345

Find the NewEntry rows, and add [-1] and [len(df.index)] to cover the boundary conditions

rows = [-1] + np.where(df['col1']=='NewEntry')[0].tolist() + [len(df.index)]
[-1, 3L, 5]

Create a dict of dataframes

dff = {}                                                                            
for i, r in enumerate(rows[:-1]):                                                   
    dff[i] = df[r+1: rows[i+1]]

The resulting dict of dataframes, {0: dataframe1, 1: dataframe2}

dff                           
{0:     col1  col2            
 0    foo  1234               
 1    bar  4567               
 2  stuff  7894, 1:         col1  col2  
 4  morestuff  1345}
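
For reference, the steps above can be put together as a self-contained sketch. It assumes numpy is imported as np and rebuilds the example df by hand; the dict comprehension is just a compact rewrite of the loop, not a different method.

```python
import numpy as np
import pandas as pd

# rebuild the example dataframe shown above
df = pd.DataFrame({
    "col1": ["foo", "bar", "stuff", "NewEntry", "morestuff"],
    "col2": [1234, 4567, 7894, np.nan, 1345],
})

# positions of the separator rows, padded with -1 and len(df) as boundaries
mask = (df["col1"] == "NewEntry").to_numpy()
rows = [-1] + np.where(mask)[0].tolist() + [len(df.index)]

# each piece runs from just after one boundary up to (not including) the next
dff = {
    i: df.iloc[start + 1:stop]
    for i, (start, stop) in enumerate(zip(rows, rows[1:]))
}
```

If two NewEntry rows were adjacent, or one sat at the very start or end of the frame, the corresponding entry in dff would be an empty dataframe, so a caller may want to filter those out.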

Dataframe 1

dff[0]              
    col1  col2      
0    foo  1234      
1    bar  4567      
2  stuff  7894

Dataframe 2

dff[1]              
        col1  col2  
4  morestuff  1345
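
Approach 1) above, splitting while the file is being read, is not shown in code in this answer. A minimal sketch of it might look like the following, assuming the separator text is exactly "New Entry" in the first field as in the question's file; the function name read_split_csv and its parameters are hypothetical.

```python
import csv
import io
import pandas as pd

def read_split_csv(handle, marker="New Entry", names=("col1", "col2")):
    """Return one DataFrame per block of rows between marker rows."""
    frames, current = [], []
    # skipinitialspace drops the blank after each comma ("foo, 1234")
    for row in csv.reader(handle, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        if row[0] == marker:
            # close the current block; the marker row itself is discarded
            if current:
                frames.append(pd.DataFrame(current, columns=list(names)))
            current = []
        else:
            # pad or trim so every row has exactly len(names) fields
            current.append((row + [""] * len(names))[:len(names)])
    if current:
        frames.append(pd.DataFrame(current, columns=list(names)))
    return frames

sample = "foo, 1234,\nbar, 4567\nstuff, 7894\nNew Entry,,\nmorestuff,1345\n"
frames = read_split_csv(io.StringIO(sample))
```

Here an in-memory string stands in for the file; with a real file, the handle from open(path, newline="") would be passed instead. All values come back as strings, so a column such as col2 would still need pd.to_numeric if numbers are wanted.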

Split a pandas dataframe by string
