I'm new to working with Pandas DataFrames. I have data in a .csv that looks like this:
foo, 1234,
bar, 4567
stuff, 7894
New Entry,,
morestuff,1345
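I'm reading it into a DataFrame with df = pd.read_csv
However, whenever I hit a "New Entry" row (which obviously should not be included), what I really want is a new DataFrame, or some way to split the current one. How can I do this?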
Answer 0 (score: 1)
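Using your sample data, which I concatenated 3 times, after loading it (for convenience I named the columns 'a', 'b', 'c'), we find the index of each row containing 'New Entry' and build a list of tuples from consecutive positions to mark the beginning and end of each range.
Then we can iterate over this list of tuples, slice the original df, and append each slice to a list: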
In [22]:
import io
import pandas as pd
t="""foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t),header=None,names=['a','b','c'] )
df = pd.concat([df]*3, ignore_index=True)
df
Out[22]:
a b c
0 foo 1234 NaN
1 bar 4567 NaN
2 stuff 7894 NaN
3 New Entry NaN NaN
4 morestuff 1345 NaN
5 foo 1234 NaN
6 bar 4567 NaN
7 stuff 7894 NaN
8 New Entry NaN NaN
9 morestuff 1345 NaN
10 foo 1234 NaN
11 bar 4567 NaN
12 stuff 7894 NaN
13 New Entry NaN NaN
14 morestuff 1345 NaN
In [30]:
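# index labels of the 'New Entry' marker rows
idx = df[df['a'] == 'New Entry'].index
# first range starts at row 0; the rest run between consecutive markers
idx_list = [(0,idx[0])]
idx_list = idx_list + list(zip(idx, idx[1:]))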
idx_list
Out[30]:
[(0, 3), (3, 8), (8, 13)]
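Note that the pairwise zip stops at the last 'New Entry' index, so the rows after it (index 14, morestuff) never land in any slice. A minimal sketch of a variant that also keeps that tail, assuming the default RangeIndex produced by ignore_index=True so that index labels and positions coincide:

```python
import io

import pandas as pd

t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t), header=None, names=["a", "b", "c"])
df = pd.concat([df] * 3, ignore_index=True)

# Positions of the marker rows (labels == positions under a RangeIndex).
idx = df.index[df["a"] == "New Entry"].tolist()

# Each chunk starts just after a marker (or at 0) and stops at the next
# marker (or at len(df)), so the rows after the last marker are kept too.
starts = [0] + [i + 1 for i in idx]
stops = idx + [len(df)]
idx_list = list(zip(starts, stops))
print(idx_list)  # [(0, 3), (4, 8), (9, 13), (14, 15)]

# iloc slicing is end-exclusive, so the first chunk needs no special case;
# empty ranges (e.g. two markers in a row) are skipped.
df_list = [df.iloc[start:stop] for start, stop in idx_list if start < stop]
```

Because the marker rows are excluded when the start positions are built, the loop no longer needs the if i[0] == 0 branch.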
In [31]:
df_list = []
for i in idx_list:
    print(i)
    if i[0] == 0:
        df_list.append(df[i[0]:i[1]])
    else:
        df_list.append(df[i[0]+1:i[1]])
df_list
(0, 3)
(3, 8)
(8, 13)
Out[31]:
[ a b c
0 foo 1234 NaN
1 bar 4567 NaN
2 stuff 7894 NaN, a b c
4 morestuff 1345 NaN
5 foo 1234 NaN
6 bar 4567 NaN
7 stuff 7894 NaN, a b c
9 morestuff 1345 NaN
10 foo 1234 NaN
11 bar 4567 NaN
12 stuff 7894 NaN]
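An alternative that is not taken from either answer is to give every block an id with a running sum of the marker mask and then group by that id. This is a sketch using the same concatenated sample data and the 'a' column name assumed above:

```python
import io

import pandas as pd

t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t), header=None, names=["a", "b", "c"])
df = pd.concat([df] * 3, ignore_index=True)

# True on each marker row; cumsum increments at every marker, so all rows
# between two markers share the same block id.
is_marker = df["a"] == "New Entry"
block_id = is_marker.cumsum()

# Drop the marker rows, then split the remainder by block id.
# groupby never yields empty groups, so consecutive markers are harmless.
df_list = [group for _, group in df[~is_marker].groupby(block_id[~is_marker])]
```

Unlike the slicing approach, this does not depend on positions or on the index type, and it keeps the rows after the last marker.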
Answer 1 (score: 1)
1) One way is to do it on the fly: read the file line by line, check for NewEntry, and start a new chunk whenever you hit it.
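Approach 1 could look roughly like this minimal sketch. It assumes the marker text is exactly "New Entry" as in the question, uses io.StringIO as a stand-in for open("data.csv"), and the column names "name" and "value" and the helper to_frame are hypothetical:

```python
import csv
import io

import pandas as pd

# Stand-in for open("data.csv"); contents copied from the question.
raw = io.StringIO("""foo, 1234,
bar, 4567
stuff, 7894
New Entry,,
morestuff,1345""")


def to_frame(rows):
    # Hypothetical helper: build one DataFrame from the collected rows.
    frame = pd.DataFrame(rows, columns=["name", "value"])
    frame["value"] = pd.to_numeric(frame["value"])
    return frame


frames = []
current = []
# skipinitialspace strips the blank after each comma ("foo, 1234").
for row in csv.reader(raw, skipinitialspace=True):
    if not row:
        continue  # ignore blank lines
    if row[0] == "New Entry":
        # Marker: close the current chunk and start a new one.
        if current:
            frames.append(to_frame(current))
        current = []
        continue
    current.append(row[:2])  # drop trailing empty fields such as "foo,1234,"
if current:
    frames.append(to_frame(current))
```

This avoids ever building the combined DataFrame, which may help if the file is large.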
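2) Alternatively, if the DataFrame already exists, find the NewEntry rows and slice the DataFrame into multiple DataFrames stored in a dict, dff = {}: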
df
col1 col2
0 foo 1234
1 bar 4567
2 stuff 7894
3 NewEntry NaN
4 morestuff 1345
Find the NewEntry rows, adding [-1] and [len(df.index)] as sentinels for the boundary conditions:
import numpy as np
rows = [-1] + np.where(df['col1']=='NewEntry')[0].tolist() + [len(df.index)]
[-1, 3, 5]
Create a dict of DataFrames:
dff = {}
for i, r in enumerate(rows[:-1]):
    dff[i] = df[r+1: rows[i+1]]
The result is a dict of DataFrames, {0: dataframe1, 1: dataframe2}:
dff
{0: col1 col2
0 foo 1234
1 bar 4567
2 stuff 7894, 1: col1 col2
4 morestuff 1345}
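The same sentinel idea can be wrapped in a reusable function. This is a sketch; the name split_on_marker is hypothetical, and it uses iloc so the slicing stays positional even if df does not have a default RangeIndex:

```python
import numpy as np
import pandas as pd


def split_on_marker(df, column, marker):
    """Hypothetical helper: split df into a dict of DataFrames at rows where
    df[column] == marker, dropping the marker rows themselves."""
    # Positions of marker rows, padded with -1 and len(df) as sentinels.
    rows = [-1] + np.flatnonzero(df[column].to_numpy() == marker).tolist() + [len(df)]
    # Each chunk runs from just after one boundary up to (not including) the next.
    return {
        i: df.iloc[start + 1:stop]
        for i, (start, stop) in enumerate(zip(rows, rows[1:]))
    }


df = pd.DataFrame({
    "col1": ["foo", "bar", "stuff", "NewEntry", "morestuff"],
    "col2": [1234, 4567, 7894, np.nan, 1345],
})
dff = split_on_marker(df, "col1", "NewEntry")
```

Here rows is [-1, 3, 5], so dff[0] holds positions 0 to 2 and dff[1] holds position 4, matching the output above.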
Dataframe 1
dff[0]
col1 col2
0 foo 1234
1 bar 4567
2 stuff 7894
Dataframe 2
dff[1]
col1 col2
4 morestuff 1345