I have data in the following format:
version,1.3.0
info,team,Australia
info,team,India
info,gender,male
ball,1,0.5,India,V Sehwag,IK Pathan,B Lee,0,0,"",""
ball,1,0.6,India,V Sehwag,IK Pathan,B Lee,0,0,"",""
I want to use pandas to split the data into two data frames.
First data frame:
info,team,Australia
info,team,India
info,gender,male
Second data frame:
ball,1,0.5,India,V Sehwag,IK Pathan,B Lee,0,0,"",""
ball,1,0.6,India,V Sehwag,IK Pathan,B Lee,0,0,"",""
Answer 0 (score: 2)
Use itertools.groupby:
from io import StringIO
from itertools import groupby

import pandas as pd

text = """version,1.3.0
info,team,Australia
info,team,India
info,gender,male
ball,1,0.5,India,V Sehwag,IK Pathan,B Lee,0,0,"",""
ball,1,0.6,India,V Sehwag,IK Pathan,B Lee,0,0,"",""
"""

# Skip the version line, then group consecutive lines by their first field
g = groupby(text.splitlines()[1:], key=lambda x: x.split(',')[0])

# Each group becomes its own DataFrame: the 'info' rows first, then the 'ball' rows
df1, df2 = (
    pd.read_csv(StringIO('\n'.join(t[1])), header=None)
    for t in g
)

print(df1, df2, sep='\n\n')
      0       1          2
0  info    team  Australia
1  info    team      India
2  info  gender       male

     0   1    2      3         4          5      6   7   8   9   10
0  ball   1  0.5  India  V Sehwag  IK Pathan  B Lee   0   0 NaN NaN
1  ball   1  0.6  India  V Sehwag  IK Pathan  B Lee   0   0 NaN NaN
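One thing to keep in mind with this approach: itertools.groupby only merges consecutive lines that share a key, so it relies on all the info lines coming before the ball lines, as they do in the question. If a file could interleave them, sorting by the key first is one way around it. A minimal sketch, assuming made-up shortened sample lines and a hypothetical helper named first_field:

```python
from itertools import groupby

# Hypothetical sample where 'info' and 'ball' lines are interleaved
lines = [
    "info,team,Australia",
    "ball,1,0.5,India",
    "info,gender,male",
    "ball,1,0.6,India",
]

def first_field(line):
    """Return the record type, i.e. the text before the first comma."""
    return line.split(",")[0]

# Without sorting, groupby starts a new group every time the key changes
unsorted_keys = [key for key, _ in groupby(lines, key=first_field)]

# sorted() is stable, so lines keep their relative order within each group
grouped = {
    key: list(group)
    for key, group in groupby(sorted(lines, key=first_field), key=first_field)
}

print(unsorted_keys)
print(grouped)
```

Each list in grouped can then be joined with newlines and passed to pd.read_csv in the same way as in the answer.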
Answer 1 (score: 1)
I think you need:
df = pd.read_excel(file, header=None, skiprows=1)
df1 = df[df[0] == 'info']
df2 = df[df[0] == 'ball']
Or, more generally, create a dictionary of DataFrames:
dfs = dict(tuple(df.groupby(0)))
print(dfs['info'])
print(dfs['ball'])
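EDIT: If there are multiple files, it is better to use the csv module, append each row to a list according to its first value, and then create the DataFrames with the constructor: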
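import csv, glob
import pandas as pd

info, ball = [], []
for f in glob.glob('csv/*.csv'):
    with open(f, "r", newline='') as f1:
        reader = csv.reader(f1)
        for L in reader:
            # Route each row by its first field; other rows (e.g. version) are ignored
            if L[0] == 'info':
                info.append(L)
            elif L[0] == 'ball':
                ball.append(L)

#print (info)
#print (ball)

# One DataFrame per record type, holding the rows from all files
df1 = pd.DataFrame(info)
print(df1)

df2 = pd.DataFrame(ball)
print(df2)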
If you want to create 2 DataFrames for each file:
for f in glob.glob('csv/*.csv'):
    with open(f, "r", newline='') as f1:
        # Start with empty lists for every file
        info, ball = [], []
        reader = csv.reader(f1)
        for L in reader:
            if L[0] == 'info':
                info.append(L)
            elif L[0] == 'ball':
                ball.append(L)

    df1 = pd.DataFrame(info)
    print(df1)

    df2 = pd.DataFrame(ball)
    print(df2)
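If you do not want to hard-code the two record types, the same row-by-row idea can be written with a dictionary of lists. This is only a sketch: rows_by_type is a hypothetical helper name, and it reads the sample text from the question through io.StringIO instead of real files so that it runs on its own.

```python
import csv
import io
from collections import defaultdict

# Sample text copied from the question
text = '''version,1.3.0
info,team,Australia
info,team,India
info,gender,male
ball,1,0.5,India,V Sehwag,IK Pathan,B Lee,0,0,"",""
ball,1,0.6,India,V Sehwag,IK Pathan,B Lee,0,0,"",""
'''

def rows_by_type(handle):
    """Hypothetical helper: group parsed CSV rows by their first field."""
    groups = defaultdict(list)
    for row in csv.reader(handle):
        if row:  # skip blank lines, which csv.reader returns as []
            groups[row[0]].append(row)
    return groups

rows = rows_by_type(io.StringIO(text))
print(sorted(rows))
print(len(rows["info"]), len(rows["ball"]))
```

Each list, for example rows["info"] or rows["ball"], can then be handed to the pd.DataFrame constructor exactly as in the code above. Note that csv.reader turns the quoted empty fields into empty strings rather than NaN.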
Another solution:
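for f in glob.glob('csv/*.csv'):
    # 'delimit' never occurs in the data, so each line is read whole into one column
    df = pd.read_csv(f, sep='delimit',
                     skipinitialspace=True,
                     skiprows=1,
                     quotechar='"',
                     names=['data'],
                     engine='python')  # multi-character separators need the python engine

    # Select rows by prefix, then expand the comma-separated fields into columns
    df1 = df.loc[df['data'].str.startswith('info'), 'data'].str.split(',', expand=True)
    df2 = df.loc[df['data'].str.startswith('ball'), 'data'].str.split(',', expand=True)

    print(df1)
    print(df2)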
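A caveat with splitting on ',' as plain text: it does not know about quoting, so a quoted field that itself contains a comma would be cut in two, whereas the csv module keeps it together. The sample in the question only has empty quoted fields, so this may never matter for these files. The line below is a made-up example, not taken from the data:

```python
import csv

# Hypothetical line: the last field is quoted and contains a comma
line = 'ball,1,0.5,India,"run out, direct hit"'

# Plain string splitting ignores the quotes
naive = line.split(",")

# csv.reader honours the quotes and strips them from the field
parsed = next(csv.reader([line]))

print(len(naive), naive)
print(len(parsed), parsed)
```

If fields like that can occur, the csv-module version above is probably the safer choice.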
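Answer 2 (score: 0)
I finally found a way to split the data. The code is below. I am reading 200 CSV files and converting them into 2 data frames, one containing the "info" rows and the other the "ball" rows.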
import os
import pandas as pd

files = os.listdir("merge_data")
runsFrames = []
for file in files:
    # Read each line as a single string column ('delimit' never occurs in the data)
    df = pd.read_csv(os.path.join('merge_data', file), sep='delimit', header=None,
                     skipinitialspace=True, skiprows=1, quotechar='"', engine='python')

    # Split match info from ball-by-ball data
    matchData = df[df[0].str.match('info')]
    runsData = df[df[0].str.match('ball')]

    # For runs: expand the comma-separated fields into columns
    runsFrames.append(runsData[0].str.split(',', expand=True))

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
finalRunsData = pd.concat(runsFrames)