Python Pandas: put each column value on a new row

Time: 2016-06-16 17:28:07

Tags: python csv pandas

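I'm trying to create new rows from an existing row by splitting its data. I currently have the following (note that stray spaces sometimes appear between the quotes and the numbers):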

FIT-1401,"0327.0001, 0327.0002"
FIT-1056," 0361.0001, 0361.0004, 3000.0010"
FIT-831,1120.0009
FIT-491,1207

I want it formatted with no spaces at all:

FIT-1401,0327.0001
FIT-1401,0327.0002
FIT-1056,0361.0001
FIT-1056,0361.0004
FIT-1056,3000.0010
FIT-831,1120.0009
FIT-491,1207

My code can currently split the data, but it can't produce this clean format.

# Split the TC numbers from the Jira query into separate columns
import pandas as pd

##### Cleans Open CSV #####
dfcleancsv = pd.read_csv('InitialQuerydataOpen.csv', sep=",", dtype='object')
dfcleancsv.columns = ['KEYS', 'ENV']

# Takes all the data after TC
s = dfcleancsv['ENV']

# removes column with junk information

## TODO: clean up columns with spaces into new rows
dfcleancsv = dfcleancsv.join(s.apply(lambda x: pd.Series(x.split('TC'))))

1 Answer:

Answer 0 (score: 2)

Setup

from io import StringIO
import pandas as pd

text = """FIT-1401,"0327.0001, 0327.0002"
FIT-1056," 0361.0001, 0361.0004, 3000.0010"
FIT-831,1120.0009
FIT-491,1207"""

df = pd.read_csv(StringIO(text), index_col=0, header=None)

Solution

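# Remove spaces, then split on commas into one column per value
df1 = df.iloc[:, 0].str.replace(' ', '').str.split(',', expand=True)
# Stack columns into rows, drop padding from shorter rows, drop the inner index level
df1 = df1.stack().dropna().reset_index(level=1, drop=True)
print(pd.DataFrame(df1).to_csv(header=False))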

FIT-1401,0327.0001
FIT-1401,0327.0002
FIT-1056,0361.0001
FIT-1056,0361.0004
FIT-1056,3000.0010
FIT-831,1120.0009
FIT-491,1207

Explanation

  • str.replace gets rid of spaces
  • str.split expands comma separated values into their own columns
  • stack pushes all columns into rows
  • reset_index cleans up residual index level
  • wrapping the result in pd.DataFrame lets me call to_csv without a file argument, so it returns the CSV text and I can print it to the screen
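
The steps listed above can be sketched one at a time, so each intermediate result can be inspected. This is only an illustrative walk-through under a few assumptions: the variable names (no_spaces, wide, stacked, tidy, exploded) are made up for the example, dtype=str is added so the values are kept as text, and the extra dropna() is there in case the installed pandas version keeps the padding values when stacking. The last line shows an alternative that is not part of the original answer: on pandas 0.25 or later, Series.explode should give the same rows without the split/stack round trip.

```python
from io import StringIO

import pandas as pd

text = '''FIT-1401,"0327.0001, 0327.0002"
FIT-1056," 0361.0001, 0361.0004, 3000.0010"
FIT-831,1120.0009
FIT-491,1207'''

# Assumption: dtype=str keeps values such as 0327.0001 as text.
df = pd.read_csv(StringIO(text), index_col=0, header=None, dtype=str)
col = df.iloc[:, 0]

# Step 1: strip the stray spaces (plain string replace, no regex).
no_spaces = col.str.replace(' ', '', regex=False)

# Step 2: one column per comma-separated value; shorter rows are padded
# with missing values.
wide = no_spaces.str.split(',', expand=True)

# Step 3: stack the columns into rows, giving a (key, position) index.
# dropna() removes the padding if this pandas version keeps it.
stacked = wide.stack().dropna()

# Step 4: drop the inner position level so only the key remains.
tidy = stacked.reset_index(level=1, drop=True)

# Alternative sketch (pandas >= 0.25): split into lists, then explode.
exploded = no_spaces.str.split(',').explode()

print(tidy.to_csv(header=False))
```

Either tidy or exploded can be written out with to_csv(header=False); the explode variant skips the padded wide table, which may matter when a few rows hold many more values than the rest.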