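I am trying to create new rows from existing rows by splitting the data. I currently have the following (note that stray spaces sometimes appear between the quote and the number):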
FIT-1401,"0327.0001, 0327.0002"
FIT-1056," 0361.0001, 0361.0004, 3000.0010"
FIT-831,1120.0009
FIT-491,1207
I want it formatted like this, with no spaces at all:
FIT-1401,0327.0001
FIT-1401,0327.0002
FIT-1056,0361.0001
FIT-1056,0361.0004
FIT-1056,3000.0010
FIT-831,1120.0009
FIT-491,1207
My current code can split the values, but it cannot produce this clean format.
#THIS FUNCTION WILL SEPARATE TC NUMBERS INTO SEPARATE COLUMNS from the jira query
import pandas as pd

#####Cleans Open CSV######
dfcleancsv = pd.read_csv('InitialQuerydataOpen.csv', sep=",", dtype='object')
dfcleancsv.columns = ['KEYS', 'ENV']
#Takes all the data after TC
s = dfcleancsv['ENV']
#removes column with junk information
##TODO CLEAN UP COLUMNS with spaces to new rows
dfcleancsv = dfcleancsv.join(s.apply(lambda x: pd.Series(x.split('TC'))))
Answer 0 (score: 2)
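from io import StringIO
import pandas as pd

text = """FIT-1401,"0327.0001, 0327.0002"
FIT-1056," 0361.0001, 0361.0004, 3000.0010"
FIT-831,1120.0009
FIT-491,1207"""
# Use the first column (the FIT key) as the index; the data has no header row
df = pd.read_csv(StringIO(text), index_col=0, header=None)
# Remove spaces, then split each cell on commas into separate columns
df1 = df.iloc[:, 0].str.replace(' ', '').str.split(',', expand=True)
# Stack columns into rows; dropna() covers pandas versions whose stack() keeps empty cells
df1 = df1.stack().dropna().reset_index(level=1, drop=True)
print(pd.DataFrame(df1).to_csv(header=False))
Output: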
FIT-1401,0327.0001
FIT-1401,0327.0002
FIT-1056,0361.0001
FIT-1056,0361.0004
FIT-1056,3000.0010
FIT-831,1120.0009
FIT-491,1207
str.replace gets rid of the spaces.
str.split expands the comma-separated values into their own columns.
stack pushes all columns into rows.
reset_index cleans up the residual index level.
The pd.DataFrame wrapper lets me call to_csv without a file argument, so I can print to the screen.
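The same steps can also be sketched with `Series.explode`, which replaces the `stack` / `reset_index` pair. This is a swapped-in alternative rather than the method above, and it assumes pandas 0.25 or newer. The helper name `split_rows` and the column names `KEYS` and `ENV` (borrowed from the question) are illustrative choices only.

```python
from io import StringIO

import pandas as pd

text = '''FIT-1401,"0327.0001, 0327.0002"
FIT-1056," 0361.0001, 0361.0004, 3000.0010"
FIT-831,1120.0009
FIT-491,1207'''


def split_rows(csv_text):
    """Hypothetical helper: one output row per comma-separated ENV value."""
    # Read every field as text so values such as 1207 are not converted to numbers
    df = pd.read_csv(StringIO(csv_text), header=None,
                     names=["KEYS", "ENV"], dtype=str)
    # Drop stray spaces, then turn each cell into a list of values
    df["ENV"] = df["ENV"].str.replace(" ", "", regex=False).str.split(",")
    # explode() emits one row per list element and repeats the KEYS value
    return df.explode("ENV").reset_index(drop=True)


result = split_rows(text)
print(result.to_csv(index=False, header=False))
```

Keeping `KEYS` as an ordinary column instead of the index means `to_csv(index=False)` writes exactly the two wanted fields per line.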