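Hello, I have a csv file that currently has 2 columns and more than 1000 rows. I would like each comma-separated value to become a new column in the row it belongs to.
Here is a sample of my csv: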
print(df4)
keys env
0 FIT-2990 3000.0010
1 FIT-2918 3000.0004
2 FIT-2854 2110.0070, 2110.0071
3 UXSCIENCE-640 1808.0001
4 FIT-2814 1135.0017, 1135.0018, 1135.0019
5 FIT-2766 1908.0043, 1908.0044
6 FIT-2760 1901.0012, 1903.0045, 1906.0020, 1922.0032, 19...
7 FIT-2725 0147.0001
8 FIT-2706 1903.0045, 1922.0032
9 FIT-2554 1802.0024, 1805.0028
10 FIT-2383 , 1910
11 FIT-2339 2113.0021
12 UXSCIENCE-438 4000.0237, 4000.0238, 4000.0339
13 FIT-2201 2023.0013, 2016.0013, 2019.0013
I want to split, e.g., 2110.0070 | 2110.0071 into separate columns, for the entire csv.
Here is what I have so far...
df5 = df4.join(df4.apply(lambda x: Series(x.split(', '))))
print(df5)
Answer 0 (score: 2)
import pandas as pd
import numpy as np
import io
temp1=u"""keys;env
FIT-2990;3000.0010
FIT-2918;3000.0004
FIT-2854;2110.0070, 2110.0071
UXSCIENCE-640;1808.0001
FIT-2814;1135.0017, 1135.0018, 1135.0019
FIT-2766;1908.0043, 1908.0044
FIT-2760;1901.0012, 1903.0045, 1906.0020, 1922.0032, 19...
FIT-2725;0147.0001
FIT-2706;1903.0045, 1922.0032
FIT-2554;1802.0024, 1805.0028
FIT-2383;, 1910
FIT-2339;2113.0021
UXSCIENCE-438;4000.0237, 4000.0238, 4000.0339
FIT-2201;2023.0013, 2016.0013, 2019.0013"""
# after testing, replace io.StringIO(temp1) with the filename
df = pd.read_csv(io.StringIO(temp1), sep=";", index_col=None)
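print(df)
# note: splitting on ',' keeps the leading space of each piece after the first
# option 1 (faster): split each string in plain Python, then build a DataFrame
df1 = pd.DataFrame([x.split(',') for x in df['env'].tolist()])
# option 2 (slower): vectorized split; this reassigns df1 with an equivalent result
df1 = df['env'].str.split(',', expand=True)
# put the keys column back in front of the split columns
print(pd.concat([df['keys'], df1], axis=1))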
             keys          0           1           2           3       4
0        FIT-2990  3000.0010        None        None        None    None
1        FIT-2918  3000.0004        None        None        None    None
2        FIT-2854  2110.0070   2110.0071        None        None    None
3   UXSCIENCE-640  1808.0001        None        None        None    None
4        FIT-2814  1135.0017   1135.0018   1135.0019        None    None
5        FIT-2766  1908.0043   1908.0044        None        None    None
6        FIT-2760  1901.0012   1903.0045   1906.0020   1922.0032   19...
7        FIT-2725  0147.0001        None        None        None    None
8        FIT-2706  1903.0045   1922.0032        None        None    None
9        FIT-2554  1802.0024   1805.0028        None        None    None
10       FIT-2383                   1910        None        None    None
11       FIT-2339  2113.0021        None        None        None    None
12  UXSCIENCE-438  4000.0237   4000.0238   4000.0339        None    None
13       FIT-2201  2023.0013   2016.0013   2019.0013        None    None
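A possible follow-up, as a minimal sketch rather than part of the original answer: because the values are separated by ", ", splitting on "," alone leaves a leading space on every piece after the first. The snippet below strips that whitespace and gives the new columns readable names. The names `sample`, `parts`, `result` and the `env_` prefix are assumptions chosen for illustration, and only a few rows of the sample data are used.

```python
import io

import pandas as pd

# A shortened copy of the sample data (an assumption for illustration).
sample = u"""keys;env
FIT-2990;3000.0010
FIT-2854;2110.0070, 2110.0071
FIT-2814;1135.0017, 1135.0018, 1135.0019
FIT-2383;, 1910"""

# dtype=str keeps values such as "3000.0010" as text instead of floats.
df = pd.read_csv(io.StringIO(sample), sep=";", dtype=str)

# Split on the comma; shorter rows are padded with missing values.
parts = df["env"].str.split(",", expand=True)

# Remove the whitespace left over from the ", " separator, column by column.
parts = parts.apply(lambda col: col.str.strip())

# Rename columns 0, 1, 2 to env_0, env_1, env_2 (hypothetical naming).
parts = parts.add_prefix("env_")

# Put the keys column back in front of the split columns.
result = df[["keys"]].join(parts)
print(result)
```

Note that a row such as "FIT-2383", whose value starts with a comma, still yields an empty string in the first split column; whether that should be treated as missing depends on the data.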