Pandas CSV: merge duplicated columns into one, combining the duplicated columns' values into arrays

Asked: 2019-07-16 20:02:21

Tags: pandas group-by aggregate reshape melt




    id  name  x   y   z   x.1  y.1  z.1  state  country  state.1  country.1
0   1   a     1   2   3   0    9    9    NY     USA      PORTO    PORTUGAL
1   1   b     4   5   6   9    9    0    NJ     USA      MADRID   SPAIN
2   2   a     7   8   9   0    9    0    CT     USA      PARIS    FRANCE
3   2   b     10  11  12  9    0    9    WY     USA      VENACE   ITALY
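For context, the `.1` suffixes in a table like this are what `pd.read_csv` produces by default when a CSV header repeats a column name. A minimal sketch, assuming the original file had the repeated header shown in the hypothetical `raw_csv` text below (the actual file is not part of the question):

```python
import io

import pandas as pd

# Hypothetical CSV text with repeated header names, mirroring the table above.
raw_csv = """id,name,x,y,z,x,y,z,state,country,state,country
1,a,1,2,3,0,9,9,NY,USA,PORTO,PORTUGAL
1,b,4,5,6,9,9,0,NJ,USA,MADRID,SPAIN
2,a,7,8,9,0,9,0,CT,USA,PARIS,FRANCE
2,b,10,11,12,9,0,9,WY,USA,VENACE,ITALY
"""

# read_csv de-duplicates repeated header names by appending .1, .2, ...
df = pd.read_csv(io.StringIO(raw_csv))
print(df.columns.tolist())
```

Because the suffix encodes the order in which the duplicates appeared, it can be used later to pair `state.N` with `country.N`.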


New/Updated CSV:

    id  name  x       y       z       visited_places
0   1   a     [1,0]   [2,9]   [3,9]   [{state: NY, country: USA}, {state: PORTO, country: PORTUGAL}]
1   1   b     [4,9]   [5,9]   [6,0]   [{state: NJ, country: USA}, {state: MADRID, country: SPAIN}]
2   2   a     [7,0]   [8,9]   [9,0]   [{state: CT, country: USA}, {state: PARIS, country: FRANCE}]
3   2   b     [10,9]  [11,0]  [12,9]  [{state: WY, country: USA}, {state: VENACE, country: ITALY}]


I have tried combinations of lreshape, melt, and apply(lambda x: ','.join(x)), but I could not get the final result I want.

# Have tried combining column based on column name, however, this won't cover state.1 country.1 state.2 country.2 and so on...
df['visited_places'] = df['state'].str.cat(df['country'], sep=', ')

# Have tried to combine using lreshape/melt, however, these functions don't keep each state paired with its country in order (like NJ, USA). The values end up jumbled.
df = pd.lreshape(df, {'visited_places': df.columns[df.columns.str.match(r'^state\.?\d?')].append(df.columns[df.columns.str.match(r'^country\.?\d?')]).tolist()})

# Due to the above I haven't gotten to the part where I compress rows to only 4 rows for example, and all the visited_places are in an array as shown above in "New/Updated CSV" section.
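One possible way to get the desired shape without `lreshape` or `melt` is to group the columns by their base name (the part before the `.N` suffix) and build the lists row by row. This is only a sketch under stated assumptions: the sample frame is rebuilt by hand from the table above, and `base_name`, `suffix_index`, and `columns_for` are hypothetical helpers written for this illustration, not pandas functions.

```python
import pandas as pd

# Sample frame rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "b", "a", "b"],
    "x": [1, 4, 7, 10], "y": [2, 5, 8, 11], "z": [3, 6, 9, 12],
    "x.1": [0, 9, 0, 9], "y.1": [9, 9, 9, 0], "z.1": [9, 0, 0, 9],
    "state": ["NY", "NJ", "CT", "WY"],
    "country": ["USA"] * 4,
    "state.1": ["PORTO", "MADRID", "PARIS", "VENACE"],
    "country.1": ["PORTUGAL", "SPAIN", "FRANCE", "ITALY"],
})


def base_name(col):
    """Hypothetical helper: strip a trailing '.<digits>' ('state.1' -> 'state')."""
    head, sep, tail = col.rpartition(".")
    return head if sep and tail.isdigit() else col


def suffix_index(col):
    """Hypothetical helper: numeric suffix of a column, 0 when there is none."""
    head, sep, tail = col.rpartition(".")
    return int(tail) if sep and tail.isdigit() else 0


def columns_for(frame, base):
    """All columns sharing a base name, ordered by suffix (x, x.1, x.2, ...)."""
    cols = [c for c in frame.columns if base_name(c) == base]
    return sorted(cols, key=suffix_index)


out = df[["id", "name"]].copy()

# Numeric duplicates: one list per row, e.g. x and x.1 -> [1, 0].
for base in ["x", "y", "z"]:
    lists = df[columns_for(df, base)].values.tolist()
    out[base] = pd.Series(lists, index=df.index)

# Pair state.N with country.N so each place keeps its own country.
state_cols = columns_for(df, "state")
country_cols = columns_for(df, "country")
places = [
    [{"state": rec[s], "country": rec[c]} for s, c in zip(state_cols, country_cols)]
    for rec in df.to_dict("records")
]
out["visited_places"] = pd.Series(places, index=df.index)

print(out)
```

Since nothing is melted, the frame stays at its original four rows, and further pairs such as `state.2`/`country.2` would be picked up by the same suffix-based matching, assuming every `state.N` has a matching `country.N`.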





0 answers:
