How to create new pandas columns by splitting an existing pandas column

Posted: 2018-05-31 12:31:48

Tags: python pandas

I have a pandas DataFrame that looks like this:

Keyword |              ranks              | search_type |   search_volume
kw1     |[{'rank': 1, 'url': example.com}]|  1          |   500
kw1     |[{'rank': 1, 'url': example.com}]|  2          |   500
kw2     |[{'rank': 2, 'url': example.com}]|  1          |   1500
kw2     |[{'rank': 2, 'url': example.com}]|  2          |   1500
kw3     |[{'rank': 1, 'url': example.com}]|  1          |   60
kw3     |[{'rank': 1, 'url': example.com}]|  2          |   60

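What I want is to split ranks into two columns: ranks, containing the rank, and a new column named url, containing the URL, so that the resulting df looks like this: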

Keyword |   ranks    |        url          | search_type |   search_volume
kw1     |[{'rank': 1 | 'url': example.com}]|  1          |   500
kw1     |[{'rank': 1 | 'url': example.com}]|  2          |   500
kw2     |[{'rank': 2 | 'url': example.com}]|  1          |   1500
kw2     |[{'rank': 2 | 'url': example.com}]|  2          |   1500
kw3     |[{'rank': 1 | 'url': example.com}]|  1          |   60
kw3     |[{'rank': 1 | 'url': example.com}]|  2          |   60

So far I have tried:

df.ranks = df.ranks.str.split(',',1).tolist() gave back a list of NaN values. I also tried df['ranks'].str.split(',', expand=True), which did not work either. I also tried:

df = pd.DataFrame(df.ranks.str.split(' ',1).tolist(),columns = ['ranks','url'])

but I got ValueError: Shape of passed values is (1, 400), indices imply (2, 400)

Edit: df.ranks.dtype returns dtype('O'), and type(df.ranks) returns pandas.core.series.Series.
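A quick way to tell which case applies is to inspect a single element: an object dtype only says the column holds Python objects, which could be strings or lists of dicts. The sketch below assumes the column holds real lists of dicts (as Answer 0 assumes) and uses made-up sample rows; it shows why the .str.split attempts come back as NaN, since the .str methods return NaN for elements that are not strings.

```python
import pandas as pd

# Hypothetical sample data, assuming `ranks` holds real lists of dicts
df = pd.DataFrame({
    "Keyword": ["kw1", "kw1", "kw2"],
    "ranks": [[{"rank": 1, "url": "example.com"}],
              [{"rank": 1, "url": "example.com"}],
              [{"rank": 2, "url": "example.com"}]],
    "search_type": [1, 2, 1],
    "search_volume": [500, 500, 1500],
})

# Object dtype only means "Python objects"; check what the objects actually are
first = df["ranks"].iloc[0]
print(type(first))  # a list here; it would be str if the column held text

# .str.split only works on strings; non-string elements come back as NaN
split_result = df["ranks"].str.split(",")
print(split_result.isna().all())
```

If type(first) turns out to be str instead, the string-based answers (1 and 2) apply; if it is a list, indexing into the list and dict (Answer 0) is the direct route.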

4 answers:

Answer 0 (score: 2)

I think the column holds lists of dicts, so I suggest using a list comprehension to take the first dict in each list and then select the value by key:

df['r'] = [x[0]['rank'] for x in df['ranks']]
df['u'] = [x[0]['url'] for x in df['ranks']]
print (df)
  Keyword                                ranks  search_type  search_volume  r  \
0     kw1  [{'rank': 1, 'url': 'example.com'}]            1            500  1   
1     kw1  [{'rank': 1, 'url': 'example.com'}]            2            500  1   
2     kw2  [{'rank': 2, 'url': 'example.com'}]            1           1500  2   
3     kw2  [{'rank': 2, 'url': 'example.com'}]            2           1500  2   
4     kw3  [{'rank': 1, 'url': 'example.com'}]            1             60  1   
5     kw3  [{'rank': 1, 'url': 'example.com'}]            2             60  1   

             u  
0  example.com  
1  example.com  
2  example.com  
3  example.com  
4  example.com  
5  example.com 
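As an alternative to the comprehension, the .str accessor can also index into lists and look up dict keys (Series.str.get, which .str[...] calls, accepts a position or a dict key). A minimal sketch on hypothetical sample rows, assuming the same list-of-dicts data; unlike x[0], .str[0] yields NaN for an empty list instead of raising IndexError.

```python
import pandas as pd

# Hypothetical sample: the last row has an empty list to show the NaN behaviour
df = pd.DataFrame({
    "Keyword": ["kw1", "kw2", "kw3"],
    "ranks": [[{"rank": 1, "url": "example.com"}],
              [{"rank": 2, "url": "example.com"}],
              []],
})

first_dict = df["ranks"].str[0]      # first element of each list (NaN if the list is empty)
df["r"] = first_dict.str["rank"]     # look up the 'rank' key in each dict
df["u"] = first_dict.str["url"]      # look up the 'url' key in each dict
print(df)
```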

Or:

df['r'] = [{'rank': x[0]['rank']} for x in df['ranks']]
df['u'] = [{'url': x[0]['url']} for x in df['ranks']]
print (df)
  Keyword                                ranks  search_type  search_volume  \
0     kw1  [{'rank': 1, 'url': 'example.com'}]            1            500   
1     kw1  [{'rank': 1, 'url': 'example.com'}]            2            500   
2     kw2  [{'rank': 2, 'url': 'example.com'}]            1           1500   
3     kw2  [{'rank': 2, 'url': 'example.com'}]            2           1500   
4     kw3  [{'rank': 1, 'url': 'example.com'}]            1             60   
5     kw3  [{'rank': 1, 'url': 'example.com'}]            2             60   

             r                       u  
0  {'rank': 1}  {'url': 'example.com'}  
1  {'rank': 1}  {'url': 'example.com'}  
2  {'rank': 2}  {'url': 'example.com'}  
3  {'rank': 2}  {'url': 'example.com'}  
4  {'rank': 1}  {'url': 'example.com'}  
5  {'rank': 1}  {'url': 'example.com'}  
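If you want one column per dict key without naming the keys by hand, another option is to build a DataFrame from the first dict of each list and join it back. A minimal sketch on hypothetical sample rows, assuming every list is non-empty; the new column names come from the dict keys.

```python
import pandas as pd

df = pd.DataFrame({
    "Keyword": ["kw1", "kw1", "kw2"],
    "ranks": [[{"rank": 1, "url": "example.com"}],
              [{"rank": 1, "url": "example.com"}],
              [{"rank": 2, "url": "example.com"}]],
    "search_type": [1, 2, 1],
    "search_volume": [500, 500, 1500],
})

# One row per first dict; its keys ('rank', 'url') become column names.
# index=df.index keeps the rows aligned for the join below.
expanded = pd.DataFrame([x[0] for x in df["ranks"]], index=df.index)

# Drop the original list column and attach the new ones
df = df.drop(columns="ranks").join(expanded)
print(df)
```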

Answer 1 (score: 1)

You can use str.strip followed by str.split with the expand parameter set to True (alternatively, you could convert each string to a dict and read the keys from it), i.e.:

df[['rank','url']] = df['ranks'].str.strip('{[]}').str.split(',',expand=True).values

  Keyword                              ranks  search_type  search_volume       rank                  url
0  kw1       [{'rank': 1, 'url': example.com}]            1            500  'rank': 1   'url': example.com
1  kw1       [{'rank': 1, 'url': example.com}]            2            500  'rank': 1   'url': example.com
2  kw2       [{'rank': 2, 'url': example.com}]            1           1500  'rank': 2   'url': example.com
3  kw2       [{'rank': 2, 'url': example.com}]            2           1500  'rank': 2   'url': example.com
4  kw3       [{'rank': 1, 'url': example.com}]            1             60  'rank': 1   'url': example.com
5  kw3       [{'rank': 1, 'url': example.com}]            2             60  'rank': 1   'url': example.com
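If ranks really holds strings, the strip/split result above still carries the 'rank': and 'url': labels. A sketch using Series.str.extract with named groups pulls out only the values; the regex pattern is an assumption tailored to the sample rows and accepts the URL with or without quotes.

```python
import pandas as pd

# Hypothetical sample: ranks stored as text, one unquoted and one quoted URL
df = pd.DataFrame({
    "Keyword": ["kw1", "kw2"],
    "ranks": ["[{'rank': 1, 'url': example.com}]",
              "[{'rank': 2, 'url': 'example.com'}]"],
})

# Named groups become the column names of the extracted DataFrame
pattern = r"'rank':\s*(?P<rank>\d+),\s*'url':\s*'?(?P<url>[^'}]+)"
extracted = df["ranks"].str.extract(pattern)

df["rank"] = extracted["rank"].astype(int)  # captured digits arrive as strings
df["url"] = extracted["url"]
print(df)
```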

Answer 2 (score: 0)

Try this:

df['ranks'].str.split(', ', expand=True).rename(columns={0:'ranks',1:'url'})

   Keyword                               ranks   search_type   search_volume
0  kw1       [{'rank': 1, 'url': example.com}]              1            500
1  kw1       [{'rank': 1, 'url': example.com}]              2            500
2  kw2       [{'rank': 2, 'url': example.com}]              1           1500
3  kw2       [{'rank': 2, 'url': example.com}]              2           1500
4  kw3       [{'rank': 1, 'url': example.com}]              1             60
5  kw3       [{'rank': 1, 'url': example.com}]              2             60
         ranks                   url
0  [{'rank': 1  'url': example.com}]
1  [{'rank': 1  'url': example.com}]
2  [{'rank': 2  'url': example.com}]
3  [{'rank': 2  'url': example.com}]
4  [{'rank': 1  'url': example.com}]
5  [{'rank': 1  'url': example.com}]
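The expression above only displays the split; it does not change df. A sketch that writes the pieces back, assuming ranks holds strings (sample rows are made up): split once, drop the old column, and join the two new ones. Passing n as a keyword argument caps the split at one separator and works across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "Keyword": ["kw1", "kw2"],
    "ranks": ["[{'rank': 1, 'url': example.com}]",
              "[{'rank': 2, 'url': example.com}]"],
    "search_type": [1, 1],
})

# Split once on ", " into columns 0 and 1, then give them names
parts = (df["ranks"]
         .str.split(", ", n=1, expand=True)
         .rename(columns={0: "ranks", 1: "url"}))

# Replace the original column with the two new ones
df = df.drop(columns="ranks").join(parts)
print(df)
```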

Answer 3 (score: 0)

Wow. Please don't use those strip-and-split hacks. If your input data are JSON strings, just use:

import json

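# assumes each value in ranks is a JSON string (double-quoted keys and values)
df['rank'] = df['ranks'].map(lambda x: json.loads(x)[0]['rank'])
df['url'] = df['ranks'].map(lambda x: json.loads(x)[0]['url'])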
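One caveat: json.loads only accepts JSON, which requires double-quoted strings, while the values shown in the question use single quotes (Python repr style). If the values are Python literals, the standard library's ast.literal_eval parses them safely. A minimal sketch on hypothetical sample rows, assuming every string is a valid Python literal (so the URL must be quoted).

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "Keyword": ["kw1", "kw2"],
    "ranks": ["[{'rank': 1, 'url': 'example.com'}]",
              "[{'rank': 2, 'url': 'example.com'}]"],
})

# Parse each string once into a real list of dicts
parsed = df["ranks"].map(ast.literal_eval)

# Then pick the keys from the first dict in each list
df["rank"] = parsed.map(lambda x: x[0]["rank"])
df["url"] = parsed.map(lambda x: x[0]["url"])
print(df)
```

Parsing once into parsed avoids decoding every string twice, once per new column.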