I have a pandas DataFrame that looks like:
Keyword | ranks | search_type | search_volume
kw1 |[{'rank': 1, 'url': example.com}]| 1 | 500
kw1 |[{'rank': 1, 'url': example.com}]| 2 | 500
kw2 |[{'rank': 2, 'url': example.com}]| 1 | 1500
kw2 |[{'rank': 2, 'url': example.com}]| 2 | 1500
kw3 |[{'rank': 1, 'url': example.com}]| 1 | 60
kw3 |[{'rank': 1, 'url': example.com}]| 2 | 60
What I want is to split ranks into two columns: ranks, holding the rank, and a new column named url, holding the URL, so that the resulting df looks like:
Keyword | ranks | url | search_type | search_volume
kw1 |[{'rank': 1 | 'url': example.com}]| 1 | 500
kw1 |[{'rank': 1 | 'url': example.com}]| 2 | 500
kw2 |[{'rank': 2 | 'url': example.com}]| 1 | 1500
kw2 |[{'rank': 2 | 'url': example.com}]| 2 | 1500
kw3 |[{'rank': 1 | 'url': example.com}]| 1 | 60
kw3 |[{'rank': 1 | 'url': example.com}]| 2 | 60
So far I have tried:
df.ranks = df.ranks.str.split(',',1).tolist()
which returned a list of NaN. I also tried df['ranks'].str.split(',', expand=True), which did not work either. Then I tried:
df = pd.DataFrame(df.ranks.str.split(' ',1).tolist(),columns = ['ranks','url'])
but I got ValueError: Shape of passed values is (1, 400), indices imply (2, 400).
Edit: df.ranks.dtype returns dtype('O') and type(df.ranks) returns pandas.core.series.Series.
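A note for readers on why the attempts above come back as NaN: dtype('O') only says the column holds Python objects, and if those objects are lists rather than strings, the .str.split accessor has nothing to split. The sketch below assumes each cell of ranks is a real list containing one dict (the question does not confirm this), and uses a two-row stand-in for the data.

```python
import pandas as pd

# Assumption: each cell of 'ranks' is a Python list holding one dict,
# not a string that merely looks like one.
df = pd.DataFrame({
    "Keyword": ["kw1", "kw2"],
    "ranks": [
        [{"rank": 1, "url": "example.com"}],
        [{"rank": 2, "url": "example.com"}],
    ],
})

# Check what is actually stored in the column.
first_type = type(df["ranks"].iloc[0])

# .str.split only handles string elements; list elements become NaN.
split_result = df["ranks"].str.split(",")
all_nan = bool(split_result.isna().all())

print(first_type)
print(all_nan)
```

Checking type(df['ranks'].iloc[0]) first tells you whether a string method or plain list/dict indexing is the right tool.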
Answer 0 (score: 2)
I think the column contains lists of dicts, so I suggest using a list comprehension to take the first dict of each list and then select by key:
df['r'] = [x[0]['rank'] for x in df['ranks']]
df['u'] = [x[0]['url'] for x in df['ranks']]
print (df)
Keyword ranks search_type search_volume r \
0 kw1 [{'rank': 1, 'url': 'example.com'}] 1 500 1
1 kw1 [{'rank': 1, 'url': 'example.com'}] 2 500 1
2 kw2 [{'rank': 2, 'url': 'example.com'}] 1 1500 2
3 kw2 [{'rank': 2, 'url': 'example.com'}] 2 1500 2
4 kw3 [{'rank': 1, 'url': 'example.com'}] 1 60 1
5 kw3 [{'rank': 1, 'url': 'example.com'}] 2 60 1
u
0 example.com
1 example.com
2 example.com
3 example.com
4 example.com
5 example.com
Or, to keep each value wrapped in a dict:
df['r'] = [{'rank': x[0]['rank']} for x in df['ranks']]
df['u'] = [{'url': x[0]['url']} for x in df['ranks']]
print (df)
Keyword ranks search_type search_volume \
0 kw1 [{'rank': 1, 'url': 'example.com'}] 1 500
1 kw1 [{'rank': 1, 'url': 'example.com'}] 2 500
2 kw2 [{'rank': 2, 'url': 'example.com'}] 1 1500
3 kw2 [{'rank': 2, 'url': 'example.com'}] 2 1500
4 kw3 [{'rank': 1, 'url': 'example.com'}] 1 60
5 kw3 [{'rank': 1, 'url': 'example.com'}] 2 60
r u
0 {'rank': 1} {'url': 'example.com'}
1 {'rank': 1} {'url': 'example.com'}
2 {'rank': 2} {'url': 'example.com'}
3 {'rank': 2} {'url': 'example.com'}
4 {'rank': 1} {'url': 'example.com'}
5 {'rank': 1} {'url': 'example.com'}
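For the exact column layout asked for in the question (ranks and url side by side, original list column gone), one possible variation is to build a small frame from the first dict of every list and join it back. This is only a sketch: it again assumes list-of-dict cells, and the names rows, parts and out are made up for the example.

```python
import pandas as pd

# Rebuild the sample data from the question (assumed structure).
rows = [("kw1", 1, 500), ("kw2", 2, 1500), ("kw3", 1, 60)]
df = pd.DataFrame([
    {
        "Keyword": kw,
        "ranks": [{"rank": rank, "url": "example.com"}],
        "search_type": search_type,
        "search_volume": volume,
    }
    for kw, rank, volume in rows
    for search_type in (1, 2)
])

# One row per original row, one column per dict key ('rank', 'url').
parts = pd.DataFrame([x[0] for x in df["ranks"]], index=df.index)

# Drop the list column, attach the new columns, and restore the order
# requested in the question.
out = df.drop(columns="ranks").join(parts)
out = out.rename(columns={"rank": "ranks"})
out = out[["Keyword", "ranks", "url", "search_type", "search_volume"]]

print(out)
```

Note that this only keeps the first dict of each list; lists with several entries would need a different approach.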
Answer 1 (score: 1)
You can use str.strip and str.split, with the expand parameter set to True, as in the line below. Alternatively, you can convert the string to a dict.
df[['rank','url']] = df['ranks'].str.strip('{[]}').str.split(',',expand=True).values
Keyword ranks search_type search_volume rank url
0 kw1 [{'rank': 1, 'url': example.com}] 1 500 'rank': 1 'url': example.com
1 kw1 [{'rank': 1, 'url': example.com}] 2 500 'rank': 1 'url': example.com
2 kw2 [{'rank': 2, 'url': example.com}] 1 1500 'rank': 2 'url': example.com
3 kw2 [{'rank': 2, 'url': example.com}] 2 1500 'rank': 2 'url': example.com
4 kw3 [{'rank': 1, 'url': example.com}] 1 60 'rank': 1 'url': example.com
5 kw3 [{'rank': 1, 'url': example.com}] 2 60 'rank': 1 'url': example.com
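The other option mentioned in this answer, converting the string to a dict, is not shown. A minimal sketch of one way to do it uses ast.literal_eval from the standard library. It assumes the cells are strings holding a valid Python literal, so the URL has to be quoted (unlike the unquoted example.com in the printout above); the Series s is made-up sample data.

```python
import ast

import pandas as pd

# Assumption: each cell is a string containing a Python literal
# (a list with one dict), with the URL quoted.
s = pd.Series(
    [
        "[{'rank': 1, 'url': 'example.com'}]",
        "[{'rank': 2, 'url': 'example.com'}]",
    ],
    name="ranks",
)

# Parse every string safely and keep the first dict of each list.
first_dicts = [ast.literal_eval(text)[0] for text in s]

# One column per key, aligned with the original index.
result = pd.DataFrame(first_dicts, index=s.index)

print(result)
```

Unlike strip and split, this gives a real integer in rank and a clean string in url, without the 'rank': and 'url': prefixes left in the cells.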
Answer 2 (score: 0)
Try this:
df['ranks'].str.split(', ', expand=True).rename(columns={0:'ranks',1:'url'})
Keyword ranks search_type search_volume
0 kw1 [{'rank': 1, 'url': example.com}] 1 500
1 kw1 [{'rank': 1, 'url': example.com}] 2 500
2 kw2 [{'rank': 2, 'url': example.com}] 1 1500
3 kw2 [{'rank': 2, 'url': example.com}] 2 1500
4 kw3 [{'rank': 1, 'url': example.com}] 1 60
5 kw3 [{'rank': 1, 'url': example.com}] 2 60
ranks url
0 [{'rank': 1 'url': example.com}]
1 [{'rank': 1 'url': example.com}]
2 [{'rank': 2 'url': example.com}]
3 [{'rank': 2 'url': example.com}]
4 [{'rank': 1 'url': example.com}]
5 [{'rank': 1 'url': example.com}]
Answer 3 (score: 0)
Wow. Please don't use those strip and split hacks. If your input data is JSON strings, then just use:
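import json
# parse each JSON string in 'ranks' and read the keys of its first dict
df['rank'] = df['ranks'].map(lambda x: json.loads(x)[0]['rank'])
df['url'] = df['ranks'].map(lambda x: json.loads(x)[0]['url'])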
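A self-contained version of the JSON idea, as a sketch: it assumes the ranks cells are valid JSON strings, which means double quotes. The single-quoted form shown in the question would make json.loads raise json.JSONDecodeError. The sample frame is invented for the example.

```python
import json

import pandas as pd

# Assumption: 'ranks' holds valid JSON text (double-quoted keys and values).
df = pd.DataFrame({
    "Keyword": ["kw1", "kw2"],
    "ranks": [
        '[{"rank": 1, "url": "example.com"}]',
        '[{"rank": 2, "url": "example.com"}]',
    ],
})

# Parse each string once and keep the first dict of the list.
first = df["ranks"].map(lambda text: json.loads(text)[0])

df["rank"] = first.map(lambda d: d["rank"])
df["url"] = first.map(lambda d: d["url"])

# Single-quoted text is a Python literal, not JSON, and is rejected.
try:
    json.loads("[{'rank': 1, 'url': 'example.com'}]")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

print(df[["Keyword", "rank", "url"]])
```

Parsing once into first and then reading both keys avoids calling json.loads twice per row.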