I'd like to know how to split the Location column into several new columns, e.g. City, State code and Country, in pandas. From this:
'Location': {0: 'Warszawa, Poland',
1: 'San Francisco, CA, United States',
2: 'Los Angeles, CA, United States',
3: 'Sunnyvale, CA, United States',
4: 'Sunnyvale, CA, United States',
5: 'San Francisco, CA, United States',
6: 'Sunnyvale, CA, United States',
7: 'Kraków, Poland',
8: 'Shanghai, China',
9: 'Mountain View, CA, United States',
10: 'Boulder, CO, United States',
11: 'Boulder, CO, United States',
12: 'Xinyi District, Taiwan',
13: 'Tel Aviv-Yafo, Israel',
14: 'Wrocław, Poland',
15: 'Singapore'}
To this:
'Country': {0: 'Poland',
1: 'United States',
2: 'United States',
3: 'United States',
4: 'United States',
5: 'United States',
6: 'United States',
7: 'Poland',
8: 'China',
9: 'United States',
10: 'United States',
11: 'United States',
12: 'Taiwan',
13: 'Israel',
14: 'Poland',
15: 'Singapore'}
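Thanks.
Answer 0: (score: 1)
I'm not sure this is the best approach; others, please comment or suggest something better. I tried to split the data, but the challenge is that the non-US entries have only a city and a country, while the US entries have a city, a state and a country, so a single split can't handle them all. Below are the two methods I used to split the data; you then have to figure out how to merge the results into one DataFrame.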
import pandas as pd

b = pd.DataFrame({'Location': {0: 'Warszawa, Poland',
1: 'San Francisco, CA, United States',
2: 'Los Angeles, CA, United States',
3: 'Sunnyvale, CA, United States',
4: 'Sunnyvale, CA, United States',
5: 'San Francisco, CA, United States',
6: 'Sunnyvale, CA, United States',
7: 'Kraków, Poland',
8: 'Shanghai, China',
9: 'Mountain View, CA, United States',
10: 'Boulder, CO, United States',
11: 'Boulder, CO, United States',
12: 'Xinyi District, Taiwan',
13: 'Tel Aviv-Yafo, Israel',
14: 'Wrocław, Poland',
15: 'Singapore'}})
# Split once, at the first comma, into City and Country. This works well for foreign addresses, i.e. data with just a city and a country.
c = b['Location'].str.split(',', n=1, expand=True)
c.columns = ['City', 'Country']
c
Output:
            City            Country
0       Warszawa             Poland
1  San Francisco  CA, United States
2    Los Angeles  CA, United States
3      Sunnyvale  CA, United States
4      Sunnyvale  CA, United States
5  San Francisco  CA, United States
6      Sunnyvale  CA, United States
7         Kraków             Poland
8       Shanghai              China
The second method is:
# City, State and Country are each separated by a comma; [^,\s]+ keeps the comma out of the State group.
regex = r'(?P<City>[^,]+)\s*,\s*(?P<State>[^,\s]+)\s*,\s*(?P<Country>[^,]+)'
df = b['Location'].str.extract(regex)
df  # This splits the data into City, State and Country, so it works well for US addresses; two-part entries come back as NaN.
Output:
            City  State        Country
0            NaN    NaN            NaN
1  San Francisco     CA  United States
2    Los Angeles     CA  United States
3      Sunnyvale     CA  United States
4      Sunnyvale     CA  United States
5  San Francisco     CA  United States
6      Sunnyvale     CA  United States
7            NaN    NaN            NaN
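As for merging the two results into one DataFrame, here is one possible sketch (an illustration, not the answerer's code): split every entry on commas, trim each piece, and pick City, State and Country by position. It assumes every entry has at most three comma-separated parts and that a one-part entry such as 'Singapore' is a country with no city. The names `parts` and `combined` are made up for this example, and the sample is cut down to three rows.

```python
import pandas as pd

# Shortened sample: one two-part, one three-part and one single-part entry.
b = pd.DataFrame({'Location': ['Warszawa, Poland',
                               'San Francisco, CA, United States',
                               'Singapore']})

# Split on commas and trim the whitespace around every piece.
parts = b['Location'].str.split(',').apply(lambda p: [s.strip() for s in p])

combined = pd.DataFrame({
    # Assumption: the first piece is a city only when more than one piece exists.
    'City': parts.apply(lambda p: p[0] if len(p) > 1 else None),
    # Assumption: a state code is present only in three-part (US-style) entries.
    'State': parts.apply(lambda p: p[1] if len(p) == 3 else None),
    # The last piece is always treated as the country.
    'Country': parts.apply(lambda p: p[-1]),
})

print(combined)
```

Entries with more than three parts would need their own rule; this sketch simply leaves their State empty.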
Answer 1: (score: 0)
$ ipython
Python 3.6.8 |Anaconda custom (64-bit)| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: d = {'Location': {0: 'Warszawa, Poland',
...: 1: 'San Francisco, CA, United States',
...: 2: 'Los Angeles, CA, United States',
...: 3: 'Sunnyvale, CA, United States',
...: 4: 'Sunnyvale, CA, United States',
...: 5: 'San Francisco, CA, United States',
...: 6: 'Sunnyvale, CA, United States',
...: 7: 'Kraków, Poland',
...: 8: 'Shanghai, China',
...: 9: 'Mountain View, CA, United States',
...: 10: 'Boulder, CO, United States',
...: 11: 'Boulder, CO, United States',
...: 12: 'Xinyi District, Taiwan',
...: 13: 'Tel Aviv-Yafo, Israel',
...: 14: 'Wrocław, Poland',
...: 15: 'Singapore'}}
In [2]: import pandas as pd
...: df = pd.DataFrame.from_dict(d)
...: df
Out[2]:
Location
0 Warszawa, Poland
1 San Francisco, CA, United States
2 Los Angeles, CA, United States
3 Sunnyvale, CA, United States
4 Sunnyvale, CA, United States
5 San Francisco, CA, United States
6 Sunnyvale, CA, United States
7 Kraków, Poland
8 Shanghai, China
9 Mountain View, CA, United States
10 Boulder, CO, United States
11 Boulder, CO, United States
12 Xinyi District, Taiwan
13 Tel Aviv-Yafo, Israel
14 Wrocław, Poland
15 Singapore
In [3]: df['Country'] = df['Location'].str.split(',').apply(lambda x: x[-1])
...: df
Out[3]:
Location Country
0 Warszawa, Poland Poland
1 San Francisco, CA, United States United States
2 Los Angeles, CA, United States United States
3 Sunnyvale, CA, United States United States
4 Sunnyvale, CA, United States United States
5 San Francisco, CA, United States United States
6 Sunnyvale, CA, United States United States
7 Kraków, Poland Poland
8 Shanghai, China China
9 Mountain View, CA, United States United States
10 Boulder, CO, United States United States
11 Boulder, CO, United States United States
12 Xinyi District, Taiwan Taiwan
13 Tel Aviv-Yafo, Israel Israel
14 Wrocław, Poland Poland
15 Singapore Singapore
In [4]: df['Country'].to_dict()
Out[4]:
{0: ' Poland',
1: ' United States',
2: ' United States',
3: ' United States',
4: ' United States',
5: ' United States',
6: ' United States',
7: ' Poland',
8: ' China',
9: ' United States',
10: ' United States',
11: ' United States',
12: ' Taiwan',
13: ' Israel',
14: ' Poland',
15: 'Singapore'}
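Note that the values in Out[4] carry a leading space (' Poland'), because splitting on ',' keeps the space that followed each comma. A minimal sketch of one way to trim it, using the `.str` accessor throughout; the three-row sample is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Location': ['Warszawa, Poland',
                                'San Francisco, CA, United States',
                                'Singapore']})

# Take the last comma-separated piece, then trim surrounding whitespace.
df['Country'] = df['Location'].str.split(',').str[-1].str.strip()

print(df['Country'].to_dict())
# {0: 'Poland', 1: 'United States', 2: 'Singapore'}
```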
Answer 2: (score: 0)
This is a slight improvement: it does the same job and can be put in a single line of code.
b['City'] = b['Location'].str.split(',').apply(lambda x: x[0])
b['Country'] = b['Location'].str.split(',').apply(lambda x: x[-1])
b
Output:
                           Location           City        Country
0                  Warszawa, Poland       Warszawa         Poland
1  San Francisco, CA, United States  San Francisco  United States
2    Los Angeles, CA, United States    Los Angeles  United States
3      Sunnyvale, CA, United States      Sunnyvale  United States
4      Sunnyvale, CA, United States      Sunnyvale  United States
5  San Francisco, CA, United States  San Francisco  United States
6      Sunnyvale, CA, United States      Sunnyvale  United States
7                    Kraków, Poland         Kraków         Poland
8                   Shanghai, China       Shanghai          China
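# One-line variant. As the output below shows, this stores (City, Country) tuples in a single column rather than creating two columns.
b['City', 'Country'] = pd.DataFrame(b['Location'].str.split(',').apply(lambda x: (x[0], x[-1])))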
(City, Country)
0 (Warszawa, Poland)
1 (San Francisco, United States)
2 (Los Angeles, United States)
3 (Sunnyvale, United States)
4 (Sunnyvale, United States)
5 (San Francisco, United States)
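The tuple column above is probably not what was wanted. A possible single-statement version that yields two real columns is sketched below, assuming the first and last comma-separated pieces are the city and the country. The `.strip()` calls and the three-row sample are additions for this illustration.

```python
import pandas as pd

b = pd.DataFrame({'Location': ['Warszawa, Poland',
                               'San Francisco, CA, United States',
                               'Shanghai, China']})

# Turn the (city, country) tuples into a two-column frame and assign both columns at once.
b[['City', 'Country']] = pd.DataFrame(
    b['Location'].str.split(',').apply(lambda x: (x[0].strip(), x[-1].strip())).tolist(),
    index=b.index,  # keep the rows aligned with the original frame
)

print(b)
```

For a one-part entry such as 'Singapore', the first and last pieces are the same string, so City and Country would both be 'Singapore'.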