Is there a way to map new values onto a DataFrame column based on the first character of the current value?
My current code:
ncesvars['urbantype'] = np.where(ncesvars['urbantype'].str.startswith('1'), 'city', ncesvars['urbantype'])
ncesvars['urbantype'] = np.where(ncesvars['urbantype'].str.startswith('2'), 'suburban', ncesvars['urbantype'])
ncesvars['urbantype'] = np.where(ncesvars['urbantype'].str.startswith('3'), 'town', ncesvars['urbantype'])
ncesvars['urbantype'] = np.where(ncesvars['urbantype'].str.startswith('4'), 'rural', ncesvars['urbantype'])
I was thinking of using some kind of dict and then pd.replace, but I'm not sure how to combine that with .str.startswith().
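Answer 0 (score: 3)
# Keys are regex patterns anchored at the start of the string; values are the replacements.
ncesvars['urbantype'] = ncesvars['urbantype'].replace({
    r'^1.*': 'city',
    r'^2.*': 'suburban',
    r'^3.*': 'town',
    r'^4.*': 'rural'},
    regex=True)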
Test:
In [32]: w
Out[32]:
word
0 1_A_
1 word03
2 word02
3 word00
4 2xxx
5 word04
6 word01
7 word02
8 word04
9 3aaa
In [33]: w['word'].replace({r'^1.*': 'city', r'^2.*': 'suburban', r'^3.*': 'town'}, regex=True)
Out[33]:
0 city
1 word03
2 word02
3 word00
4 suburban
5 word04
6 word01
7 word02
8 word04
9 town
Name: word, dtype: object
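If the mapping is easier to maintain as a plain prefix-to-label dict, the regex dictionary can be generated from it instead of written by hand. The following is only a sketch: the names prefix_to_label and patterns and the sample values are made up for illustration, and it assumes every value in the column is a string.

```python
import re

import pandas as pd

# Hypothetical prefix -> label mapping (names chosen for illustration).
prefix_to_label = {'1': 'city', '2': 'suburban', '3': 'town', '4': 'rural'}

# Build one anchored regex per prefix; re.escape guards against
# prefixes that contain regex metacharacters.
patterns = {
    rf'^{re.escape(prefix)}.*': label
    for prefix, label in prefix_to_label.items()
}

# Made-up sample data; the last value matches no prefix and is left alone.
s = pd.Series(['11 large city', '21 suburb', '32 town', '43 rural', 'unknown'])
result = s.replace(patterns, regex=True)
print(result.tolist())
```

Values that match none of the patterns pass through unchanged, which is the same behavior as the hand-written dictionary.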
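Answer 1 (score: 2)
You can define a dict of your categories, slice the data with str[0:1], and call map on the resulting Series. Do the assignment through loc with a boolean mask, built by testing whether the first character isin the dict keys, so that only matching rows are overwritten. Without the mask, unmatched rows would be overwritten with NaN, because the last row in the following example has no mapping:
In [16]: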
df = pd.DataFrame({'urbantype':['1 asdas','2 asd','3 asds','4 asdssd','5 asdas']})
df
Out[16]:
urbantype
0 1 asdas
1 2 asd
2 3 asds
3 4 asdssd
4 5 asdas
In [18]:
d = {'1':'city','2':'suburban', '3': 'town','4':'rural'}
df.loc[df['urbantype'].str[0:1].isin(d.keys()), 'urbantype'] = df['urbantype'].str[0:1].map(d)
df
Out[18]:
urbantype
0 city
1 suburban
2 town
3 rural
4 5 asdas
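A possible variation on the same idea, shown here only as a sketch, is to map every row first and then fill the unmatched rows (which map turns into NaN) back in from the original column with fillna. It reuses the df and d from the example above; the intermediate name mapped is introduced for illustration.

```python
import pandas as pd

d = {'1': 'city', '2': 'suburban', '3': 'town', '4': 'rural'}
df = pd.DataFrame({'urbantype': ['1 asdas', '2 asd', '3 asds', '4 asdssd', '5 asdas']})

# Map the first character of each value; characters without a dict entry become NaN.
mapped = df['urbantype'].str[0].map(d)

# Restore the original value wherever no mapping was found.
df['urbantype'] = mapped.fillna(df['urbantype'])
print(df)
```

This avoids computing the slice twice, at the cost of building an intermediate Series.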