I have a column in a DataFrame containing a string from which I need to extract two pieces of information, using different separators:
ID STR
280 11040402-38.58551%;11050101-9.29086%;11070101-52.12363%
351 11130203-35%;11130230-65%
510 11070103-69%
655 11090103-41.63463%;11160102-58.36537%
666 11130205-50.00%;11130207-50%
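If you want to run the answers below, the sample data above can be rebuilt as a DataFrame. This sketch assumes ID is an integer column and STR holds plain strings, which the question does not state explicitly:

```python
import pandas as pd

# sample data from the question; dtypes are an assumption
df = pd.DataFrame({
    'ID': [280, 351, 510, 655, 666],
    'STR': [
        '11040402-38.58551%;11050101-9.29086%;11070101-52.12363%',
        '11130203-35%;11130230-65%',
        '11070103-69%',
        '11090103-41.63463%;11160102-58.36537%',
        '11130205-50.00%;11130207-50%',
    ],
})
```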
I have been trying to use this Series' .apply method with a lambda function to do the split in one go, to no avail:
df['STR'].apply(lambda x: y.split('-') for y in x.split(';'))
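The attempt above fails before .apply does any work: without brackets, Python parses the argument as a generator expression, and the iterable of its first for clause, x.split(';'), is evaluated immediately in the enclosing scope, where x is usually undefined. A minimal sketch of the intended one-liner moves the comprehension into the lambda body, in brackets. Note that it yields one list of [left, right] pairs per row, not yet the two separate columns asked for below:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [280, 351, 510, 655, 666],
    'STR': [
        '11040402-38.58551%;11050101-9.29086%;11070101-52.12363%',
        '11130203-35%;11130230-65%',
        '11070103-69%',
        '11090103-41.63463%;11160102-58.36537%',
        '11130205-50.00%;11130207-50%',
    ],
})

# the list comprehension now runs inside the lambda, so x is the cell value
pairs = df['STR'].apply(lambda x: [y.split('-') for y in x.split(';')])
```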
Ideally, this would not only split the string in one go but also separate the part to the left of the - from the part to the right:
ID STR.LEFT STR.RIGHT
280 [11040402, 11050101, 11070101] [38.58551%, 9.29086%, 52.12363%]
351 [11130203, 11130230] [35%, 65%]
510 [11070103] [69%]
655 [11090103, 11160102] [41.63463%, 58.36537%]
666 [11130205, 11130207] [50.00%, 50%]
I believe this can be done with .apply and slicing, but any other solution is welcome.
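Staying with .apply, as the question hoped, a minimal sketch builds both lists in one pass per row and then unpacks them into two columns. The helper name split_pairs is hypothetical, and the sketch assumes the left part of each pair never contains a '-':

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [280, 351, 510, 655, 666],
    'STR': [
        '11040402-38.58551%;11050101-9.29086%;11070101-52.12363%',
        '11130203-35%;11130230-65%',
        '11070103-69%',
        '11090103-41.63463%;11160102-58.36537%',
        '11130205-50.00%;11130207-50%',
    ],
})

def split_pairs(s):
    """Hypothetical helper: 'l1-r1;l2-r2' -> (['l1', 'l2'], ['r1', 'r2'])."""
    # split each pair on its first '-' only
    pairs = [p.split('-', 1) for p in s.split(';')]
    return [left for left, _ in pairs], [right for _, right in pairs]

# each element of parts is a (lefts, rights) tuple
parts = df['STR'].apply(split_pairs)
df['STR.LEFT'] = parts.map(lambda t: t[0])
df['STR.RIGHT'] = parts.map(lambda t: t[1])
```

Both lists come from the same parsed pairs, so the left and right entries stay aligned position by position.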
Answer 0 (score: 5)
You can try splitting a few times:
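# set ID as index
df.set_index('ID', inplace=True)
# one row per 'left-right' pair; dropna() discards the None padding expand=True adds to shorter rows (newer pandas versions of stack() may keep it)
new_series = df.STR.str.split(';', expand=True).stack().dropna().reset_index(level=-1, drop=True)
# split each pair into column 0 (left) and column 1 (right)
new_df = new_series.str.split('-', expand=True)
# collect the pieces back into one list per ID
new_df.groupby('ID').agg(list).reset_index()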
Output:
ID 0 1
-- ---- ------------------------------------ --------------------------------------
0 280 ['11040402', '11050101', '11070101'] ['38.58551%', '9.29086%', '52.12363%']
1 351 ['11130203', '11130230'] ['35%', '65%']
2 510 ['11070103'] ['69%']
3 655 ['11090103', '11160102'] ['41.63463%', '58.36537%']
4 666 ['11130205', '11130207'] ['50.00%', '50%']
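The same split-twice idea can also be sketched with Series.explode (available in pandas 0.25 and later) instead of expand=True plus stack(), which avoids creating padding for rows with fewer pairs. This is a swapped-in variant, not the answer's own code:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [280, 351, 510, 655, 666],
    'STR': [
        '11040402-38.58551%;11050101-9.29086%;11070101-52.12363%',
        '11130203-35%;11130230-65%',
        '11070103-69%',
        '11090103-41.63463%;11160102-58.36537%',
        '11130205-50.00%;11130207-50%',
    ],
})

# one row per 'left-right' pair, repeating the ID index value
pairs = df.set_index('ID')['STR'].str.split(';').explode()
# split each pair once, on its first '-', into columns 0 (left) and 1 (right)
parts = pairs.str.split('-', n=1, expand=True)
# gather the pieces back into one list per ID
result = parts.groupby(level='ID').agg(list).reset_index()
result.columns = ['ID', 'STR.LEFT', 'STR.RIGHT']
```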
Answer 1 (score: 4)
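Assuming the pattern is always 'l-r;l-r;l-r...', a single str.split on a regex matching either separator is enough; slicing each resulting list with [::2] and [1::2] then separates the left and right parts into their own list columns: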
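import pandas as pd
# split on '-' or ';': even positions are left parts, odd positions right parts
s = df.STR.str.split('-|;')
df[['ID']].join(pd.concat({'STR.LEFT': s.str[::2], 'STR.RIGHT': s.str[1::2]}, axis=1))
ID STR.LEFT STR.RIGHT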
0 280 [11040402, 11050101, 11070101] [38.58551%, 9.29086%, 52.12363%]
1 351 [11130203, 11130230] [35%, 65%]
2 510 [11070103] [69%]
3 655 [11090103, 11160102] [41.63463%, 58.36537%]
4 666 [11130205, 11130207] [50.00%, 50%]
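If the lists should instead be broken out into separate rows, one row per pair, one option is DataFrame.explode with a list of columns. This sketch assumes pandas 1.3 or later, where explode accepts several columns at once; those columns must have equal list lengths row by row, which holds here because both come from the same split:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [280, 351, 510, 655, 666],
    'STR': [
        '11040402-38.58551%;11050101-9.29086%;11070101-52.12363%',
        '11130203-35%;11130230-65%',
        '11070103-69%',
        '11090103-41.63463%;11160102-58.36537%',
        '11130205-50.00%;11130207-50%',
    ],
})

s = df.STR.str.split('-|;')
out = df[['ID']].join(pd.concat({'STR.LEFT': s.str[::2], 'STR.RIGHT': s.str[1::2]}, axis=1))
# explode both list columns together, renumbering the rows 0..n-1
rows = out.explode(['STR.LEFT', 'STR.RIGHT'], ignore_index=True)
```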
Answer 2 (score: 3)
A single str.extractall call is enough to extract the pairs into separate columns. You can then use groupby to aggregate them into lists.
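To see why groupby(level=0) is needed, it helps to look at what extractall returns: one row per match, indexed by (original row, match number). With a pattern such as (.*?)-(.*?)(?=;|$), the lazy first group starts each later match at the ';' and so picks it up; the character classes in the pattern below keep ';' out of both groups. The two-row Series is a small excerpt of the question's data:

```python
import pandas as pd

s = pd.Series(['11130203-35%;11130230-65%', '11070103-69%'])

# one row per 'left-right' match; the index is (original row, 'match')
matches = s.str.extractall(r'([^;-]+)-([^;]+)')
print(matches)
```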
# each match is one 'left-right' pair; the character classes keep ';' out of both groups
(df['STR'].str.extractall(r'([^;-]+)-([^;]+)')
    .groupby(level=0)
    .agg(list)
    .set_axis(['STR.LEFT', 'STR.RIGHT'], axis=1))
STR.LEFT STR.RIGHT
0 [11040402, 11050101, 11070101] [38.58551%, 9.29086%, 52.12363%]
1 [11130203, 11130230] [35%, 65%]
2 [11070103] [69%]
3 [11090103, 11160102] [41.63463%, 58.36537%]
4 [11130205, 11130207] [50.00%, 50%]
To bring in the ID column, just use join:
(df['STR'].str.extractall(r'([^;-]+)-([^;]+)')
    .groupby(level=0)
    .agg(list)
    .set_axis(['STR.LEFT', 'STR.RIGHT'], axis=1)
    .join(df['ID']))
STR.LEFT STR.RIGHT ID
0 [11040402, 11050101, 11070101] [38.58551%, 9.29086%, 52.12363%] 280
1 [11130203, 11130230] [35%, 65%] 351
2 [11070103] [69%] 510
3 [11090103, 11160102] [41.63463%, 58.36537%] 655
4 [11130205, 11130207] [50.00%, 50%] 666
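If ID should come first, as in the question's desired output, the join can go the other way round. A self-contained sketch, assuming the question's sample data and the separator-excluding pattern ([^;-]+)-([^;]+):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [280, 351, 510, 655, 666],
    'STR': [
        '11040402-38.58551%;11050101-9.29086%;11070101-52.12363%',
        '11130203-35%;11130230-65%',
        '11070103-69%',
        '11090103-41.63463%;11160102-58.36537%',
        '11130205-50.00%;11130207-50%',
    ],
})

# one row per pair, gathered back into lists per original row
lists = (df['STR'].str.extractall(r'([^;-]+)-([^;]+)')
         .groupby(level=0)
         .agg(list)
         .set_axis(['STR.LEFT', 'STR.RIGHT'], axis=1))
# start from the ID column so it comes first; join aligns on the row index
result = df[['ID']].join(lists)
```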