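pandas: split a column on a comma delimiter when the number of resulting columns is unknown

Time: 2020-01-23 19:21:18

Tags: python pandas

I have a column named "weather" that I want to split into multiple columns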

(degrees, humidity, wind_mph, wind_chill)

It looks like this:

Sometimes it has humidity, sometimes it has wind chill, and sometimes it has neither.

'81 degrees, wind 8 mph'
'40 degrees, relative humidity 75%, wind 17 mph'
'52 degrees, wind 12 mph'
'51 degrees, relative humidity 82%, wind 6 mph, wind chill 0'

I want to split it so that the value is NULL wherever there is no wind chill or humidity.

How can I do this?

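2 answers:

Answer 0: (score: 0)

This should work for you. Basically, you can use str.extract to pull out each column you need; rows where a pattern does not match get NaN.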

import pandas as pd

weather = [
    '81 degrees, wind 8 mph',
    '40 degrees, relative humidity 75%, wind 17 mph',
    '52 degrees, wind 12 mph',
    '51 degrees, relative humidity 82%, wind 6 mph, wind chill 0',
]
df = pd.DataFrame(weather, columns=['weather'])
# expand=False returns a Series for a single capture group
df['degrees'] = df.weather.str.extract(r'(\d+)\s*degrees', expand=False)
df['humidity'] = df.weather.str.extract(r'humidity\s*(\d+)%', expand=False)
df['wind_mph'] = df.weather.str.extract(r'wind\s*(\d+)\s*mph', expand=False)
df['wind_chill'] = df.weather.str.extract(r'wind\s*chill\s*(\d+)', expand=False)
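
The values extracted above are strings. If numeric columns are wanted, a minimal sketch is to loop over the same patterns and pass each result through pd.to_numeric. The `patterns` dict is a name introduced here for illustration, and the optional minus sign in the wind chill pattern is an assumption that negative values can occur.

```python
import pandas as pd

weather = [
    '81 degrees, wind 8 mph',
    '40 degrees, relative humidity 75%, wind 17 mph',
    '52 degrees, wind 12 mph',
    '51 degrees, relative humidity 82%, wind 6 mph, wind chill 0',
]
df = pd.DataFrame({'weather': weather})

# Hypothetical mapping of output column name -> regex with one capture group
patterns = {
    'degrees': r'(\d+)\s*degrees',
    'humidity': r'humidity\s*(\d+)%',
    'wind_mph': r'wind\s*(\d+)\s*mph',
    'wind_chill': r'wind\s*chill\s*(-?\d+)',  # assumption: allow negatives
}

for name, pattern in patterns.items():
    # Extract the digits as strings (NaN where absent), then convert to numbers
    extracted = df['weather'].str.extract(pattern, expand=False)
    df[name] = pd.to_numeric(extracted)

print(df)
```

Columns that contain missing values should come back as floats, since NaN cannot be stored in a plain integer column.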

Answer 1: (score: 0)

pd.concat(
    [df,
    df['ColName'].str.extract(r'(?P<degrees>.*degrees).*(?P<wind_mph>wind.*mph)', expand=True),
    df['ColName'].str.extract(r', (?P<humidity>.*humidity.*%)'),
    df['ColName'].str.extract(r'.*(?P<wind_chill>wind chill .*)'),
    ],
    axis=1)

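You can run a series of regular-expression extractions and concatenate the results back onto the original df. Replace 'ColName' with the actual name of your column.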
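
If you would rather split on the commas literally, as the title asks, one possible sketch is to split each string and label the pieces by keyword, so the number of pieces per row does not matter. The helper `parse_weather`, the `COLUMNS` list, and the keyword rules are assumptions for illustration, based only on the four sample strings in the question.

```python
import pandas as pd

weather = [
    '81 degrees, wind 8 mph',
    '40 degrees, relative humidity 75%, wind 17 mph',
    '52 degrees, wind 12 mph',
    '51 degrees, relative humidity 82%, wind 6 mph, wind chill 0',
]
df = pd.DataFrame({'weather': weather})

# Target columns, in the order the question lists them
COLUMNS = ['degrees', 'humidity', 'wind_mph', 'wind_chill']


def parse_weather(text):
    """Split one weather string on commas and label each piece by keyword."""
    fields = {}
    for part in text.split(','):
        part = part.strip()
        if part.endswith('degrees'):
            fields['degrees'] = part
        elif 'humidity' in part:
            fields['humidity'] = part
        elif part.startswith('wind chill'):  # check before the plain 'wind' case
            fields['wind_chill'] = part
        elif part.startswith('wind'):
            fields['wind_mph'] = part
    return fields


# Keys missing from a row's dict become NaN in the resulting frame
parsed = pd.DataFrame([parse_weather(t) for t in df['weather']], index=df.index)
parsed = parsed.reindex(columns=COLUMNS)  # guarantee all four columns exist
result = pd.concat([df, parsed], axis=1)
print(result)
```

Like the answer above, this keeps the text of each piece (for example 'wind 8 mph') rather than just the number.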