Pandas DataFrame: split one column into two based on a condition

Time: 2020-10-08 06:50:16

Tags: python pandas dataframe

I have a DataFrame like the one below:

             0  
    ____________________________________
0     Country| India  
60        Delhi  
62       Mumbai  
68       Chennai  
75    Country| Italy  
78        Rome  
80       Venice  
85        Milan  
88    Country| Australia  
100      Sydney  
103      Melbourne  
107      Perth  

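I want to split the data into two columns, so that one column holds the country and the other holds the city. I don't know where to start. I'd like the result to look like this: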

                0                1
    ____________________________________
0   Country| India       Delhi
1   Country| India       Mumbai
2   Country| India       Chennai
3   Country| Italy       Rome
4   Country| Italy       Venice
5   Country| Italy       Milan
6   Country| Australia   Sydney
7   Country| Australia   Melbourne
8   Country| Australia   Perth

Any ideas?

2 Answers:

Answer 0: (score: 3)

Find the rows that contain |, pull them into a separate column, and forward-fill the newly created column:

(
    # use columns={0: "city"} instead if the column label is the integer 0
    df.rename(columns={"0": "city"})
    # this looks for rows that contain '|' and puts them into a
    # new column called Country. Rows that do not match will be
    # null in the new column. The raw string avoids an invalid
    # escape sequence warning for "\|".
    .assign(Country=lambda x: x.loc[x.city.str.contains(r"\|"), "city"])
    # fill down on the Country column; this also has the benefit
    # of linking each Country with its cities
    .ffill()
    # drop the rows where city and Country are identical, i.e. the
    # original Country header rows, so that only Country entries are
    # in the Country column and only cities are in the city column
    .query("city != Country")
    # reverse the column positions to match your expected output
    .iloc[:, ::-1]
)


      Country           city
60  Country| India      Delhi
62  Country| India      Mumbai
68  Country| India      Chennai
78  Country| Italy      Rome
80  Country| Italy      Venice
85  Country| Italy      Milan
100 Country| Australia  Sydney
103 Country| Australia  Melbourne
107 Country| Australia  Perth
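
To try the steps above end to end, here is a minimal, self-contained sketch. It rebuilds the question's sample data, assuming the column label is the integer 0 and that the index labels are as printed in the question. It also swaps the escaped regex for regex=False, which treats | as a literal character; that is a substitution, not what the answer itself uses.

```python
import pandas as pd

# Sample data rebuilt from the question. The integer column label 0 and
# the index labels are assumptions based on the printed frame.
df = pd.DataFrame(
    {0: ["Country| India", "Delhi", "Mumbai", "Chennai",
         "Country| Italy", "Rome", "Venice", "Milan",
         "Country| Australia", "Sydney", "Melbourne", "Perth"]},
    index=[0, 60, 62, 68, 75, 78, 80, 85, 88, 100, 103, 107],
)

result = (
    df.rename(columns={0: "city"})
    # regex=False matches "|" literally, so no escaping is needed
    .assign(Country=lambda x: x.loc[x["city"].str.contains("|", regex=False), "city"])
    # propagate each country down to the city rows that follow it
    .ffill()
    # drop the header rows, where city and Country are the same string
    .query("city != Country")
    # put Country first
    .iloc[:, ::-1]
)
print(result)
```

The original index labels (60, 62, ...) are kept; chaining .reset_index(drop=True) at the end would renumber the rows from 0 if that is preferred.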

Answer 1: (score: 2)

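Use DataFrame.insert with Series.where and Series.str.startswith to replace the non-matching values with missing values, then ffill to forward-fill them. Finally, remove the rows where both columns hold the same value by using Series.ne (not equal) in boolean indexing: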

df.insert(0, 'country', df[0].where(df[0].str.startswith('Country')).ffill())
df = df[df['country'].ne(df[0])].reset_index(drop=True).rename(columns={0:'city'})
print(df)
             country       city
0      Country|India      Delhi
1      Country|India     Mumbai
2      Country|India    Chennai
3      Country|Italy       Rome
4      Country|Italy     Venice
5      Country|Italy      Milan
6  Country|Australia     Sydney
7  Country|Australia  Melbourne
8  Country|Australia      Perth
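
Because DataFrame.insert modifies the frame in place, running the snippet above twice on the same df raises an error. A possible way around that is to wrap the same steps in a function that works on a copy. The sketch below is only an illustration: the helper name split_header_rows and its parameters are hypothetical, and the sample data is rebuilt from the question with an assumed integer column label 0.

```python
import pandas as pd


def split_header_rows(df, col=0, prefix="Country"):
    """Hypothetical helper: move rows starting with `prefix` into a 'country' column."""
    out = df.copy()  # DataFrame.insert works in place, so operate on a copy
    is_header = out[col].str.startswith(prefix)
    # keep the header values, turn everything else into NaN, then forward-fill
    out.insert(0, "country", out[col].where(is_header).ffill())
    # drop the header rows (country equals the original value), then tidy up
    out = out[out["country"].ne(out[col])]
    return out.reset_index(drop=True).rename(columns={col: "city"})


# Sample data rebuilt from the question (column label 0 is an assumption).
df = pd.DataFrame(
    {0: ["Country| India", "Delhi", "Mumbai", "Chennai",
         "Country| Italy", "Rome", "Venice", "Milan",
         "Country| Australia", "Sydney", "Melbourne", "Perth"]},
    index=[0, 60, 62, 68, 75, 78, 80, 85, 88, 100, 103, 107],
)

result = split_header_rows(df)
print(result)
```

The input frame is left untouched, so the function can be called repeatedly or with a different col or prefix.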