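How can I duplicate rows in pandas under a certain condition?
The only condition is that the name ends with `Pref.`.
The sort order does not matter.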
```python
import pandas as pd

if __name__ == '__main__':
    df = pd.DataFrame({"area": ["Aomori Pref.", "Saitama", "GifuPref."],
                       "x": [30, 40, 55],
                       "y": ["l", "m", "n"]})
    # I want to get:
    #            area   x  y
    # 0        Aomori  30  l
    # 1  Aomori Pref.  30  l
    # 2       Saitama  40  m
    # 3          Gifu  55  n
    # 4     GifuPref.  55  n
```
Answer 0 (score: 3)
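First use `replace` to strip the trailing `Pref.` from the values, then use `mask` to add a new column `b` that holds `NaN` for the values that did not match: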
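```python
# Raw string with an escaped dot; regex=True is required in pandas 2.0+
df1 = df['area'].str.replace(r'\s*(Pref\.$)', '', regex=True).to_frame('a')
# Keep the original name only where stripping changed something
df1['b'] = df['area'].mask(df1['a'] == df['area'])
print(df1)
```

```
         a             b
0   Aomori  Aomori Pref.
1  Saitama           NaN
2     Gifu     GifuPref.
```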
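Then create a `Series` with `stack`, use `rename` to give the `Series` the name of the new column, and finally use `reset_index` to remove the second level of the `MultiIndex`: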
```python
# dropna() discards the NaN placeholders of the non-matching rows
s = df1.stack().dropna().rename('area').reset_index(level=1, drop=True)
print(s)
```

```
0          Aomori
0    Aomori Pref.
1         Saitama
2            Gifu
2       GifuPref.
Name: area, dtype: object
```
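To make the `stack` step easier to follow, here is a minimal sketch of the intermediate result. It rebuilds `df1` by hand (an assumption, so that the snippet runs on its own) and looks at the index before and after `reset_index`.

```python
import numpy as np
import pandas as pd

# Hand-built copy of df1 from the previous step (assumed values)
df1 = pd.DataFrame({"a": ["Aomori", "Saitama", "Gifu"],
                    "b": ["Aomori Pref.", np.nan, "GifuPref."]})

# stack() moves the column labels into a second index level,
# producing one entry per (row label, column name) pair
stacked = df1.stack().dropna()
pairs = list(stacked.index)

# Dropping level 1 leaves only the original row labels, repeated
# once per surviving value
flat = stacked.reset_index(level=1, drop=True)
row_labels = list(flat.index)

print(pairs)
print(row_labels)
```

The repeated row labels are what allow the later `join` to attach the same `x` and `y` values to both variants of a name.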
Finally, remove the original column `area` with `drop`, `join` the `Series` `s`, and call `reset_index` to get a unique index:
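```python
# drop(columns=...) replaces the positional axis argument removed in pandas 2.0
df2 = df.drop(columns='area').join(s).reset_index(drop=True)[df.columns]
print(df2)
```

```
           area   x  y
0        Aomori  30  l
1  Aomori Pref.  30  l
2       Saitama  40  m
3          Gifu  55  n
4     GifuPref.  55  n
```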
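The regular expression `\s*(Pref\.$)` reads as follows: `\s*` matches zero or more whitespace characters, the parentheses `()` group the string to match, `\.` is a literal period (an unescaped `.` would match any character), and `$` marks the end of the string.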
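The pattern can be tried out in isolation. The sketch below uses the standard-library `re` module rather than pandas, on the assumption that it behaves the same way for this pattern; the extra sample names (`Pref. Office`, `Osaka Prefs`) are made up for illustration.

```python
import re

pattern = re.compile(r"\s*(Pref\.$)")

names = ["Aomori Pref.", "Saitama", "GifuPref.", "Pref. Office"]
# Only a trailing "Pref." (plus any whitespace before it) is removed
stripped = [pattern.sub("", name) for name in names]
print(stripped)

# With an unescaped dot, "." matches any character, so "Prefs" is removed too
loose = re.sub(r"\s*(Pref.$)", "", "Osaka Prefs")
strict = pattern.sub("", "Osaka Prefs")
print(loose, "|", strict)
```

`Pref. Office` is left alone because `$` only allows a match at the end of the string.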
Answer 1 (score: 1)
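```python
pattern = r"\s?Pref\.$"
# Rows whose area ends with "Pref."
m = df.area.str.contains(pattern, regex=True)
tmp = df.copy()
# In the copy, strip the suffix from the matching rows only
tmp.loc[m, "area"] = tmp.area.str.replace(pattern, "", regex=True)
# Stack original and stripped rows, then drop the rows that did not change
(pd.concat([df, tmp])
   .sort_values("area")
   .drop_duplicates()
   .reset_index(drop=True))
```

```
           area   x  y
0        Aomori  30  l
1  Aomori Pref.  30  l
2          Gifu  55  n
3     GifuPref.  55  n
4       Saitama  40  m
```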
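For reuse, the `concat` approach can be wrapped in a function. This is only a sketch: the name `duplicate_pref_rows` and its `column` and `pattern` parameters are hypothetical and not part of pandas or of the answer above.

```python
import pandas as pd


def duplicate_pref_rows(df, column="area", pattern=r"\s?Pref\.$"):
    """Return df plus a suffix-stripped copy of every row matching pattern."""
    mask = df[column].str.contains(pattern, regex=True)
    stripped = df.copy()
    stripped.loc[mask, column] = (
        stripped.loc[mask, column].str.replace(pattern, "", regex=True)
    )
    # Non-matching rows appear twice after concat; drop_duplicates removes them
    return (pd.concat([df, stripped])
              .sort_values(column)
              .drop_duplicates()
              .reset_index(drop=True))


df = pd.DataFrame({"area": ["Aomori Pref.", "Saitama", "GifuPref."],
                   "x": [30, 40, 55],
                   "y": ["l", "m", "n"]})
result = duplicate_pref_rows(df)
print(result)
```

One caveat: `drop_duplicates` would also collapse rows that were already identical in the input, so this sketch assumes the original frame has no duplicate rows.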