I'm not sure this is the best title; if anyone has a better idea for one, I'm open to suggestions.
Suppose I have a DataFrame that looks like this:
df2
A section
0 <fruit>
1 apple
2 orange
3 pear
4 watermelon
5 </fruit>
6 <furniture>
7 chair
8 sofa
9 table
10 desk
11 </furniture>
What I want is a DataFrame that looks like this:
A section
0 <fruit> fruit
1 apple fruit
2 orange fruit
3 pear fruit
4 watermelon fruit
5 </fruit> fruit
6 <furniture> furniture
7 chair furniture
8 sofa furniture
9 table furniture
10 desk furniture
11 </furniture> furniture
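Is there a way to do this? I thought about going row by row with if statements, but I ran into trouble with the boolean logic when I tried.
Edit #1:
The solution posted below solves my problem.
Solution: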
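For reference, the row-by-row idea can work if the loop carries the current section name in a variable instead of trying to decide each row in isolation. A minimal sketch, assuming the column is named A and that section markers are rows consisting of exactly <name> or </name>; the helper label_sections is a hypothetical name, not a pandas function:

```python
import re

import pandas as pd

# A section-marker row is exactly "<name>" or "</name>", nothing else.
TAG_ROW = re.compile(r"^</?(\w+)>$")


def label_sections(values):
    """Hypothetical helper: return one section label per row."""
    labels = []
    current = None  # name of the most recent marker seen so far
    for value in values:
        match = TAG_ROW.match(value)
        if match:
            # Opening or closing marker: its name labels this row.
            current = match.group(1)
        labels.append(current)
    return labels


df2 = pd.DataFrame({"A": ["<fruit>", "apple", "orange", "pear", "watermelon", "</fruit>",
                          "<furniture>", "chair", "sofa", "table", "desk", "</furniture>"]})
df2["section"] = label_sections(df2["A"])
print(df2)
```

Rows before the first marker get None; every other row inherits the name of the nearest marker above it, which is the same thing the vectorized ffill answers do.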
df['section'] = pd.Series(np.where(df.A.str.contains('<'), df.A.str.replace('<|>|/', '', regex=True), np.nan), index=df.index).ffill()
What if my data looks like this instead? I want the same result.
A section
0 <fruit>
1 <fruit_1>apple</fruit_1>
2 <fruit_2>orange</fruit_2>
3 <fruit_3>pear</fruit_3>
4 <fruit_4>watermelon</fruit_4>
5 </fruit>
6 <furniture>
7 <furniture_1>chair</furniture_1>
8 <furniture_2>sofa</furniture_2>
9 <furniture_3>table</furniture_3>
10 <furniture_4>desk</furniture_4>
11 </furniture>
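The solution above keys on str.contains('<'), which stops separating section markers from items once every item carries its own tags. A quick check, assuming the new data is loaded into a DataFrame named df3 (a hypothetical name):

```python
import pandas as pd

items = {"fruit": ["apple", "orange", "pear", "watermelon"],
         "furniture": ["chair", "sofa", "table", "desk"]}

# Rebuild the second example: each item is wrapped in its own numbered tag.
rows = []
for section, names in items.items():
    rows.append(f"<{section}>")
    for i, name in enumerate(names, start=1):
        rows.append(f"<{section}_{i}>{name}</{section}_{i}>")
    rows.append(f"</{section}>")
df3 = pd.DataFrame({"A": rows})

# Every row contains "<", so the contains('<') filter keeps all of them...
print(df3.A.str.contains("<").all())
# ...and stripping <, > and / from an item row glues the tag names to the value.
print(df3.A.str.replace("<|>|/", "", regex=True).iloc[1])
```

This prints True and then fruit_1applefruit_1, so the item rows would get their own bogus section names instead of inheriting fruit or furniture.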
Answer 0 (score: 3)
IIUC, use contains to find the tag rows, np.where to assign the values, and then ffill to fill in the NaNs.
df['section'] = pd.Series(np.where(df.A.str.contains('<'), df.A.str.replace('<|>|/', '', regex=True), np.nan), index=df.index).ffill()
df
Out[1003]:
A section
0 <fruit> fruit
1 apple fruit
2 orange fruit
3 pear fruit
4 watermelon fruit
5 </fruit> fruit
6 <furniture> furniture
7 chair furniture
8 sofa furniture
9 table furniture
10 desk furniture
11 </furniture> furniture
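The same three steps can also be written without np.where by masking with Series.where, which keeps df's own index. Wrapping np.where's result in pd.Series without an index argument gives it a fresh 0..n-1 index, so the assignment silently misaligns when df has any other index. A sketch on the first example data, with a deliberately non-default index to show the difference:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["<fruit>", "apple", "orange", "pear", "watermelon", "</fruit>",
           "<furniture>", "chair", "sofa", "table", "desk", "</furniture>"]},
    index=range(100, 112),  # non-default index on purpose
)

is_tag = df.A.str.contains("<")                      # step 1: find the tag rows
names = df.A.str.replace("<|>|/", "", regex=True)    # step 2: strip <, > and /
df["section"] = names.where(is_tag).ffill()          # step 3: NaN for non-tags, then forward-fill
print(df)
```

With this index, the pd.Series(np.where(...)) form without index= would align on labels 0..11, match none of 100..111, and fill the column with NaN.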
If you want to be stricter, you can also use startswith and endswith to check the beginning and end of the string.
df['section'] = pd.Series(np.where(df.A.str.startswith('<') & df.A.str.endswith('>'), df.A.str.replace('<|>|/', '', regex=True), np.nan), index=df.index).ffill()
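Neither check separates markers from items in the Edit #1 data, because a row like <fruit_1>apple</fruit_1> contains '<' and also starts with '<' and ends with '>'. One approach that should handle both layouts is to treat a row as a section marker only when the whole string is a single tag, using an anchored regex with str.extract. A sketch, assuming section names consist only of word characters (letters, digits, underscore):

```python
import pandas as pd

# Second example from the question: items carry their own numbered tags.
df = pd.DataFrame({"A": [
    "<fruit>", "<fruit_1>apple</fruit_1>", "<fruit_2>orange</fruit_2>",
    "<fruit_3>pear</fruit_3>", "<fruit_4>watermelon</fruit_4>", "</fruit>",
    "<furniture>", "<furniture_1>chair</furniture_1>", "<furniture_2>sofa</furniture_2>",
    "<furniture_3>table</furniture_3>", "<furniture_4>desk</furniture_4>", "</furniture>",
]})

# ^</?(\w+)>$ matches only rows that are exactly "<name>" or "</name>";
# every other row yields NaN, which ffill fills from the marker above it.
df["section"] = df.A.str.extract(r"^</?(\w+)>$", expand=False).ffill()
print(df)
```

The same line also works on the first layout, since plain item rows like apple do not match the pattern either.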
Answer 1 (score: 1)
I'd go with an explicit approach:
import re

def parse_funky_xml(s):
    tag = None
    for x in s:
        if tag is None:
            # Outside a section: an opening tag like <fruit> starts one.
            match = re.match(r'<([^/]+)>', x)
            if match:
                tag = match.group(1)
            yield tag
        else:
            # Inside a section: only its own closing tag ends it.
            match = re.match(rf'</{re.escape(tag)}>', x)
            yield tag
            if match:
                tag = None

df.assign(section=[*parse_funky_xml(df.A)])
A section
0 <fruit> fruit
1 apple fruit
2 orange fruit
3 pear fruit
4 watermelon fruit
5 </fruit> fruit
6 <furniture> furniture
7 chair furniture
8 sofa furniture
9 table furniture
10 desk furniture
11 </furniture> furniture
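Because the generator only looks for the exact closing tag of the section it is in, the rows in between can contain anything, including their own tags, so it should also cover the Edit #1 data. A quick check (a sketch; the function is copied from above with comments, and the name df3 is hypothetical):

```python
import re

import pandas as pd


def parse_funky_xml(s):
    tag = None
    for x in s:
        if tag is None:
            # Outside a section: an opening tag like <fruit> starts one.
            match = re.match(r'<([^/]+)>', x)
            if match:
                tag = match.group(1)
            yield tag
        else:
            # Inside a section: only its own closing tag ends it.
            match = re.match(rf'</{re.escape(tag)}>', x)
            yield tag
            if match:
                tag = None


df3 = pd.DataFrame({"A": [
    "<fruit>", "<fruit_1>apple</fruit_1>", "<fruit_2>orange</fruit_2>",
    "<fruit_3>pear</fruit_3>", "<fruit_4>watermelon</fruit_4>", "</fruit>",
    "<furniture>", "<furniture_1>chair</furniture_1>", "<furniture_2>sofa</furniture_2>",
    "<furniture_3>table</furniture_3>", "<furniture_4>desk</furniture_4>", "</furniture>",
]})
result = df3.assign(section=[*parse_funky_xml(df3.A)])
print(result)
```

Note that </fruit> does not match the start of </fruit_1> because the pattern requires > right after the name.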