输入数据框的代码为
<div class="row">
<article>
<div class="image"><img src="https://via.placeholder.com/100x100"></div>
<h2>Header text</h2>
<p>Detail text</p>
</article>
<article>
<div class="image"><img src="https://via.placeholder.com/100x100"></div>
<h2>Header text</h2>
<p>Detail text</p>
</article>
<article>
<div class="image"><img src="https://via.placeholder.com/100x100"></div>
<h2>Header text</h2>
<p>Detail text</p>
</article>
</div>
输入数据框:-
import pandas as pd
df = pd.DataFrame([{'Column1': '((CC ) + (A11/ABC/ZZ) + (!AAA))','Column2': 'XYZ + XXX/YYY'}])
输入列表:-
+---------------------------------+---------------------------------+
| Column1 | Column2 +
+---------------------------------+---------------------------------+
| ((CC ) + (A11/ABC/ZZ) + (!AAA)) | XYZ + XXX/YYY |
+---------------------------------+---------------------------------+
条件:-
list = [AAA,BBB,CCC]
因为!符号,该行变为
'+' should remain as such (similar to AND condition)
'/' means split the data into multiple cells (similar to OR condition)
'!' means replace with other elements in the corresponding list (similar to NOT condition)
请帮助我使用熊猫将单行拆分为多行,如下所示
+------------------------------------+---------------------------------+
| Column1 | Column2 +
+------------------------------------+---------------------------------+
| ((CC ) + (A11/ABC/ZZ) + (BBB/CCC)) | XYZ + XXX/YYY |
+------------------------------------+---------------------------------+
答案 0 :(得分:0)
查看这是否满足您的要求。这些注释说明了它的工作原理。
#!/usr/bin/env python
import pandas as pd # tested with pd.__version__ 0.19.2
df = pd.DataFrame([{'Column1': '((CC ) + (A11/ABC/ZZ) + (!AAA))',
'Column2': 'XYZ + XXX/YYY'}]) # your input dataframe
list = ['AAA', 'BBB', 'CCC'] # your input list
to_replace = dict()
for item in list: # prepare the dictionary for the '!' replacements
to_replace["!"+item+'\\b'] = '/'.join([i for i in list if i != item])
df = df.replace(to_replace, regex=True) # do all the '!' replacements
import re
def expanded(s): # expand series s to multiple string list around '/'
l = s.str.replace('[()]', '').tolist()
while True: # in each loop cycle, handle one A/B/C... expression
xl = [] # expanded list for this cycle
for s in l: # for each string in the list so far
m = re.search(r'\w+(/\w+)+', s) # look for a A/B/C... expression
if m: # if there is, add the individual expansions to the list
xl.extend([m.string[:m.start()]+i+m.string[m.end():]
for i in m.group().split('/')])
else: # if not, we're done
return l
l = xl # expanded list for this cycle is now the current list
def expand(c): # expands the column named c to multiple rows
new = expanded(df[c]) # get the new contents
xdf = pd.concat(len(new)/len(df[c])*[df]) # create required rows
xdf[c] = sorted(new) # set the new contents
return xdf # return new dataframe
df = expand('Column1')
df = expand('Column2')
print df
输出:
Column1 Column2 0 CC + A11 + BBB XYZ + XXX 0 CC + A11 + CCC XYZ + XXX 0 CC + ABC + BBB XYZ + XXX 0 CC + ABC + CCC XYZ + XXX 0 CC + ZZ + BBB XYZ + XXX 0 CC + ZZ + CCC XYZ + XXX 0 CC + A11 + BBB XYZ + YYY 0 CC + A11 + CCC XYZ + YYY 0 CC + ABC + BBB XYZ + YYY 0 CC + ABC + CCC XYZ + YYY 0 CC + ZZ + BBB XYZ + YYY 0 CC + ZZ + CCC XYZ + YYY