Merging DataFrame columns based on a predicate in pandas

Time: 2017-07-07 11:35:36

Tags: python pandas dataframe merge

I have a DataFrame that I obtained by opening a CSV file and cleaning it. The CSV looks like this:

"1. Do some research on al-Khorezmi (also al-Khwarizmi), the man from whose name the word ""algorithm"" is derived. In particular, you should learn what the origins of the words ""algorithm"" and ""algebra"" have in common. ",,Understand,Procedural,Understand,Factual,Understand,
3.  Write down driving directions for going from your school to your home with the precision required from an algorithm's description. ,,Apply,,Apply,Factual,Remember,
Write down a recipe for cooking your favorite dish with the precision required by an algorithm.,Factual,Apply,,,Factual,Remember,
"Design an algorithm to find all the common elements in two sorted lists of numbers. For example, for the lists 2, 5, 5, 5 and 2, 2, 3, 5, 5, 7, the output should be 2, 5, 5. What is the maximum number of comparisons your algorithm makes if the lengths of the two given lists are m and n, respectively?",Procedural,Create,,Apply,,,
"a.  Find gcd(31415, 14142) by applying Euclid's algorithm.",Procedural,Apply,,Apply,,,

It is loaded with the following code:

df = pd.read_csv('ADA.csv', names=['Questions', 'A1', 'A2', 'B1', 'B2', 'C1','C2'])

This is what I have so far:

                                          Questions          A1  \
5     1. Do some research on al-Khorezmi (also al-Kh...         NaN   
6     3.  Write down driving directions for going fr...         NaN   
7     Write down a recipe for cooking your favorite ...     Factual   
8     Design an algorithm to find all the common ele...  Procedural   
9     a.  Find gcd(31415, 14142) by applying Euclid'...  Procedural    

                 A2          B1          B2                    C1  \
5        Understand  Procedural  Understand               Factual   
6             Apply         NaN       Apply               Factual   
7             Apply         NaN         NaN               Factual   
8            Create         NaN       Apply                   NaN   
9             Apply         NaN       Apply                   NaN   

                    C2  
5           Understand  
6             Remember  
7             Remember  
8                  NaN  
9                  NaN  

The columns are ['Questions', 'A1', 'A2', 'B1', 'B2', 'C1', 'C2'].

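Now, what I need to do is combine the columns ['A1', 'B1', 'C1'] into a single column, Label 1, and the columns ['A2', 'B2', 'C2'] into another column, Label 2, based on this predicate: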

Label 1 = A1 if A1 has a value else B1 if B1 has a value else C1

The same goes for Label 2:

Label 2 = A2 if A2 has a value else B2 if B2 has a value else C2
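Read for a single row, the predicate above amounts to "take the first value that is not missing". A minimal sketch in plain Python follows; the helper name `first_present` is hypothetical and only illustrates the rule, treating both `None` and NaN as missing:

```python
import math


def first_present(*values):
    """Return the first value that is not missing (None or NaN), else None."""
    for value in values:
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is not None and not is_nan:
            return value
    return None


nan = float("nan")

# First row of the sample data: A1 is missing, B1 is "Procedural".
label_1 = first_present(nan, "Procedural", "Factual")

# First row of the sample data: A2 already has a value.
label_2 = first_present("Understand", "Understand", "Understand")
```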

For the given input, I want two columns that look like this:

Label 1        Label 2 
Procedural     Understand
Factual        Apply
Factual        Apply
Procedural     Create
Procedural     Apply

This is what I tried:

df['Label 1'] = df['A1'] if df['A1'] else df['B1'] if df['B1'] else df['C1']  

But it raises this error:

  

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
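The error can be reproduced in isolation. Python's `if` needs a single True/False, and a Series with several elements does not reduce to one. The sketch below uses a small hypothetical Series (not the real data) to trigger the same exception, then shows the element-wise mask that `notnull()` returns instead:

```python
import pandas as pd

# Hypothetical column with one missing value.
s = pd.Series([None, "Factual", "Procedural"])

try:
    if s:  # asks for the truth value of the whole Series
        pass
except ValueError as exc:
    message = str(exc)

# An element-wise test gives one boolean per row instead of a single one.
mask = s.notnull()
```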

If someone could point me in the right direction, that would be enough. Thanks.

1 answer:

Answer 0: (score: 1)

This should work:

df['Label 1'] = df['A1'] if df['A1'].notnull().all() \
                else df['B1'] if df['B1'].notnull().all() \
                else df['C1']
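
Note that `.notnull().all()` is true only when a column has no missing values at all, so the expression above selects one whole column rather than choosing per row. For the per-row predicate described in the question, one possible sketch uses `Series.combine_first`, which fills the missing entries of one Series with values from another. The sample frame below is reconstructed from the output shown in the question (label columns only) and is for illustration:

```python
import numpy as np
import pandas as pd

# Label columns reconstructed from the question's printed DataFrame.
df = pd.DataFrame({
    "A1": [np.nan, np.nan, "Factual", "Procedural", "Procedural"],
    "A2": ["Understand", "Apply", "Apply", "Create", "Apply"],
    "B1": ["Procedural", np.nan, np.nan, np.nan, np.nan],
    "B2": ["Understand", "Apply", np.nan, "Apply", "Apply"],
    "C1": ["Factual", "Factual", "Factual", np.nan, np.nan],
    "C2": ["Understand", "Remember", "Remember", np.nan, np.nan],
})

# Row by row: take A if present, otherwise B, otherwise C.
df["Label 1"] = df["A1"].combine_first(df["B1"]).combine_first(df["C1"])
df["Label 2"] = df["A2"].combine_first(df["B2"]).combine_first(df["C2"])

print(df[["Label 1", "Label 2"]])
```

An alternative with the same intent is `df[['A1', 'B1', 'C1']].bfill(axis=1).iloc[:, 0]`, which back-fills across the columns and keeps the first one.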