Question

I want to reshape my DataFrame:

From Input_DF:

col1                                                 col2  col3
Course_66    0\nCourse_67    1\nCourse_68    0       a     c  
Course_66    1\nCourse_67    0\nCourse_68    0       a     d

To Output_DF:

   Course_66       Course_67       Course_68    col2  col3
           0              0                1     a     c  
           0              1                0     a     d

Note that col1 contains a single long string.

Any help would be greatly appreciated. Thanks in advance. Best regards, Carlo

Answer 1

Use:

# first split col1 on whitespace into separate columns
df1 = df['col1'].str.split(expand=True)
# in each column, split on the literal \n and keep the first part (the value)
df2 = df1.apply(lambda x: x.str.split(r'\\n').str[0])
# take the column names from the first row: the second part of each split
df2.columns = df1.iloc[0].str.split(r'\\n').str[1]
print (df2)
0 Course_66 Course_67 Course_68
0         0         0         1
1         0         1         0

#join to original, remove unnecessary column
df = df2.join(df.drop('col1', axis=1))
print (df)
  Course_66 Course_67 Course_68 col2 col3
0         0         0         1    a    c
1         0         1         0    a    d
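
To run the steps above end to end, here is a minimal sketch. The sample frame is my assumption about the data: col1 holds the two characters backslash and n (not a real newline), which is what the r'\\n' pattern expects. The names tokens, values and result are only for illustration.

```python
import pandas as pd

# Assumed sample data: raw strings, so "\n" is a literal backslash + n
df = pd.DataFrame({
    'col1': [r'0\nCourse_66    0\nCourse_67    1\nCourse_68',
             r'0\nCourse_66    1\nCourse_67    0\nCourse_68'],
    'col2': ['a', 'a'],
    'col3': ['c', 'd'],
})

# Split on whitespace: one "value\nlabel" token per column
tokens = df['col1'].str.split(expand=True)
# The part before the literal "\n" is the value
values = tokens.apply(lambda s: s.str.split(r'\\n').str[0])
# The part after it, read from the first row, is the column label
values.columns = tokens.iloc[0].str.split(r'\\n').str[1].tolist()

# Attach the remaining original columns
result = values.join(df.drop('col1', axis=1))
print(result)
```

Note that the values stay strings here; add .astype(int) on the course columns if you need numbers.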

Another solution with a list comprehension:

L = [[y.split('\\n')[0] for y in x.split()] for x in df['col1']]
cols = [x.split('\\n')[1] for x in df.loc[0, 'col1'].split()]
df1 = pd.DataFrame(L, index=df.index, columns=cols)
print (df1)
  Course_66 Course_67 Course_68
0         0         0         1
1         0         1         0

EDIT:

# split values on whitespace - this splits on \n too
df1 = df['course_vector'].str.split(expand=True)
# select every second column (positions 1, 3, 5, ...): the values
df2 = df1.iloc[:, 1::2]
# for the column names, take every other value (positions 0, 2, 4, ...) from the first row
df2.columns = df1.iloc[0, 0::2]
# join to original
df = df2.join(df.drop('course_vector', axis=1))
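
A self-contained sketch of the edited version, assuming the real column holds actual newlines with the label first and the value second in each pair (the layout shown in the question). The sample data and the names parts, wide and out are my own illustration.

```python
import pandas as pd

# Assumed sample: real newlines, "label value" pairs
df = pd.DataFrame({
    'course_vector': ['Course_66    0\nCourse_67    1\nCourse_68    0',
                      'Course_66    1\nCourse_67    0\nCourse_68    0'],
    'col2': ['a', 'a'],
    'col3': ['c', 'd'],
})

# Whitespace split also breaks on newlines: label, value, label, value, ...
parts = df['course_vector'].str.split(expand=True)
# Odd positions hold the values; copy so the rename does not touch parts
wide = parts.iloc[:, 1::2].copy()
# Even positions of the first row hold the labels
wide.columns = parts.iloc[0, 0::2].tolist()
# Convert the string values to integers
wide = wide.astype(int)

out = wide.join(df.drop('course_vector', axis=1))
print(out)
```

This relies on every row listing the courses in the same order as the first row, since the labels are read from row 0 only.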

Answer 2

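Since your data is ordered as value, key pairs, you can split on newlines and runs of spaces with a regular expression to get a list, then take every other element starting at the first position for the values and starting at the second position for the labels, and return a pd.Series object. By applying this to each row, you get a DataFrame from these multiple Series, which you can then concatenate with the remaining columns of the original DataFrame: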

import pandas as pd

df = pd.DataFrame({'col1': ['0\nCourse_66    0\nCourse_67    1\nCourse_68',
                            '0\nCourse_66    1\nCourse_67    0\nCourse_68'],
                'col2': ['a', 'a'], 'col3': ['c', 'd']})

def to_multiple_columns(str_list):
    # take the numeric values for each series and column labels and return as a series
    # by taking every other value
    return pd.Series(str_list[::2], str_list[1::2])

# split on newlines and spaces
splits = df['col1'].str.split(r'\n|\s+').apply(to_multiple_columns)

output = pd.concat([splits, df.drop('col1', axis=1)], axis=1)
print(output)
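
One nice property of pairing each value with its own label per row is that rows do not have to list the courses in the same order. Below is a small sketch of that idea using a list of dicts instead of Series.apply (a variant of mine, not the code above). The second row's ordering is hypothetical data made up to show the alignment.

```python
import pandas as pd

# Hypothetical data: the second row lists the courses in a different order
s = pd.Series(['0\nCourse_66    0\nCourse_67    1\nCourse_68',
               '1\nCourse_68    0\nCourse_66    1\nCourse_67'])

# Tokens alternate value, label; build one {label: value} dict per row
rows = [dict(zip(tokens[1::2], tokens[::2])) for tokens in s.str.split()]

# The DataFrame constructor aligns the dicts by key, not by position
wide = pd.DataFrame(rows, index=s.index)
print(wide)
```

The same alignment by label is what makes the per-row Series approach robust when the order varies.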

How to make pandas DataFrame column values into columns

2 answers: