I want to reshape my DataFrame:
From Input_DF:
col1                                   col2  col3
Course_66 0\nCourse_67 1\nCourse_68 0  a     c
Course_66 1\nCourse_67 0\nCourse_68 0  a     d
To Output_DF:
Course_66  Course_67  Course_68  col2  col3
0          0          1          a     c
0          1          0          a     d
Note that col1 contains one long string.
Any help would be greatly appreciated. Thanks in advance. Best wishes, Carlo
Answer 0 (score: 3)
Use:
# first, split col1 on whitespace into separate columns
df1 = df['col1'].str.split(expand=True)
# in each column, split on the literal \n and keep the first part (the value)
df2 = df1.apply(lambda x: x.str.split(r'\\n').str[0])
# take the column names from the first row: the second part of each split
df2.columns = df1.iloc[0].str.split(r'\\n').str[1]
print(df2)
0 Course_66 Course_67 Course_68
0         0         0         1
1         0         1         0
# join back to the original and drop the column that is no longer needed
df = df2.join(df.drop('col1', axis=1))
print(df)
  Course_66 Course_67 Course_68 col2 col3
0         0         0         1    a    c
1         0         1         0    a    d
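For anyone who wants to run the steps above end to end, here is a minimal sketch. It assumes the cells of col1 contain a literal backslash followed by n (hence the raw strings), with each value placed before its course name. The sample frame and the names pieces, values and result are made up for illustration.

```python
import pandas as pd

# Assumed sample data: raw strings, so "\n" is a literal backslash plus "n".
df = pd.DataFrame({
    'col1': [r'0\nCourse_66 0\nCourse_67 1\nCourse_68',
             r'0\nCourse_66 1\nCourse_67 0\nCourse_68'],
    'col2': ['a', 'a'],
    'col3': ['c', 'd'],
})

# One column per whitespace-separated token, e.g. "0\nCourse_66".
pieces = df['col1'].str.split(expand=True)

# The pattern r'\\n' is a regex that matches a literal backslash followed by "n".
# The part before it is the value.
values = pieces.apply(lambda s: s.str.split(r'\\n').str[0])

# The part after it, taken from the first row, is the column name.
values.columns = pieces.iloc[0].str.split(r'\\n').str[1].tolist()

# The split pieces are strings; convert them to integers before joining.
result = values.astype(int).join(df.drop(columns='col1'))
print(result)
```

With this sample data the two result rows are 0, 0, 1, a, c and 0, 1, 0, a, d. If the cells hold real newline characters instead, the whitespace split already separates values from names, which is the case the edit further down handles.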
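Another solution, using a list comprehension:
# for every row, split on whitespace and keep the part before the literal \n (the value)
L = [[y.split('\\n')[0] for y in x.split()] for x in df['col1']]
# take the column names from the first row: the part after the literal \n
cols = [x.split('\\n')[1] for x in df.loc[0, 'col1'].split()]
df1 = pd.DataFrame(L, index=df.index, columns=cols)
print(df1)
  Course_66 Course_67 Course_68
0         0         0         1
1         0         1         0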
Edit:
# split on whitespace; this splits on real newlines (\n) too
df1 = df['course_vector'].str.split(expand=True)
# select every second column, starting at position 1 (the values)
df2 = df1.iloc[:, 1::2]
# for the column names, take every second entry of the first row, starting at position 0
df2.columns = df1.iloc[0, 0::2]
# join back to the original
df = df2.join(df.drop('course_vector', axis=1))
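The edit above can be tried out with a small self-contained sketch. It assumes that course_vector holds "name value" pairs separated by real newline characters, so the names sit at even positions and the values at odd positions after the split. The sample data and the names tokens, values and result are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: "name value" pairs separated by real newlines.
df = pd.DataFrame({
    'course_vector': ['Course_66 0\nCourse_67 1\nCourse_68 0',
                      'Course_66 1\nCourse_67 0\nCourse_68 0'],
    'col2': ['a', 'a'],
    'col3': ['c', 'd'],
})

# Splitting without a pattern splits on any whitespace, newlines included.
tokens = df['course_vector'].str.split(expand=True)

# Odd positions hold the values; copy so the rename below leaves `tokens` untouched.
values = tokens.iloc[:, 1::2].copy()

# Even positions of the first row hold the course names.
values.columns = tokens.iloc[0, 0::2].tolist()

# The split pieces are strings, so convert them to integers.
values = values.astype(int)

result = values.join(df.drop(columns='course_vector'))
print(result)
```

With this sample data the rows come out as 0, 1, 0, a, c and 1, 0, 0, a, d. Taking the names from the first row only works if every row lists the same courses in the same order.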
Answer 1 (score: 0)
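Since your data is ordered as value/key pairs, you can use a regular expression to split on newlines and runs of spaces to get a list, then take every other element starting at the first position for the values and starting at the second position for the labels, and return a Series object. Applying this to each row gives you a DataFrame built from those Series, which you can then combine with the original DataFrame.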
import pandas as pd

df = pd.DataFrame({'col1': ['0\nCourse_66 0\nCourse_67 1\nCourse_68',
                            '0\nCourse_66 1\nCourse_67 0\nCourse_68'],
                   'col2': ['a', 'a'], 'col3': ['c', 'd']})

def to_multiple_columns(str_list):
    # even positions hold the values, odd positions hold the column labels;
    # return a Series of the values indexed by their labels
    return pd.Series(str_list[::2], str_list[1::2])

# split on newlines and runs of whitespace
splits = df['col1'].str.split(r'\n|\s+').apply(to_multiple_columns)
output = pd.concat([splits, df.drop('col1', axis=1)], axis=1)
print(output)
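A possible variation on the same idea is to parse each cell into a dict and build the frame from the list of dicts, which also makes it easy to convert the values to integers along the way. This is only a sketch: parse_pairs is a hypothetical helper name, and it assumes every cell alternates value, label, value, label with real newlines or spaces in between.

```python
import re
import pandas as pd

df = pd.DataFrame({'col1': ['0\nCourse_66 0\nCourse_67 1\nCourse_68',
                            '0\nCourse_66 1\nCourse_67 0\nCourse_68'],
                   'col2': ['a', 'a'], 'col3': ['c', 'd']})

def parse_pairs(cell):
    # Hypothetical helper: split on any run of whitespace (newlines included).
    tokens = re.split(r'\s+', cell.strip())
    # Even positions are values, odd positions are labels.
    return dict(zip(tokens[1::2], map(int, tokens[0::2])))

# One dict per row; the dict keys become the column names.
parsed = pd.DataFrame([parse_pairs(cell) for cell in df['col1']], index=df.index)
output = pd.concat([parsed, df.drop(columns='col1')], axis=1)
print(output)
```

Here the course columns end up as integers rather than strings, and the rows are 0, 0, 1, a, c and 0, 1, 0, a, d.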