Question

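I have a data frame (df) in which the column col1 has many rows. Some of those rows contain a common string (Collection of numbers are) followed by different numbers (001, 002, 005). I want to extract the rows between two of these strings (for example, from Collection of numbers are 002 to Collection of numbers are 003) and assign them to a new column named after the first of the two (Collection of numbers are 002).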

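                             col1
0   Collection of numbers are 002
1                              53
2                              20
3                              56
4   Collection of numbers are 003
5                             236
6                             325
7   Collection of numbers are 005
8                              96
9                              23
10                             63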

I want to convert the data frame above into the following format.

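One column per header (Collection of numbers are 002, Collection of numbers are 003, Collection of numbers are 005), each holding the numbers listed beneath that header in col1.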

Note: there are no duplicate numbers.

Answer 1

We can try ffill together with str.split for some basic reshaping:

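# Tag each row with the nearest header above it
df['headers'] = df['col1'].str.extract('(Collection.*)', expand=False).ffill()
# Keep only the number rows
df1 = df[~df['col1'].str.contains('Collection')].copy()
# Join each group's numbers, split them back into columns, and transpose
df1.groupby('headers').agg(','.join)['col1'].str.split(',',expand=True).T.rename_axis('',axis='columns')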

Output:

  Collection of numbers are 002 Collection of numbers are 003  \
0                            53                           236   
1                            20                           325   
2                            56                          None   

  Collection of numbers are 005  
0                            96  
1                            23  
2                            63
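To try Answer 1 end to end, here is a minimal self-contained sketch. The sample data is an assumption reconstructed from the output above, with every value stored as a string (str.contains and ','.join both expect strings); expand=False makes str.extract return a Series rather than a one-column data frame.

```python
import pandas as pd

# Hypothetical sample: header rows followed by their numbers, all stored as strings
df = pd.DataFrame({"col1": [
    "Collection of numbers are 002", "53", "20", "56",
    "Collection of numbers are 003", "236", "325",
    "Collection of numbers are 005", "96", "23", "63",
]})

# Tag every row with the most recent header above it
df["headers"] = df["col1"].str.extract("(Collection.*)", expand=False).ffill()

# Drop the header rows themselves, keeping only the numbers
df1 = df[~df["col1"].str.contains("Collection")].copy()

# Join each group's values, split them back out into columns, then transpose
out = (df1.groupby("headers")["col1"].agg(",".join)
          .str.split(",", expand=True).T
          .rename_axis("", axis="columns"))
print(out)
```

The join/split round trip is what pads the shorter 003 column: str.split with expand=True fills the missing third position with a missing value.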

Answer 2

You can use set_index and unstack. I borrowed @Datanovice's idea for extracting the future column names, and used groupby.cumcount to get the future index numbers:

arrCollection = df['col1'].str.extract('(Collection.*)').ffill()[0].to_numpy()
df_f = df.set_index([df.groupby(arrCollection)['col1'].cumcount()-1,
                     arrCollection])['col1']\
         .unstack().iloc[1:,:]

print (df_f)
  Collection 002 Collection 003 Collection 005
0             53            236             96
1             20            325             23
2             56            NaN             63

Note: with your data the column names will look like the ones in your example; I did not use exactly the same input.
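The same idea as Answer 2 can also be written with DataFrame.pivot instead of set_index/unstack. This is a swapped-in variant, not the answerer's code, and the sample data is hypothetical. As in the answer, cumcount() - 1 numbers each header row -1 and the values beneath it 0, 1, 2, ..., so keeping only non-negative positions removes the header rows.

```python
import pandas as pd

# Hypothetical sample in the question's layout
df = pd.DataFrame({"col1": [
    "Collection of numbers are 002", "53", "20", "56",
    "Collection of numbers are 003", "236", "325",
    "Collection of numbers are 005", "96", "23", "63",
]})

# Forward-fill the header text so every row knows which block it belongs to
header = df["col1"].str.extract("(Collection.*)", expand=False).ffill()

# Position within each block: the header row gets -1, its values get 0, 1, 2, ...
pos = df.groupby(header).cumcount() - 1

long = pd.DataFrame({"header": header, "pos": pos, "value": df["col1"]})
long = long[long["pos"] >= 0]  # drop the header rows

# One column per header; shorter blocks are padded with NaN
wide = long.pivot(index="pos", columns="header", values="value")
print(wide)
```

pivot requires each (pos, header) pair to be unique, which holds here because positions restart at every header.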

Answer 3

Input:

                    col1
0   c of numbers are 002
1                      1
2                      2
3                      3
4   c of numbers are 003
5                     55
6                     66
7   c of numbers are 005
8                     45
9                     23
10                    12
11                   456
12                    56

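import pandas as pd

for_concat = []  # one single-column DataFrame per header
col = []         # values collected under the current header
for i, r in df.iterrows():
    if "numbers" in str(r["col1"]):
        # A header row: flush the values gathered for the previous header
        if col:
            for_concat.append(pd.DataFrame(col, columns=[col_name]))
            col = []
        col_name = r["col1"]
    else:
        col.append(r["col1"])
# Flush the values of the last header
for_concat.append(pd.DataFrame(col, columns=[col_name]))
# Side-by-side concatenation pads shorter columns with NaN
out = pd.concat(for_concat, axis=1)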

Output:

   c of numbers are 002  c of numbers are 003  c of numbers are 005
0                   1.0                  55.0                    45
1                   2.0                  66.0                    23
2                   3.0                   NaN                    12
3                   NaN                   NaN                   456
4                   NaN                   NaN                    56
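The loop above tracks the current header by hand. A variant sketch (not the answerer's code) gets the same blocks from a running count of header rows: cumsum over a boolean "is header" mask gives every row a block number, and grouping on it yields one block per header. It reuses the input shown above and assumes, like the loop, that the first row is a header.

```python
import pandas as pd

# The input from this answer: header strings mixed with integers in one column
df = pd.DataFrame({"col1": ["c of numbers are 002", 1, 2, 3,
                            "c of numbers are 003", 55, 66,
                            "c of numbers are 005", 45, 23, 12, 456, 56]})

# True on header rows; the running sum gives each header's block a number
is_header = df["col1"].astype(str).str.contains("numbers")
block = is_header.cumsum().rename("block")

pieces = {}
for _, grp in df.groupby(block):
    name = grp["col1"].iloc[0]  # assumes each block starts with its header
    pieces[name] = grp["col1"].iloc[1:].reset_index(drop=True)

# Side-by-side concatenation pads shorter blocks with NaN
out = pd.concat(pieces, axis=1)
print(out)
```

Because the column stays object dtype here, the integers are not converted to floats the way they are in the output above.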

Answer 4

The answer provided by Datanovice looks good. Another solution is to use the following function:

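import pandas as pd

def extract_columns(df, column, common_string):
    df_list = df[column].tolist()
    row_indices = []  # positions of the header rows
    cols = []         # the header strings, used as column names
    for ind, elem in enumerate(df_list):
        if common_string in str(elem):
            row_indices.append(ind)
            cols.append(elem)
    row_indices.append(len(df_list))  # end sentinel for the last block
    # Build every column at once so that pandas pads shorter columns with NaN
    # instead of cutting longer ones down to the first column's length
    data = {col: pd.Series(df_list[row_indices[ind] + 1:row_indices[ind + 1]])
            for ind, col in enumerate(cols)}
    return pd.DataFrame(data)

So, with your sample data frame, you get the result by calling the function like this:

extract_columns(df, 'col1', 'Collection of numbers are')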
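A pandas detail matters for functions like this one that turn lists of different lengths into columns: assigning a Series to a column aligns it to the index the frame already has, so after the first assignment any longer column is cut off, whereas the DataFrame constructor unions the indexes of a dict of Series and pads the shorter ones with NaN. A minimal demonstration with made-up values:

```python
import pandas as pd

# Made-up blocks of different lengths
blocks = {"a": [1, 2], "b": [3, 4, 5, 6]}

# Column-by-column assignment: the first Series fixes the index to 0..1
one_by_one = pd.DataFrame()
for name, values in blocks.items():
    one_by_one[name] = pd.Series(values)

# Dict constructor: indexes are unioned, shorter columns padded with NaN
all_at_once = pd.DataFrame({name: pd.Series(values) for name, values in blocks.items()})

print(one_by_one)   # column "b" keeps only 3 and 4
print(all_at_once)  # column "a" ends in two NaN values
```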

How to convert single-column data into multiple columns in a Python data frame?
