Pandas incorrectly writes to new columns

Time: 2018-08-22 15:19:53

Tags: python python-3.x pandas

I have two data frames that look like this:

df_regex_test = pd.DataFrame(columns=['file_names', 'searched_for_found', 'everything'])
df_regex_test_temp = pd.DataFrame(columns=['file_names', 'searched_for_found', 'everything'])

Both are empty data frames, for example:

Empty DataFrame
Columns: [file_names, searched_for_found, everything]
Index: []

I have a third data frame that contains the actual data:

df_all_xml_mfiles = pd.merge(df_all_xml_data, files_only, left_on="file_names", right_on="file_names", how="inner")

df_all_xml_mfiles_tgther = df_all_xml_mfiles.groupby(['file_names', 'searched_for_found'])['everything'].apply(' '.join).reset_index()
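
To make the shape of the grouped frame concrete, here is a minimal sketch of the same groupby/join step on toy data. The rows below are hypothetical stand-ins, not the real contents of df_all_xml_mfiles.

```python
import pandas as pd

# Hypothetical stand-in for df_all_xml_mfiles (made-up rows).
df_all = pd.DataFrame({
    "file_names": ["a.dtsx", "a.dtsx", "b.dtsx"],
    "searched_for_found": ["chair", "chair", "desk"],
    "everything": ["I like chairs.", "Chairs are nice.", "Desks are awesome."],
})

# Join all 'everything' strings that share a file name and search term.
together = (
    df_all.groupby(["file_names", "searched_for_found"])["everything"]
    .apply(" ".join)
    .reset_index()
)

print(together.columns.tolist())       # ['file_names', 'searched_for_found', 'everything']
print(together["everything"].tolist()) # ['I like chairs. Chairs are nice.', 'Desks are awesome.']
```

After this step there is one row per (file_names, searched_for_found) pair, with the text concatenated into a single string.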

I am doing the following:

for cc in range(len(file_names_only)):
    for bb in range(len(search_content_array)):
        # Find every tag in this row's text that contains the search term.
        regex_stuff = df_all_xml_mfiles_tgther[cc:cc+1].everything.str.findall('(<[^<]*?' + search_content_array[bb] + '[^>]*?>)', re.IGNORECASE)

        if not regex_stuff.empty:
            print('\n')
            df_regex_test_temp = df_regex_test_temp.append(regex_stuff, ignore_index=True, sort=True)
            print(df_regex_test_temp.head(5))
            df_regex_test_temp['searched_for_found'] = search_content_array[bb]
            df_regex_test_temp['file_names'] = file_names_only[cc]

            df_regex_test = df_regex_test.append(df_regex_test_temp, ignore_index=True, sort=False)

            # Empty the temporary frame before the next iteration.
            df_regex_test_temp = df_regex_test_temp.iloc[0:0]

        if regex_stuff.empty:
            df_regex_test_temp = df_regex_test_temp.iloc[0:0]

I am writing the output file like this:

text_regex_test= df_regex_test.to_csv('C:\\somewhere\\regex_test.txt', sep='\t')
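
As a side note on this call, the sketch below shows what `to_csv` returns in the two cases: `None` when a path is given, and the text itself when no path is given. The frame contents and the temporary file location are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-row frame, only for illustration.
df = pd.DataFrame({"file_names": ["a.dtsx"], "everything": ["Chairs are nice."]})

# With a path, to_csv writes the file and returns None.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "regex_test.txt")
    returned = df.to_csv(out_path, sep="\t")
    file_exists = os.path.exists(out_path)

# Without a path, to_csv returns the tab-separated text as a string.
as_text = df.to_csv(sep="\t")
header = as_text.splitlines()[0]

print(returned)      # None
print(repr(header))  # '\tfile_names\teverything'
```

So a variable assigned from a `to_csv` call that includes a path holds `None`; the leading tab in the header is the unnamed index column.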

When I look at the output file, I see the following:

              file_names       searched_for_found           everything     0     1
0      example_file.dtsx                    chair                          I like chairs. Chairs are nice.
1      example_file.dtsx                     desk                          I like desks. Desks are awesome.     
2    example_file_2.dtsx                    chair                      Chairs are lame.
3    example_file_2.dtsx                     desk                      Desks are more fun than chairs. 

Pandas created the "0" and "1" columns, but I want everything to land in the "everything" column.

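What am I doing wrong? I thought it might be caused by the columns not lining up correctly, but as far as I can tell that is not the case.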

This is the output I expect:

                  file_names       searched_for_found             everything 
    0      example_file.dtsx                    chair     I like chairs. Chairs are nice.
    1      example_file.dtsx                     desk     I like desks. Desks are awesome.     
    2    example_file_2.dtsx                    chair     Chairs are lame.
    3    example_file_2.dtsx                     desk     Desks are more fun than chairs. 

Edit #1:

If I comment out this line:

df_all_xml_mfiles_tgther[cc:cc+1].everything.str.findall('(<[^<]*?' + search_content_array[bb] + '[^>]*?>)', re.IGNORECASE)

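I don't have the problem, so it has something to do with this line. The columns are the same, so I'm not sure why it would cause the issue.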
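
To see what that pattern captures on its own, here is a small sketch using only the standard `re` module. The search term and sample text are hypothetical, and `re.escape` is an addition that is not in the original code (it guards against search terms containing regex metacharacters).

```python
import re

search_term = "chair"  # hypothetical search term
# Same pattern shape as in the question; re.escape is an added safeguard.
pattern = "(<[^<]*?" + re.escape(search_term) + "[^>]*?>)"

# Hypothetical XML-like text.
text = "<a>I like <b name='Chair'>chairs</b></a> <c desk='1'/>"

# findall returns a list holding every captured tag that contains the term.
matches = re.findall(pattern, text, re.IGNORECASE)
print(matches)  # ["<b name='Chair'>"]
```

The result is a list, and `Series.str.findall` likewise yields one list per row, so each cell of regex_stuff holds a list rather than a plain string.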

Edit #2:

If I leave the line above in place but comment out the line below, I also don't have the problem. So it is something about this line...

df_regex_test_temp = df_regex_test_temp.append(regex_stuff, ignore_index=True, sort=True)
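
One plausible reading of the two edits: regex_stuff is a Series whose index label is the row position (0, 1, ...), and appending a Series to a DataFrame treats it as a single row, so its index labels become column names. The sketch below illustrates that mechanism on hypothetical toy data and then shows one possible way to keep the matches in the "everything" column. It uses `pd.concat` because `DataFrame.append` was removed in pandas 2.0; the frame contents, the fixed search term, and the choice to join matches with a space are all assumptions.

```python
import re

import pandas as pd

# Hypothetical stand-in for df_all_xml_mfiles_tgther.
df = pd.DataFrame({
    "file_names": ["a.dtsx", "b.dtsx"],
    "searched_for_found": ["chair", "chair"],
    "everything": ["<x chair='1'>", "<y CHAIR='2'>"],
})
pattern = "(<[^<]*?chair[^>]*?>)"

# Same shape of call as in the question: a one-row slice, then str.findall.
regex_stuff = df[1:2].everything.str.findall(pattern, flags=re.IGNORECASE)
print(regex_stuff.index.tolist())  # [1]

# Viewed as a single row, the Series' index label becomes the column name.
as_row = regex_stuff.to_frame().T
print(as_row.columns.tolist())     # [1]

# Possible alternative: build each piece with explicit column names.
pieces = []
for cc in range(len(df)):
    found = df[cc:cc + 1].everything.str.findall(pattern, flags=re.IGNORECASE)
    pieces.append(pd.DataFrame({
        "file_names": df[cc:cc + 1].file_names,
        "searched_for_found": "chair",
        "everything": found.str.join(" "),  # list of matches -> one string
    }))
result = pd.concat(pieces, ignore_index=True)
print(result.columns.tolist())  # ['file_names', 'searched_for_found', 'everything']
```

Collecting the pieces in a list and concatenating once at the end also avoids growing a DataFrame row by row inside the loop.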

0 answers:

No answers