Question

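I am using Jupyter Notebook to run some basic natural language processing on several text files. I am working with two .ipynb files. One of them, which I call the "shell", reads the files. It calls the second .ipynb (the core program), which runs the NLP.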

(As you can tell, I am a beginner. I realize Jupyter Notebook is not ideal for this, but it is the setup I am currently using.)

The core file returns its result like this:

return {'Cor':numCor, 'Sub':numSub, 'Ins':numIns, 'Del':numDel}

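I have ten .txt files that I want to run the core NLP program on, and in the end I want a dataframe with these columns: 1) filename (taken from the name of the .txt file), 2) Cor, 3) Sub, 4) Ins, and 5) Del. The integer results will populate the rows.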

Each time I run the core:

z=wer(y,x)

it produces this:

{'Cor': 8, 'Sub': 0, 'Ins': 0, 'Del': 52}

But it comes out in this form:

    0
Cor 8
Sub 0
Ins 0
Del 52

I need to transpose it, so I did this:

df2=pd.Series(z).to_frame()
df2.reset_index()
df = df2.T

which produces this:

    Cor Sub Ins Del
0   8   0   0   52
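
For reference, a one-row frame of this shape can also be built straight from the dictionary, without going through a Series and a transpose. This is a minimal sketch, assuming `z` is the dict returned by `wer` (hard-coded here with the values from the example, since `wer` itself is not shown):

```python
import pandas as pd

# Stand-in for z = wer(y, x); the values are copied from the example above.
z = {'Cor': 8, 'Sub': 0, 'Ins': 0, 'Del': 52}

# Wrapping the dict in a list makes each key a column and the dict a single row.
df = pd.DataFrame([z])
print(df)
```

Passing a list of dicts is convenient here because the same call works unchanged when the list later holds one dict per file.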

So far (I think) so good. I want to append the results inside a loop, adding one row for each of the 10 text files, using a command like this:

 orf += [{'Cor': df.Cor, 'Sub': df.Sub, 'Ins': df.Ins}]

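'orf' is capturing from the dataframe, which I think is part of my problem. Here are the results for the first two text files. When it appends from the dataframe, it also picks up metadata (not sure that is the right term), such as the data type: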

[{'filename': '/Users/jeannehsinclair/COVFEFE/miscues_ORF/anton/716_Anton_test.txt',
  'Cor': 0    52
  Name: Cor, dtype: int64,
  'Sub': 0    3
  Name: Sub, dtype: int64,
  'Ins': 0    0
  Name: Ins, dtype: int64,
  'Del': 0    5
  Name: Del, dtype: int64},
 {'filename': '/Users/jeannehsinclair/COVFEFE/miscues_ORF/anton/936_Anton.txt',
  'Cor': 0    60
  Name: Cor, dtype: int64,
  'Sub': 0    0
  Name: Sub, dtype: int64,
  'Ins': 0    0
  Name: Ins, dtype: int64,
  'Del': 0    0
  Name: Del, dtype: int64},
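
A likely explanation for the extra text: `df.Cor` is a one-element pandas Series (a whole column), not an integer, so the dict ends up storing the Series together with its index, name, and dtype. The sketch below illustrates this and shows one way to pull out the plain integer with `.iloc[0]`. The dict `z` is hard-coded with the values from the first file above as a stand-in for the real `wer` output.

```python
import pandas as pd

# Stand-in for z = wer(y, x); values taken from the first file's output above.
z = {'Cor': 52, 'Sub': 3, 'Ins': 0, 'Del': 5}
df = pd.Series(z).to_frame().T

# Attribute access on a DataFrame returns the whole column as a Series.
print(type(df.Cor))

# .iloc[0] picks the single value; int() converts it to a plain Python int.
orf = []
orf += [{name: int(df[name].iloc[0]) for name in ['Cor', 'Sub', 'Ins', 'Del']}]
print(orf)
```

Since `z` already holds plain integers, appending `z` itself to the list would avoid the round trip through a dataframe altogether.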

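I want to convert this back to a dataframe. The problem is that when I convert it to a dataframe, I get this (only 3 variables are included here, for ease of formatting):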

    Cor                             Ins                         Sub
0   0 52 Name: Cor, dtype: int64    0 0 Name: Ins, dtype: int64 0 3 Name: Sub, dtype: int64
1   0 60 Name: Cor, dtype: int64    0 0 Name: Ins, dtype: int64 0 0 Name: Sub, dtype: int64
2   0 60 Name: Cor, dtype: int64    0 0 Name: Ins, dtype: int64 0 0 Name: Sub, dtype: int64
3   0 59 Name: Cor, dtype: int64    0 0 Name: Ins, dtype: int64 0 1 Name: Sub, dtype: int64
4   0 60 Name: Cor, dtype: int64    0 0 Name: Ins, dtype: int64 0 0 Name: Sub, dtype: int64
5   0 59 Name: Cor, dtype: int64    0 0 Name: Ins, dtype: int64 0 0 Name: Sub, dtype: int64

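I don't want all of those strings printed there. I only want the second integer in each cell. For example, for the first row, I only want the cells to contain 52, 5, 0, 3.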

I am looking for help streamlining the appending process. I imagine there is a good way to do this without converting to a dataframe twice.

Ultimately I need a dataframe that looks like this:

    Cor Sub Ins Del Filename
1   8   0   1   52  File1
2   6   0   0   52  File2
3   2   2   1   52  File3
4   1   3   0   52  File4
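
One possible way to reach a frame of this shape is to collect one plain dict per file in a list and build the dataframe once, after the loop. This is only a sketch under stated assumptions: `results_by_path` is a hypothetical stand-in for calling `wer` on each file (filled with the two results shown earlier), and the filename is taken to be the file's stem without directory or extension.

```python
from pathlib import PurePosixPath

import pandas as pd

# Hypothetical stand-in for running the core on each file; in the real loop
# each dict would come from z = wer(y, x).
results_by_path = {
    '/Users/jeannehsinclair/COVFEFE/miscues_ORF/anton/716_Anton_test.txt':
        {'Cor': 52, 'Sub': 3, 'Ins': 0, 'Del': 5},
    '/Users/jeannehsinclair/COVFEFE/miscues_ORF/anton/936_Anton.txt':
        {'Cor': 60, 'Sub': 0, 'Ins': 0, 'Del': 0},
}

rows = []
for path, z in results_by_path.items():
    row = dict(z)  # copy so the original result is left untouched
    # Keep only the file name without directory or extension.
    row['Filename'] = PurePosixPath(path).stem
    rows.append(row)

# Build the dataframe once from the list of dicts, fixing the column order.
df = pd.DataFrame(rows, columns=['Cor', 'Sub', 'Ins', 'Del', 'Filename'])
df.index = range(1, len(df) + 1)  # start row labels at 1, as in the target
print(df)
```

Building the frame once at the end keeps every cell a plain integer or string, so no dtype text can leak into the cells.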

Thanks in advance for any advice!

Python - Convert a Series (or list) to a dataframe, only the values are needed

0 answers: