Question

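I have a .TX0 file (some kind of csv/txt file) and converted it to a .txt file using Python's .readlines(), open(filename, 'w'), and so on. I now have this newly saved txt file, but when I try to convert it to a dataframe, I only get one column. The txt file looks like this: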

Empty DataFrame
Columns: [ '"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04\n',  '"Reprocess Number:", vma2:  261519, Unnamed: 7, \n',  '"Sample Name:",  , Data Acquisition Time:, 18/08/2015 17:23:23\n',  '"Instrument Name:", natural gas (PE ASXL-TCD/FID), Channel:, B\n',  '"Rack/Vial:", 0, 0.1, Operator:, joey.walker\n',  '"Sample Amount:", 1.000000, Dilution Factor:, 1.000000\n',  '"Cycle:", 1, Result File :, \\\\vma2\\TotalChrom\11170_he_tcd001.rst \n',  '"Sequence File :", \\\\vma\C1_C2_binary.seq \n',  '"===================================================================================================================================="\n',  '""\n',  '""\n'.1,  '"condensate analysis (HP4890 Optic - FID)"\n',  '"Peak", Component, Time, Area, Height, BL\n',  '"#", Name, [min], [uV*sec], [uV], \n'.1,  '------, ------, ------.1, ------.2, ------.3, ------\n',  '1, Unnamed: 55, 0.810, 706.42, 304.38, *BB\n',  '2, CH4, 0.900, 1113518.24, 495918.41, *BB\n'.1,  '3, C2H6, 1.373, 901670.23, 295381.12, *BB\n'.2,  '"", Unnamed: 73, Unnamed: 74, ------.4, ------.5, \n'.2,  '"".1, Unnamed: 79, Unnamed: 80, 2015894.89, 791603.91, \n'.3,  '"Missing Component Report"\n',  '"Component", Expected Retention (Calibration File)\n',  '------.1, ------\n'.1,  '"All components were found"\n',  '"Report stored in ASCII file :", C:\\Shared Folders\\TotalChrom\\11170_he_tcd001.TX0 \n']]
Index: []

Easier to read:

Empty DataFrame

Columns: ['"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04\n', '"Reprocess Number:", vma2: 261519, Unnamed: 7, \n', '"Sample Name:", , Data Acquisition Time:, 18/08/2015 17:23:23\n', '"Instrument Name:", natural gas (PE ASXL-TCD/FID), Channel:, B\n', '"Rack/Vial:", 0, 0.1, Operator:, joey.walker\n', '"Sample Amount:", 1.000000, Dilution Factor:, 1.000000\n', '"Cycle:", 1, Result File :, \\vma2\TotalChrom\data\Joey\Binary_Mixtures\Std1\11170_he_tcd001.rst \n', '"Sequence File :", \\vma2\TotalChrom\sequences\Joey\C1_C2_binary.seq \n', '"===================================================================================================================================="\n', '""\n', '""\n'.1, '"condensate analysis (HP4890 Optic - FID)"\n', '"Peak", Component, Time, Area, Height, BL\n', '"#", Name, [min], [uV*sec], [uV], \n'.1, '------, ------, ------.1, ------.2, ------.3, ------\n', '1, Unnamed: 55, 0.810, 706.42, 304.38, *BB\n', '2, CH4, 0.900, 1115318.24, 495918.41, *BB\n'.1, '3, C2H6, 1.337, 901670.23, 295381.12, *BB\n'.2, '"", Unnamed: 73, Unnamed: 74, ------.4, ------.5, \n'.2, '"".1, Unnamed: 79, Unnamed: 80, 2015894.89, 791603.91, \n'.3, '"Missing Component Report"\n', '"Component", Expected Retention (Calibration File)\n', '------.1, ------\n'.1, '"All components were found"\n', '"Report stored in ASCII file :", C:\Shared Folders\TotalChrom\data\Joey\Binary_Mixtures\Std1\11170_he_tcd001.TX0 \n']]

Index: []

As you can see, this is comma-separated. Is there a way to convert this text into a comma-separated dataframe?

Thanks.

J

Answer 1

You can try the following function, which loads all the data from a local csv file:

pd.read_csv()

More details can be found in the pandas.read_csv tutorial.
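As a minimal sketch of what that call looks like in practice, the snippet below reads a small, well-formed CSV. The in-memory `sample` text and its column names are made up for illustration; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical sample standing in for a well-formed CSV file on disk.
sample = io.StringIO("name,area\nCH4,1.5\nC2H6,2.5\n")

# The first line is used as the header; each following line becomes a row.
df = pd.read_csv(sample)
print(df)
```

This works cleanly only when every row has the same number of fields, which is not the case for the report shown in the question.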

Answer 2

You can try the code below to convert the text file into a dataframe.

data = pd.read_csv('file.txt', sep=',')

Hopefully it is self-explanatory.
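The report in the question mixes metadata lines, a peak table, and summary lines, so the rows do not all have the same number of fields. One possible approach, sketched here under the assumption that the peak table starts at the line beginning with "Peak" and that data rows start with a peak number, is to pick out those rows first and hand only them to `pd.read_csv`. The `report` text is a shortened, hypothetical excerpt modeled on the question; the real .TX0 layout may differ.

```python
import io
import pandas as pd

# Hypothetical excerpt modeled on the report in the question.
report = '''"Software Version:", 6.3.2.0646, Date:, 19/08/2015 09:26:04
"Peak", Component, Time, Area, Height, BL
"#", Name, [min], [uV*sec], [uV],
------, ------, ------, ------, ------, ------
1, , 0.810, 706.42, 304.38, *BB
2, CH4, 0.900, 1113518.24, 495918.41, *BB
3, C2H6, 1.373, 901670.23, 295381.12, *BB
"", , , ------, ------,
'''

lines = report.splitlines()

# Locate the header row of the peak table.
start = next(i for i, line in enumerate(lines) if line.startswith('"Peak"'))

# Keep only the rows after the header whose first field is a peak number.
rows = [
    line for line in lines[start + 1:]
    if line.split(",")[0].strip().isdigit()
]

# Parse just those rows; skipinitialspace drops the blank after each comma.
peaks = pd.read_csv(
    io.StringIO("\n".join(rows)),
    names=["Peak", "Component", "Time", "Area", "Height", "BL"],
    skipinitialspace=True,
)
print(peaks)
```

The metadata lines at the top of the report could be handled the same way, by filtering on their labels before parsing.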

Answer 3

Here is a general answer to this problem:

import re
import pandas as pd

# First, open the file and read it line by line:
with open('file.txt', 'r') as f:
    lines = f.readlines()

# Remove the trailing \n (and surrounding whitespace) from each line
lines = [line.strip() for line in lines]

# Create a DataFrame (assuming you want to convert your data to 2 columns)
df_result = pd.DataFrame(columns=('first_col', 'second_col'))
i = 0
first_col = ""
second_col = ""
for line in lines:
    # Use "if" and "replace" if you need conditions to manipulate the text data
    if 'X' in line:
        first_col = line.replace('X', "")
    else:
        # You have to define which values go in each column; for example, the second column holds:
        second_col = re.sub(r' \(.*', "", line)
        # This is how you add the next row of data
        df_result.loc[i] = [first_col, second_col]
        i = i + 1
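A variation on the same idea, offered only as a sketch: collect the rows in a plain list and build the DataFrame once at the end, which avoids growing it row by row with `.loc`. The `lines` list below is hypothetical sample input that follows the answer's convention (a line containing "X" sets the first column; any other line supplies a second-column value).

```python
import re
import pandas as pd

# Hypothetical input following the convention used in this answer.
lines = ["Xfruit", "apple (red)", "pear (green)", "Xveg", "leek (white)"]

rows = []
first_col = ""
for line in lines:
    if "X" in line:
        # A marker line sets the first-column value for the rows that follow.
        first_col = line.replace("X", "")
    else:
        # Drop everything from " (" onward to get the second-column value.
        second_col = re.sub(r" \(.*", "", line)
        rows.append((first_col, second_col))

# Build the DataFrame in one step from the collected rows.
df_result = pd.DataFrame(rows, columns=["first_col", "second_col"])
print(df_result)
```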

Answer 4

I just found a simple solution that works for my code. You can try this in your code too:

f = open('glove.6B.100d.txt', encoding='utf8')
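Opening the file does not by itself produce a dataframe; the open handle still has to be parsed. A minimal sketch, assuming the file holds one space-separated record per line with a word followed by numbers: the in-memory `f` below is a hypothetical stand-in for the opened file, and the three-number rows are made up for illustration.

```python
import csv
import io
import pandas as pd

# Hypothetical stand-in for the open file handle: a word, then numbers.
f = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")

# No header row; the first field becomes the index; quote characters are
# treated as ordinary text rather than as field delimiters.
vectors = pd.read_csv(
    f,
    sep=" ",
    header=None,
    index_col=0,
    quoting=csv.QUOTE_NONE,
)
print(vectors)
```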

Convert a text file to a pandas dataframe