Question

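A .txt file contains the following data:

LC xx1
Name y1 y2 y3
A 10 12 13
B 9 11 15
C 7 15 16

LC xy2
Name y1 y2 y3
A 11 12 19
B 20 37 20
C 40 15 1

I want to read it into a pandas DataFrame with the columns LC, Name, y1, y2, y3, where each block's LC value (xx1, xy2) becomes a column repeated on every row of that block.

Does anyone have an idea how to do this programmatically? I have to do it on large files of about 10 MB.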

Thanks.

J.A.

Answer 1

You can use:

import io
import pandas as pd

temp=u""" LC xx1
    Name y1 y2 y3
    A 10 12 13
    B 9 11 15
    C 7 15 16

    LC xy2
    Name y1 y2 y3
    A 11 12 19
    B 20 37 20
    C 40 15 1"""
# after testing, replace io.StringIO(temp) with 'filename.txt'
# (pd.compat.StringIO was removed in pandas 1.0, so use io.StringIO)

# set the names parameter to the number of columns
df = pd.read_csv(io.StringIO(temp), sep=r"\s+", names=range(4))
print (df)
     0    1    2    3
0    LC  xx1  NaN  NaN
1  Name   y1   y2   y3
2     A   10   12   13
3     B    9   11   15
4     C    7   15   16
5    LC  xy2  NaN  NaN
6  Name   y1   y2   y3
7     A   11   12   19
8     B   20   37   20
9     C   40   15    1

# use the second row (Name y1 y2 y3) as the column names
df.columns = df.iloc[1]
# remove the columns' name (1), inherited from that row's label
df.columns.name = None
# boolean mask of the rows that start with LC
mask = df['Name'] == 'LC'
# new first column: the LC value, forward-filled down each block
df.insert(0, 'LC', df['y1'].where(mask).ffill())
# drop the LC rows and the repeated header rows
df = df[~mask & (df['Name'] != 'Name')].reset_index(drop=True)
print (df)
    LC Name  y1  y2  y3
0  xx1    A  10  12  13
1  xx1    B   9  11  15
2  xx1    C   7  15  16
3  xy2    A  11  12  19
4  xy2    B  20  37  20
5  xy2    C  40  15   1
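If this has to run on several files, the same steps can be wrapped in a function. This is a minimal sketch, assuming at most four whitespace-separated fields per line and a `Name ...` header row in the file; `read_lc_file` and `n_cols` are hypothetical names, not part of pandas. It also converts the value columns to integers, which the code above leaves as strings.

```python
import io

import pandas as pd


def read_lc_file(path_or_buf, n_cols=4):
    """Hypothetical helper: parse 'LC <id>' blocks into one DataFrame."""
    # read every line into n_cols columns; short lines (e.g. 'LC xx1') get NaN
    raw = pd.read_csv(path_or_buf, sep=r"\s+", header=None, names=range(n_cols))
    is_lc = raw[0] == "LC"
    is_header = raw[0] == "Name"
    # the first 'Name ...' row supplies the real column labels
    header = raw[is_header].iloc[0].tolist()
    # carry each LC id down to the rows of its block
    lc = raw[1].where(is_lc).ffill()
    keep = ~is_lc & ~is_header
    data = raw[keep].copy()
    data.columns = header
    data.insert(0, "LC", lc[keep])
    # the value columns were read as strings because of the mixed rows
    data[header[1:]] = data[header[1:]].astype(int)
    return data.reset_index(drop=True)


sample = """LC xx1
Name y1 y2 y3
A 10 12 13
B 9 11 15
C 7 15 16

LC xy2
Name y1 y2 y3
A 11 12 19
B 20 37 20
C 40 15 1"""

# for a real file, pass the path instead: read_lc_file('filename.txt')
df = read_lc_file(io.StringIO(sample))
print(df)
```

`read_csv` accepts either a path or a file-like object, so the same function works for the test string and for the 10 MB file.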

Another pure Python solution:

import pandas as pd

items = []
cols = []
with open('file.txt') as f:
    LC = ''
    # loop over each line
    for line in f:
        # split on whitespace (this also drops the trailing newline)
        l = line.split()
        # store the column names from a header row
        if l and l[0] == 'Name':
            cols = l
        # store the value that follows LC
        elif len(l) == 2 and l[0] == 'LC':
            LC = l[1]
        # store each data row; blank lines (empty lists) are skipped
        elif len(l) > 2:
            items.append([LC] + l)
# create the DataFrame
df = pd.DataFrame(items, columns=['LC'] + cols)
# if necessary, convert the value columns to integers
num_cols = df.columns[2:]
df[num_cols] = df[num_cols].astype(int)
print (df)
    LC Name  y1  y2  y3
0  xx1    A  10  12  13
1  xx1    B   9  11  15
2  xx1    C   7  15  16
3  xy2    A  11  12  19
4  xy2    B  20  37  20
5  xy2    C  40  15   1
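The line-by-line loop can also be expressed with pandas string methods instead of a Python loop, which avoids fixing the number of columns in advance. This is a sketch under the assumptions that every block uses the same `Name ...` header and that the whole file fits in memory (a 10 MB file does); `parse_lc_text` is a hypothetical name.

```python
import pandas as pd


def parse_lc_text(text):
    """Hypothetical helper: same result using pandas string methods only."""
    # one list of whitespace-separated tokens per line; drop blank lines
    tokens = pd.Series(text.splitlines()).str.split()
    tokens = tokens[tokens.str.len() > 0]
    first = tokens.str[0]
    is_lc = first == "LC"
    is_header = first == "Name"
    # LC id taken from its own row, forward-filled over the block's data rows
    lc = tokens.str[1].where(is_lc).ffill()
    # column labels come from the first 'Name ...' row
    cols = tokens[is_header].iloc[0]
    data = tokens[~is_lc & ~is_header]
    df = pd.DataFrame(data.tolist(), columns=cols, index=data.index)
    df.insert(0, "LC", lc.loc[data.index])
    # convert every value column (everything after 'Name') to integers
    df[cols[1:]] = df[cols[1:]].astype(int)
    return df.reset_index(drop=True)


sample = """LC xx1
Name y1 y2 y3
A 10 12 13
B 9 11 15
C 7 15 16

LC xy2
Name y1 y2 y3
A 11 12 19
B 20 37 20
C 40 15 1"""

# with a real file:
#     with open('file.txt') as f:
#         df = parse_lc_text(f.read())
df = parse_lc_text(sample)
print(df)
```

The forward fill plays the same role as the `LC` variable in the loop: it remembers the last LC id seen until the next block starts.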

Pandas: read a DataFrame, converting row-wise headers into columns
