Read a file into a DataFrame, splitting the text after the first word, in Python

Time: 2016-09-03 10:13:12

Tags: python pandas dataframe

I have a text file, file.txt, in which the first word on each line is the class name and the rest of the line is the description, like this:

n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias

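I want to read this file into a DataFrame with two columns: df['class'], containing the class name, and df['description'], containing the rest of the line.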

1 answer:

Answer 0: (score: 1)

You can do it like this:

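import pandas as pd

# `data` is the path to the file, e.g. 'file.txt'.
# The lines contain no run of two or more whitespace characters,
# so each whole line lands in the single column 'col'.
df = pd.read_csv(data, sep=r'\s{2,}', engine='python', names=['col'])

# The first whitespace-separated token is the class name
df['class'] = df['col'].str.split().str[0]
# Split on the first occurrence of a space; everything after it is the description
df['description'] = df['col'].str.split(' ', n=1).str[1]
del df['col']

print(df)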

       class                                        description
0  n01440764                                 tench, Tinca tinca
1  n01443537                        goldfish, Carassius auratus
2  n01484850  great white shark, white shark, man-eater, man...
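
As an alternative, here is a minimal sketch that skips read_csv and splits each line once with plain Python before building the DataFrame. It is not part of the answer above. The `sample` buffer is an assumption that stands in for file.txt so the snippet runs on its own; with the real file you would iterate over `open('file.txt')` instead. It also assumes every line has both a class name and a description.

```python
import io

import pandas as pd

# Sample lines from the question, used in place of file.txt
# so that the sketch is self-contained (an assumption for illustration).
sample = io.StringIO(
    "n01440764 tench, Tinca tinca\n"
    "n01443537 goldfish, Carassius auratus\n"
    "n01484850 great white shark, white shark, man-eater, "
    "man-eating shark, Carcharodon carcharias\n"
)

# Strip surrounding whitespace, skip blank lines, and split each line
# once at the first run of whitespace: [class, description].
rows = [line.strip().split(maxsplit=1) for line in sample if line.strip()]

df = pd.DataFrame(rows, columns=["class", "description"])
print(df)
```

Because `maxsplit=1` stops after the first split, commas and spaces inside the description are left untouched. A line with only a class name and no description would produce a one-element row, so such lines would need separate handling.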