Question

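I have a text file, file.txt, in which the first word of each line is the class name and the rest of the line is the description, as shown below: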

n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias

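I want to read the file into a data frame with two columns: df['class'], containing the class name, and df['description'], containing the rest of the line.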

Answer 1

You can do it like this:

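import pandas as pd

# data is assumed to be the path to the file, e.g. 'file.txt'.
# No line contains two or more consecutive spaces, so each line lands whole in 'col'.
df = pd.read_csv(data, sep=r'\s{2,}', engine='python', names=['col'])

# The class is the first whitespace-separated word
df['class'] = df['col'].str.split().str[0]
# Split on the first occurrence of whitespace; the remainder is the description
df['description'] = df['col'].str.split(n=1).str[1]
del df['col']

print(df)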

       class                                        description
0  n01440764                                 tench, Tinca tinca
1  n01443537                        goldfish, Carassius auratus
2  n01484850  great white shark, white shark, man-eater, man...
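
As an alternative sketch (not part of the original answer), the lines can also be split in plain Python before the data frame is built, which avoids relying on a separator that never matches. This assumes every non-empty line has both a class and a description; io.StringIO stands in for the real file here, so with an actual file you would iterate over open('file.txt') instead.

```python
import io

import pandas as pd

# Sample data copied from the question; StringIO is a stand-in for open('file.txt').
sample = io.StringIO(
    "n01440764 tench, Tinca tinca\n"
    "n01443537 goldfish, Carassius auratus\n"
    "n01484850 great white shark, white shark, man-eater, "
    "man-eating shark, Carcharodon carcharias\n"
)

# Split each non-empty line once, at the first run of whitespace:
# element 0 is the class, element 1 is the rest of the line.
rows = [line.strip().split(maxsplit=1) for line in sample if line.strip()]

df = pd.DataFrame(rows, columns=['class', 'description'])
print(df)
```

Because the split happens only once per line, commas and spaces inside the description are left untouched.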
