I have a file, file.txt, in which the first word of each line is the class name and the rest of the line is the description, as shown below:
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
I want to read this file into a data frame with two columns: df['class'], containing the class name, and df['description'], containing the rest of the line.
Answer 0: (score: 1)
You can do it like this:
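import pandas as pd

# Read each line into a single column. The separator (two or more whitespace
# characters) is assumed never to occur inside a line of file.txt.
df = pd.read_csv('file.txt', sep=r'\s{2,}', engine='python', names=['col'])
# The first whitespace-separated word is the class name
df['class'] = df['col'].str.split().str[0]
# Split only on the first occurrence of whitespace; the remainder is the description
df['description'] = df['col'].str.split(n=1).str[1]
del df['col']
print(df)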
       class                                        description
0  n01440764                                 tench, Tinca tinca
1  n01443537                        goldfish, Carassius auratus
2  n01484850  great white shark, white shark, man-eater, man...
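
If the descriptions might contain runs of two or more spaces, the regex separator above could split a line unexpectedly. One possible alternative, shown here only as a sketch, is to split each line yourself on the first run of whitespace and build the data frame from the resulting pairs. The `sample` buffer below is an illustrative stand-in for file.txt (its lines are copied from the question), and the sketch assumes every non-empty line has both a class name and a description.

```python
import io

import pandas as pd

# Stand-in for open('file.txt'); the lines are copied from the question.
sample = io.StringIO(
    "n01440764 tench, Tinca tinca\n"
    "n01443537 goldfish, Carassius auratus\n"
    "n01484850 great white shark, white shark, man-eater, "
    "man-eating shark, Carcharodon carcharias\n"
)

# Split each non-empty line once, on the first run of whitespace:
# the first piece is the class name, the second is the description.
rows = [line.strip().split(maxsplit=1) for line in sample if line.strip()]

df = pd.DataFrame(rows, columns=['class', 'description'])
print(df)
```

To use it on the real file, replace `sample` with an open file handle, for example `with open('file.txt') as sample:`.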