我有一个如下所示的数据框(经过大量预处理后获得)
请找到数据框
d = {'token': {361: '180816_031', 119: '180816_031', 101: '180816_031', 135: '180816_031', 292: '180816_031',
133: '180816_031', 99: '180816_031', 270: '180816_031', 19: '180816_031', 382: '180816_031',
414: '180816_031', 267: '180816_031', 218: '180816_031', 398: '180816_031', 287: '180816_031',
155: '180816_031', 392: '180816_031', 265: '180816_031', 239: '180816_031', 237: '180816_031'},
'station': {361: 'deneb', 119: 'callisto', 101: 'callisto', 135: 'callisto', 292: 'callisto', 133: 'deneb',
99: 'callisto', 270: 'callisto', 19: 'deneb', 382: 'callisto', 414: 'deneb', 267: 'callisto',
218: 'deneb', 398: 'callisto', 287: 'deneb', 155: 'deneb', 392: 'deneb', 265: 'callisto',
239: 'callisto', 237: 'callisto'},
'cycle_number': {361: 'cycle09', 119: 'cycle06', 101: 'cycle04', 135: 'cycle01', 292: 'cycle04', 133: 'cycle05',
99: 'cycle06', 270: 'cycle07', 19: 'cycle04', 382: 'cycle08', 414: 'cycle04', 267: 'cycle10',
218: 'cycle07', 398: 'cycle08', 287: 'cycle09', 155: 'cycle08', 392: 'cycle06', 265: 'cycle02',
239: 'cycle09', 237: 'cycle07'},
'variable': {361: 'adj_high_quality_reads', 119: 'short_pass', 101: 'short_pass', 135: 'cell_mask_bilayers_sum',
292: 'adj_active_polymerase', 133: 'cell_mask_bilayers_sum', 99: 'short_pass',
270: 'adj_active_polymerase', 19: 'Unnamed: 0', 382: 'adj_high_quality_reads',
414: 'num_align_high_quality_reads', 267: 'adj_active_polymerase', 218: 'adj_single_pores',
398: 'num_align_high_quality_reads', 287: 'adj_active_polymerase', 155: 'cell_mask_bilayers_sum',
392: 'num_align_high_quality_reads', 265: 'adj_active_polymerase', 239: 'adj_single_pores',
237: 'adj_single_pores'},
'value': {361: 99704.0, 119: 2072785.0, 101: 2061059.0, 135: 1682208.0, 292: 675306.0, 133: 1714292.0,
99: 2072785.0, 270: 687988.0, 19: 19.0, 382: np.nan, 414: 285176.0, 267: 86914.0, 218: 948971.0,
398: 405196.0, 287: 137926.0, 155: 1830032.0, 392: 480081.0, 265: 951689.0, 239: 681452.0,
237: 882671.0}}
数据:
token station cycle_number variable \
19 180816_031 deneb cycle04 Unnamed: 0
99 180816_031 callisto cycle06 short_pass
101 180816_031 callisto cycle04 short_pass
119 180816_031 callisto cycle06 short_pass
133 180816_031 deneb cycle05 cell_mask_bilayers_sum
135 180816_031 callisto cycle01 cell_mask_bilayers_sum
155 180816_031 deneb cycle08 cell_mask_bilayers_sum
218 180816_031 deneb cycle07 adj_single_pores
237 180816_031 callisto cycle07 adj_single_pores
239 180816_031 callisto cycle09 adj_single_pores
265 180816_031 callisto cycle02 adj_active_polymerase
267 180816_031 callisto cycle10 adj_active_polymerase
270 180816_031 callisto cycle07 adj_active_polymerase
287 180816_031 deneb cycle09 adj_active_polymerase
292 180816_031 callisto cycle04 adj_active_polymerase
361 180816_031 deneb cycle09 adj_high_quality_reads
382 180816_031 callisto cycle08 adj_high_quality_reads
392 180816_031 deneb cycle06 num_align_high_quality_reads
398 180816_031 callisto cycle08 num_align_high_quality_reads
414 180816_031 deneb cycle04 num_align_high_quality_reads
value
19 19.0
99 2072785.0
101 2061059.0
119 2072785.0
133 1714292.0
135 1682208.0
155 1830032.0
218 948971.0
237 882671.0
239 681452.0
265 951689.0
267 86914.0
270 687988.0
287 137926.0
292 675306.0
361 99704.0
382 NaN
392 480081.0
398 405196.0
414 285176.0
我正在尝试使用平滑线创建散点图(下面的预期输出)
fig,ax = plt.subplots()
fig.set_size_inches(16,4)
#to get different colors for each of the `variable` value assign the variable to hue
g2=sns.lmplot(x='cycle_number',y='value',data=df, hue='variable', size=4, aspect=5)
这段代码只为散点图提供一个值,但是我的预期输出如下所示
预期输出:
尝试结果
尝试1
我试图创建条形图(在一些帮助下)并且我成功了,但是使用散点图我做不到
下面的代码将其转换为bar
df1 = df.groupby(['token','variable']).agg({'value': 'mean'})
df1.reset_index(inplace=True)
df1.sort_values('value',inplace=True,ascending=False)
fig,ax = plt.subplots()
fig.set_size_inches(16,8)
#to get different colors for each of the variable assign the variable to hue
g=sns.barplot(x='token',y='value',data=df1, hue='variable',ax=ax)
#Code for to put legend outside the plot
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])
# Put a legend to the right of the current axis
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
# Adding respective values to the top of each bar
for p in ax.patches:
ax.annotate("%d" % p.get_height(), (p.get_x() + p.get_width() / 2, p.get_height()),
ha='center', va='center', fontsize=11, color='black', xytext=(0, 10),
textcoords='offset points',fontweight='bold')
plt.show()
尝试2
g2=sns.lmplot(x='cycle_number',y='value',data=df), this gives error
ValueError: could not convert string to float: 'cycle10'
我知道错误在这里意味着什么,但是我很难尝试复制到输出代码
尝试3:
sns.lmplot('cycle_number', 'value', data=df, hue='variable', fit_reg=False)
已生成输出:空白网格
答案 0 :(得分:2)
使用:
sns.pointplot('cycle_number', 'value', data=df, hue='variable')
注释: https://seaborn.pydata.org/generated/seaborn.pointplot.html
使用此输出与预期的输出生成
尝试一下:
df = pd.DataFrame(d)
df['cycle_number'] = df['cycle_number'].str.replace('cycle', '')
df['cycle_number'] = df['cycle_number'].apply(pd.to_numeric)
print(df)
fig, ax = plt.subplots()
fig.set_size_inches(16, 4)
# sns.pointplot('cycle_number', 'value', data=df, hue='variable', err_style="bars", ci=68)
sns.lmplot('cycle_number', 'value', data=df, hue='variable', ci=None, order=2, truncate=True)
# use order = 5 to see more curve
order=2
的输出
根据最新共享的代码输出(order=2
)
除了图例与绘图区域重叠之外,图形曲线非常好。