Please ignore the large number of columns; it was much easier to copy and paste my current example as is.
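The problem at hand: the four columns param01, param02, param03 and param04 together form a unique identifier for each row. I want to see how all the other columns change with param04 while one unique combination of param01, param02 and param03 is held fixed. That is, if a combination of param01, param02, param03 corresponds to multiple param04 entries, I want to keep that result.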
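Ideally, at the end, I want to reduce the table/dataframe, by unique combination of param01, param02, param03, to the combinations that have more than one param04 entry. Finally, for a specific combination of the other parameters, I want to plot any of the other columns as a function of param04.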
I'm looking for advice on how to do this in pandas or something SQL-ish.
    <100>_poisson  avg wall time (s)  bulk_hill  c_{11}  c_{12}  c_{44}  homo_poisson  param01  param02  param03  param04  shear_hill              time_generated  young_hill
 0          0.264                  0       91.6   160.0    57.4    75.8         0.214     50.0     50.0     11.0      4.0        64.8  2019-02-14 11:11:39.254305       157.3
 1          0.268                  0       89.5   154.9    56.8    76.8         0.211     70.0     50.0     11.0      4.0        64.2  2019-02-14 11:11:43.696335       155.4
 2          0.268                  0       89.3   154.7    56.6    76.8         0.210     90.0     50.0     11.0      4.0        64.2  2019-02-14 11:11:47.814102       155.3
 3          0.268                  0       89.3   154.7    56.6    76.7         0.210    110.0     50.0     11.0      4.0        64.1  2019-02-14 11:11:52.052636       155.2
 4          0.268                  0       89.5   154.9    56.8    76.8         0.211    130.0     50.0     11.0      4.0        64.1  2019-02-14 11:11:55.752065       155.3
 5          0.268                  0       89.3   154.7    56.6    76.7         0.210    150.0     50.0     11.0      4.0        64.1  2019-02-14 11:11:59.631407       155.2
 6          0.268                  0       89.3   154.7    56.6    76.7         0.210    110.0     30.0     11.0      4.0        64.1  2019-02-14 11:12:03.275825       155.2
 7          0.268                  0       89.3   154.7    56.6    76.7         0.210    110.0     40.0     11.0      4.0        64.1  2019-02-14 11:12:07.057999       155.2
 8          0.268                  0       89.3   154.7    56.6    76.7         0.210    110.0     60.0     11.0      4.0        64.1  2019-02-14 11:12:11.655756       155.2
 9          0.268                  0       89.3   154.7    56.6    76.3         0.211    110.0     50.0      7.0      4.0        63.9  2019-02-14 11:12:15.474917       154.8
10          0.268                  0       89.3   154.7    56.6    76.4         0.211    110.0     50.0      9.0      4.0        63.9  2019-02-14 11:12:19.727918       154.9
11          0.268                  0       89.3   154.7    56.6    76.9         0.210    110.0     50.0     13.0      4.0        64.2  2019-02-14 11:12:24.841238       155.3
12          0.268                  0       89.3   154.7    56.6    76.7         0.210    110.0     50.0     11.0      2.0        64.1  2019-02-14 11:12:29.916590       155.2
13          0.268                  0       89.3   154.7    56.6    76.7         0.210    110.0     50.0     11.0      3.0        64.1  2019-02-14 11:12:35.019309       155.2
14          0.268                  0       89.3   154.7    56.6    76.7         0.210    110.0     50.0     11.0      5.0        64.1  2019-02-14 11:12:39.904661       155.2
15          0.268                  0       89.3   154.7    56.6    76.7         0.210    110.0     50.0     11.0      6.0        64.1  2019-02-14 11:12:44.982282       155.2
16          0.017                  0      287.3   799.5    47.7   120.4         0.243     30.0     30.0      5.0      4.0       177.9  2019-02-14 11:12:50.124683       442.3
17          0.264                  0       91.6   159.9    57.5    76.2         0.213     40.0     30.0      5.0      4.0        65.0  2019-02-14 11:12:54.744038       157.7
18          0.264                  0       91.7   160.1    57.5    76.2         0.213     50.0     30.0      5.0      4.0        65.0  2019-02-14 11:12:58.547615       157.8
19          0.268                  0       89.4   154.8    56.6    76.4         0.210     60.0     30.0      5.0      4.0        64.1  2019-02-14 11:13:03.234323       155.3
20          4.923                  0       -5.8     0.0     0.0    46.3        -1.138     30.0     10.0      5.0      4.0       208.5  2019-02-14 11:13:08.527995       -57.4
21          0.015                  0      728.8  2305.4    96.4    75.6         0.334     30.0     20.0      5.0      4.0       272.0  2019-02-14 11:13:15.060308       725.7
Answer
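I hope I've understood you correctly:
"I want to reduce the table/dataframe, by these unique combinations of param01, param02, param03, to the ones where param04 has multiple entries."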
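So you want something like the SQL query SELECT param01, param02, param03, param04 FROM source_data WHERE param04 IN (SELECT param04 FROM source_data GROUP BY param04 HAVING COUNT(*) > 1).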
If so:
import pandas as pd
from io import StringIO
# First nine rows of the question's table as one HTML string ("<100>" is escaped as &lt;100&gt; so the parser keeps it as text)
html = r'<table><tbody><tr><th> </th><th>&lt;100&gt;_poisson </th><th>avg wall time (s) </th><th>bulk_hill </th><th>c_{11} </th><th>c_{12} </th><th>c_{44} </th><th>homo_poisson </th><th>param01 </th><th>param02 </th><th>param03 </th><th>param04 </th><th>shear_hill </th><th>time_generated </th><th>young_hill</th></tr><tr><td>0 </td><td>0.264 </td><td>0 </td><td>91.6 </td><td>160.0 </td><td>57.4 </td><td>75.8 </td><td>0.214 </td><td>50.0 </td><td>50.0 </td><td>11.0 </td><td>4.0 </td><td>64.8 </td><td>2019-02-14 11:11:39.254305 </td><td>157.3</td></tr><tr><td>1 </td><td>0.268 </td><td>0 </td><td>89.5 </td><td>154.9 </td><td>56.8 </td><td>76.8 </td><td>0.211 </td><td>70.0 </td><td>50.0 </td><td>11.0 </td><td>4.0 </td><td>64.2 </td><td>2019-02-14 11:11:43.696335 </td><td>155.4</td></tr><tr><td>2 </td><td>0.268 </td><td>0 </td><td>89.3 </td><td>154.7 </td><td>56.6 </td><td>76.8 </td><td>0.210 </td><td>90.0 </td><td>50.0 </td><td>11.0 </td><td>4.0 </td><td>64.2 </td><td>2019-02-14 11:11:47.814102 </td><td>155.3</td></tr><tr><td>3 </td><td>0.268 </td><td>0 </td><td>89.3 </td><td>154.7 </td><td>56.6 </td><td>76.7 </td><td>0.210 </td><td>110.0 </td><td>50.0 </td><td>11.0 </td><td>4.0 </td><td>64.1 </td><td>2019-02-14 11:11:52.052636 </td><td>155.2</td></tr><tr><td>4 </td><td>0.268 </td><td>0 </td><td>89.5 </td><td>154.9 </td><td>56.8 </td><td>76.8 </td><td>0.211 </td><td>130.0 </td><td>50.0 </td><td>11.0 </td><td>4.0 </td><td>64.1 </td><td>2019-02-14 11:11:55.752065 </td><td>155.3</td></tr><tr><td>5 </td><td>0.268 </td><td>0 </td><td>89.3 </td><td>154.7 </td><td>56.6 </td><td>76.7 </td><td>0.210 </td><td>150.0 </td><td>50.0 </td><td>11.0 </td><td>4.0 </td><td>64.1 </td><td>2019-02-14 11:11:59.631407 </td><td>155.2</td></tr><tr><td>6 </td><td>0.268 </td><td>0 </td><td>89.3 </td><td>154.7 </td><td>56.6 </td><td>76.7 </td><td>0.210 </td><td>110.0 </td><td>30.0 </td><td>11.0 </td><td>4.0 </td><td>64.1 </td><td>2019-02-14 11:12:03.275825 </td><td>155.2</td></tr><tr><td>7 </td><td>0.268 </td><td>0 </td><td>89.3 </td><td>154.7 </td><td>56.6 </td><td>76.7 </td><td>0.210 </td><td>110.0 </td><td>40.0 </td><td>11.0 </td><td>4.0 </td><td>64.1 </td><td>2019-02-14 11:12:07.057999 </td><td>155.2</td></tr><tr><td>8 </td><td>0.268 </td><td>0 </td><td>89.3 </td><td>154.7 </td><td>56.6 </td><td>76.7 </td><td>0.210 </td><td>110.0 </td><td>60.0 </td><td>11.0 </td><td>4.0 </td><td>64.1 </td><td>2019-02-14 11:12:11.655756 </td><td>155.2</td></tr></tbody></table>'
# read_html returns a list of DataFrames; take the first (and only) table
df = pd.read_html(StringIO(html), header=0)[0]
df_params = df[['param01', 'param02', 'param03', 'param04']]
# Keep every row whose param04 value occurs more than once
df_params.groupby('param04').filter(lambda x: len(x) > 1)
Output:
   param01  param02  param03  param04
0     50.0     50.0     11.0      4.0
1     70.0     50.0     11.0      4.0
2     90.0     50.0     11.0      4.0
3    110.0     50.0     11.0      4.0
4    130.0     50.0     11.0      4.0
5    150.0     50.0     11.0      4.0
6    110.0     30.0     11.0      4.0
7    110.0     40.0     11.0      4.0
8    110.0     60.0     11.0      4.0
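Note that groupby('param04') keeps every row whose param04 value occurs more than once, and in the nine-row sample that is every row. If the goal is the one stated in the question, which is to keep each combination of param01, param02, param03 that has more than one distinct param04 and then plot another column against param04, a minimal sketch groups by the three columns instead. The frame below is a hand-typed subset of the question's table (rows 0, 3, 9 and 12 to 16, key columns plus young_hill only), and young_hill is just an example choice of column to plot.

```python
import pandas as pd

KEYS = ["param01", "param02", "param03"]

# Hand-typed subset of the question's table (rows 0, 3, 9, 12-16),
# keeping only the four identifier columns and young_hill.
df = pd.DataFrame(
    {
        "param01": [50.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 30.0],
        "param02": [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 30.0],
        "param03": [11.0, 11.0, 7.0, 11.0, 11.0, 11.0, 11.0, 5.0],
        "param04": [4.0, 4.0, 4.0, 2.0, 3.0, 5.0, 6.0, 4.0],
        "young_hill": [157.3, 155.2, 154.8, 155.2, 155.2, 155.2, 155.2, 442.3],
    },
    index=[0, 3, 9, 12, 13, 14, 15, 16],
)

# Count distinct param04 values per (param01, param02, param03) combination.
# transform() broadcasts that count back onto every row, so the boolean
# mask lines up with df's index.
n_param04 = df.groupby(KEYS)["param04"].transform("nunique")
varying = df[n_param04 > 1]

# Reshape for plotting: one row per param04 value, one column per
# surviving combination of the three other parameters.
wide = varying.pivot_table(index="param04", columns=KEYS, values="young_hill")

print(varying)
print(wide)
```

With matplotlib installed, wide.plot(marker="o") draws one line per surviving combination against param04. On the question's full table the only combination with more than one param04 entry is (110.0, 50.0, 11.0), i.e. rows 3 and 12 to 15, with param04 running from 2.0 to 6.0.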
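The pandas equivalent of this SQL:
SELECT T.*
FROM source_data T
JOIN (SELECT DISTINCT param01, param02, param03
      FROM source_data
      WHERE param04 IN (SELECT param04 FROM source_data
                        GROUP BY param04 HAVING COUNT(*) > 1)) FLT
  ON T.param01 = FLT.param01
 AND T.param02 = FLT.param02
 AND T.param03 = FLT.param03
is:
flt = df_params.groupby('param04').filter(lambda x: len(x) > 1)[['param01', 'param02', 'param03']].drop_duplicates()
pd.merge(df, flt, on=['param01', 'param02', 'param03'])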
I think it could be written more concisely, but it should be correct.
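For the SQL-ish route the question mentions, here is a sketch using Python's built-in sqlite3 module. The subquery groups by the three identifier columns and keeps combinations with more than one distinct param04, and the join brings back every column of those rows. The table name source_data is borrowed from the SQL above. The in-memory table and its five rows (question rows 0, 3, 12, 13 and 16, a few columns only) are assumptions for illustration. Because the four parameters uniquely identify a row, COUNT(*) would give the same result here; COUNT(DISTINCT param04) just states the intent directly.

```python
import sqlite3

# In-memory stand-in for the question's data (rows 0, 3, 12, 13 and 16,
# a few columns only). The table name source_data follows the SQL above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_data (param01 REAL, param02 REAL, param03 REAL, "
    "param04 REAL, young_hill REAL)"
)
rows = [
    (50.0, 50.0, 11.0, 4.0, 157.3),
    (110.0, 50.0, 11.0, 4.0, 155.2),
    (110.0, 50.0, 11.0, 2.0, 155.2),
    (110.0, 50.0, 11.0, 3.0, 155.2),
    (30.0, 30.0, 5.0, 4.0, 442.3),
]
conn.executemany("INSERT INTO source_data VALUES (?, ?, ?, ?, ?)", rows)

# Combinations of the first three parameters that occur with more than one
# distinct param04, joined back to recover every column of those rows.
query = """
SELECT T.*
FROM source_data T
JOIN (SELECT param01, param02, param03
      FROM source_data
      GROUP BY param01, param02, param03
      HAVING COUNT(DISTINCT param04) > 1) FLT
  ON T.param01 = FLT.param01
 AND T.param02 = FLT.param02
 AND T.param03 = FLT.param03
ORDER BY T.param01, T.param02, T.param03, T.param04
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

If you want a DataFrame instead of tuples, pandas.read_sql_query(query, conn) returns the same rows as a frame, which can then be plotted against param04.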