Hyperparameter Optimization with Optuna


1. Optuna Overview

1.1 About Optuna

Official Optuna website: https://optuna.org/

Optuna is an open-source hyperparameter optimization (HPO) framework that automates the search over a hyperparameter space. By default it uses a Bayesian approach (the TPE sampler) to find the best set of hyperparameters, and it supports the sampler types listed below (a sketch of selecting one explicitly follows the list):

  • GridSampler (grid search)
  • RandomSampler (random sampling)
  • TPESampler (Tree-structured Parzen Estimator algorithm)
  • CmaEsSampler (CMA-ES algorithm)
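
A minimal sketch of selecting a sampler explicitly (TPESampler is the default when none is given; the seed argument is only for reproducibility):

import optuna

# Pass a sampler instance to create_study.
# GridSampler additionally requires a search_space dict, and CmaEsSampler needs the cmaes package.
sampler = optuna.samplers.TPESampler(seed=42)  # or RandomSampler(seed=42), CmaEsSampler(seed=42)
study = optuna.create_study(direction='minimize', sampler=sampler)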

A minimal Optuna program involves only three core concepts: the objective function (objective), a single trial (trial), and the study (study), where

  • objective defines the function to be optimized and specifies the ranges of its parameters/hyperparameters
  • trial corresponds to a single execution of objective
  • study manages the whole optimization: how the search is performed, how many trials are run, how results are recorded, and so on.

1.2 Installing Optuna

Optuna can be installed either with pip or with conda; it is available on the Python Package Index and on Anaconda Cloud.

pip install optuna
conda install -c conda-forge optuna
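
After installation, a quick way to verify it worked is to print the installed version:

python -c "import optuna; print(optuna.__version__)"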

1.3 A Minimal Example

A simple example helps illustrate how this works:

Define x, y ∈ (−10, 10). For what values of x and y does f(x, y) = (x + y)^2 attain its maximum?

import optuna
 
def objective(trial):
    # sample x and y uniformly from (-10, 10)
    x = trial.suggest_uniform('x', -10, 10)
    y = trial.suggest_uniform('y', -10, 10)
    return (x + y) ** 2
 
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
 
print(study.best_params)
print(study.best_value)
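
Note that in Optuna 3.x, trial.suggest_uniform and trial.suggest_loguniform are deprecated in favour of trial.suggest_float, e.g. trial.suggest_float('x', -10, 10) and trial.suggest_float('reg_alpha', 1e-3, 10.0, log=True); the examples below keep the older spellings from the original code.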

More introductory material on Optuna can be found on the official website and in its documentation.

2. Hyperparameter Tuning for LightGBM and XGBoost

2.1 LightGBM

2.1.1 Define the objective
from lightgbm import LGBMRegressor
import optuna
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, KFold
import optuna.integration.lightgbm as oplgb  # optional LightGBM integration (not used in this example)

def objective(trial):
    # `data` and `target` (features and label) are assumed to be defined beforehand
    X_train, X_test, y_train, y_test = train_test_split(data, target, train_size=0.3)  # train/test split
    param = {
        'metric': 'rmse', 
        'random_state': 48,
        'n_estimators': 20000,
        'reg_alpha': trial.suggest_loguniform('reg_alpha', 1e-3, 10.0),
        'reg_lambda': trial.suggest_loguniform('reg_lambda', 1e-3, 10.0),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.3,0.4,0.5,0.6,0.7,0.8,0.9, 1.0]),
        'subsample': trial.suggest_categorical('subsample', [0.4,0.5,0.6,0.7,0.8,1.0]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.006,0.008,0.01,0.014,0.017,0.02]),
        'max_depth': trial.suggest_categorical('max_depth', [5, 7, 9, 11, 13, 15, 17, 20, 50]),
        'num_leaves' : trial.suggest_int('num_leaves', 1, 1000),
        'min_child_samples': trial.suggest_int('min_child_samples', 1, 300),
        'cat_smooth' : trial.suggest_int('cat_smooth', 1, 100)      
    }
    
    lgb=LGBMRegressor(**param)
    lgb.fit(X_train, y_train, eval_set=[(X_test, y_test)], early_stopping_rounds=100, verbose=False)
    pred_lgb=lgb.predict(X_test)
    rmse = mean_squared_error(y_test, pred_lgb, squared=False)
    return rmse
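
Note that LightGBM 4.x removed the early_stopping_rounds and verbose arguments from fit(); on recent versions a rough equivalent, as a sketch, is to pass a callback instead:

lgb.fit(X_train, y_train, eval_set=[(X_test, y_test)],
        callbacks=[lightgbm.early_stopping(100, verbose=False)])  # requires `import lightgbm`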
2.1.2 Run the search
study = optuna.create_study(direction='minimize')  # RMSE is minimized
n_trials = 50  # number of trials to run
study.optimize(objective, n_trials=n_trials)
2.1.3 Visualization
optuna.visualization.plot_optimization_history(study)  # optimization history (objective value per trial)
optuna.visualization.plot_parallel_coordinate(study)   # parallel-coordinate plot of hyperparameters vs. objective
optuna.visualization.plot_param_importances(study)     # hyperparameter importances
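
These optuna.visualization functions return Plotly figures, so the plotly package must be installed; in a notebook they render inline, while in a plain script you call .show() on the returned figure, as done in the XGBoost section below.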
2.1.4 Best parameters
params=study.best_params
params['metric'] = 'rmse'
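
The tuned parameters can then be used to fit a final model. A minimal sketch, assuming data and target are the same objects used inside the objective (study.best_params holds only the tuned entries, so fixed settings such as random_state are added back by hand, and n_estimators would ideally be set to the best iteration found by early stopping rather than 20000):

params.update({'random_state': 48})  # fixed (non-tuned) setting from the objective
final_model = LGBMRegressor(**params)
final_model.fit(data, target)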

2.2 XGBoost

2.2.1 Define the objective
import xgboost as xgb

def objective(trial):
    # `train` is assumed to be a DataFrame whose label is in its last column, also exposed as `train.target`
    data = train.iloc[:, :-1]
    target = train.target
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.3, random_state=42)
    param = {
        'lambda': trial.suggest_loguniform('lambda', 1e-3, 10.0),
        'alpha': trial.suggest_loguniform('alpha', 1e-3, 10.0),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
        'subsample': trial.suggest_categorical('subsample', [0.4, 0.5, 0.6, 0.7, 0.8, 1.0]),
        'learning_rate': trial.suggest_categorical('learning_rate',
                                                   [0.008, 0.009, 0.01, 0.012, 0.014, 0.016, 0.018, 0.02]),
        'n_estimators': 4000,
        'max_depth': trial.suggest_categorical('max_depth', [5, 7, 9, 11, 13, 15, 17, 20]),
        'random_state': trial.suggest_categorical('random_state', [24, 48, 2020]),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 300),
    }
    model = xgb.XGBRegressor(**param)
    model.fit(train_x, train_y, eval_set=[(test_x, test_y)], early_stopping_rounds=100, verbose=False)
    preds = model.predict(test_x)
    rmse = mean_squared_error(test_y, preds, squared=False)
    return rmse
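
The objective above scores each trial on a single random split, which can make the ranking of trials noisy. A cross-validated variant is sketched below (assuming the same train DataFrame; only part of the search space is repeated for brevity, and n_estimators is fixed at a smaller value because early stopping is not applied inside cross_val_score):

import xgboost as xgb
from sklearn.model_selection import cross_val_score

def objective_cv(trial):
    # Same idea as objective() above, but scored with 5-fold cross-validation.
    param = {
        'lambda': trial.suggest_loguniform('lambda', 1e-3, 10.0),
        'max_depth': trial.suggest_categorical('max_depth', [5, 7, 9, 11, 13, 15, 17, 20]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.008, 0.01, 0.014, 0.02]),
        'n_estimators': 500,  # fixed and smaller, since no early stopping is used here
    }
    model = xgb.XGBRegressor(**param)
    scores = cross_val_score(model, train.iloc[:, :-1], train.target,
                             scoring='neg_root_mean_squared_error', cv=5)
    return -scores.mean()  # flip the sign back to a positive RMSE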
2.2.2 Run the search
study = optuna.create_study(direction='minimize')
n_trials = 1  # a single trial as a quick smoke test; increase (e.g. 50-100) for a real search
study.optimize(objective, n_trials=n_trials)
print('Number of finished trials:', len(study.trials))
print("------------------------------------------------")
print('Best trial:', study.best_trial.params)
print("------------------------------------------------")
print(study.trials_dataframe())
print("------------------------------------------------")
2.2.3 Visualization
optuna.visualization.plot_optimization_history(study).show()
#plot_parallel_coordinate: interactively visualizes the hyperparameters and scores
optuna.visualization.plot_parallel_coordinate(study).show()
'''plot_slice: shows the evolution of the search. You can see where in the hyperparameter space your search
went and which parts of the space were explored more.'''
optuna.visualization.plot_slice(study).show()
optuna.visualization.plot_contour(study, params=['alpha',
                            # 'max_depth',
                            'lambda',
                            'subsample',
                            'learning_rate']).show()  # pairwise contour plots of the selected hyperparameters
#Visualize parameter importances.
optuna.visualization.plot_param_importances(study).show()
#Visualize empirical distribution function
optuna.visualization.plot_edf(study).show()
2.2.4 Best parameters
params=study.best_params

