YJ_Scribbles

[ML] LightGBM: Concepts and Example Code

2023. 5. 31. 10:30

๐Ÿ“Light GBM ๊ธฐ๋ณธ ๊ฐœ๋…

-> GradientBoosting์„ ๋ฐœ์ „์‹œํ‚จ ๊ฒƒ : XGBoost

-> XGBoost ์†๋„๋ฅผ ๋” ๋†’์ธ ๊ฒƒ : LightGBM

 

ㅇ Conventional tree-based algorithms: level-wise (balanced tree) growth

   -> expands the tree horizontally

   -> minimizes tree depth

   -> takes extra time to keep the tree balanced

ㅇ LightGBM algorithm: leaf-wise growth

   -> expands the tree vertically

   -> repeatedly splits the leaf node with the maximum delta loss

   -> reduces prediction loss more than level-wise growth for the same number of splits
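The contrast above can be sketched with a toy model. This is not LightGBM's real histogram-based algorithm; the leaf "losses" and the halving-on-split rule are invented purely to show which leaf each strategy chooses to split:

```python
def leaf_wise_grow(leaf_losses, n_splits):
    """Leaf-wise: always split the single leaf with the largest loss.
    Splitting replaces a leaf with two children carrying half its loss each."""
    leaves = list(leaf_losses)
    for _ in range(n_splits):
        worst = max(range(len(leaves)), key=lambda i: leaves[i])  # leaf with max loss
        loss = leaves.pop(worst)
        leaves.extend([loss / 2, loss / 2])  # replace it with two children
    return leaves

def level_wise_grow(leaf_losses, n_levels):
    """Level-wise: split *every* leaf at each level, keeping the tree balanced."""
    leaves = list(leaf_losses)
    for _ in range(n_levels):
        leaves = [loss / 2 for loss in leaves for _ in (0, 1)]
    return leaves

# Leaf-wise spends all its splits on the high-loss region,
# while level-wise spends them evenly across the whole tree.
print(leaf_wise_grow([8.0, 1.0], 3))
print(level_wise_grow([8.0, 1.0], 3))
```

In the leaf-wise run the low-loss leaf (1.0) is never touched, which is exactly why leaf-wise growth drives the loss down faster per split, and also why it can overfit without a leaf limit.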

๐Ÿ“Light GBM ์žฅ์ 

 - ๊ฐ€๋ณ๊ณ  ์†๋„๊ฐ€ ๋น ๋ฆ„

 - ํฐ ์‚ฌ์ด์ฆˆ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์‹คํ–‰์‹œํ‚ฌ ๋•Œ ์ ์€ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ

 - categorical feature ๋“ค์˜ ์ž๋™ ๋ณ€ํ™˜๊ณผ ์ตœ์  ๋ถ„ํ• 

 - ๊ฒฐ๊ณผ์˜ ์ •ํ™•๋„์— ์ดˆ์ ์„ ๋งž์ถค

 - GPU ํ•™์Šต ์ง€์›

 

 

๐Ÿ“Light GBM ๋‹จ์ 

 - ์ ์€ ๋ฐ์ดํ„ฐ ์‚ฌ์šฉ์‹œ ๊ณผ์ ํ•ฉ ๊ฐ€๋Šฅ์„ฑ(์ ์€ ๋ฐ์ดํ„ฐ ๊ธฐ์ค€ : 10,000๊ฐœ)

 

 

 

 

๐Ÿ“Light GBM ํŒŒ๋ผ๋ฏธํ„ฐ

 -> Light GBM์€ ๊ตฌํ˜„์€ ์‰ฌ์šฐ๋‚˜, ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ 100๊ฐœ๊ฐ€ ๋„˜๋Š”๋‹ค : ์ค‘์š”ํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ๋งŒ ์•Œ๊ณ  ์žˆ์–ด๋„ ์‚ฌ์šฉํ•˜๋Š” ๋ฐ์— ๋ฌด๋ฆฌ ์—†๋‹ค

 

 

 
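As a rough illustration, the handful of parameters most worth knowing can be collected in one place. The names come from the LightGBM scikit-learn API; the values here are the library defaults shown as starting points, not tuning recommendations:

```python
# Commonly tuned LightGBM parameters (values are illustrative defaults,
# not recommendations for any particular dataset).
important_params = {
    "num_leaves": 31,         # max leaves per tree; main complexity knob in leaf-wise growth
    "max_depth": -1,          # -1 = unlimited depth; set > 0 to curb overfitting
    "learning_rate": 0.1,     # shrinkage applied to each boosting step
    "n_estimators": 100,      # number of boosted trees
    "min_child_samples": 20,  # min samples per leaf; raise this on small datasets
    "subsample": 1.0,         # row-sampling ratio per tree
    "colsample_bytree": 1.0,  # feature-sampling ratio per tree
    "reg_alpha": 0.0,         # L1 regularization
    "reg_lambda": 0.0,        # L2 regularization
}
# Usage: lgb.LGBMClassifier(**important_params)
print(sorted(important_params))
```

On small datasets, the overfitting risk noted above is usually addressed by lowering num_leaves and raising min_child_samples.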

๐Ÿ“Light GBM ์‹ค์Šต

-> ์•„๋‚˜์ฝ˜๋‹ค ์‚ฌ์šฉํ•˜๋Š” ํ™˜๊ฒฝ์ด๋ผ ๋จผ์ € ์„ค์น˜ ์ง„ํ–‰

 conda install -c conda-forge lightgbm
import lightgbm as lgb
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import roc_auc_score
from scikitplot.metrics import plot_roc  # pip install scikit-plot

# Preprocess the data (labeled_data is a DataFrame with a 'label' column)
X = labeled_data.drop('label', axis=1)
y = labeled_data['label']

# Normalize the features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train a LightGBM classifier and predict class probabilities on the test set
lgb_model = lgb.LGBMClassifier()
lgb_model.fit(X_train, y_train)
y_pred_prob = lgb_model.predict_proba(X_test)

# Macro-average AUROC on the held-out test set (one-vs-rest for multiclass)
roc_auc = roc_auc_score(y_test, y_pred_prob, multi_class='ovr', average='macro')
print("Macro-average AUROC score:", roc_auc)

# Alternatively, 5-fold cross-validation gives out-of-fold probabilities for every sample
y_pred_proba_cv = cross_val_predict(lgb_model, X, y, cv=5, method='predict_proba')
cv_roc_auc = roc_auc_score(y, y_pred_proba_cv, multi_class='ovr', average='macro')
print("Cross-validated macro-average AUROC score:", cv_roc_auc)

plot_roc(y_test, y_pred_prob)

 

ㅇ Feature importance

-> Changing the parameters shifts the feature importances each time the model is retrained