Summary of Instant Gratification
kaggle Top6% (95th of 1836)
useful
Baseline [cv scores = 0.537]
- Plotting the LightGBM feature importance showed that 'wheezy-copper-turtle-magic' was by far the most important variable. On inspection, it was the only variable with integer values. I used EDA to explore how 'wheezy-copper-turtle-magic' interacts with the other variables.
LogisticRegression [cv scores = 0.803]
- Building an independent model for each value of 'wheezy-copper-turtle-magic' (the variable found through EDA) captures its interaction with the other variables, and the score improved.
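The per-value modeling idea can be sketched roughly as follows. This is a minimal illustration on toy data, not the original notebook: `make_toy_data` and `oof_auc_per_magic` are hypothetical helpers, and the toy data simply assumes that every value of the magic variable has its own decision rule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def make_toy_data(n_magic=4, rows_per_magic=300, n_features=20, seed=0):
    """Hypothetical stand-in data: every magic value has its own decision rule."""
    X_parts, y_parts, magic_parts = [], [], []
    for m in range(n_magic):
        X_m, y_m = make_classification(
            n_samples=rows_per_magic, n_features=n_features, n_informative=5,
            n_redundant=0, n_clusters_per_class=1, class_sep=2.0,
            random_state=seed + m)
        X_parts.append(X_m)
        y_parts.append(y_m)
        magic_parts.append(np.full(rows_per_magic, m))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(magic_parts)


def oof_auc_per_magic(X, y, magic, n_splits=5, seed=0):
    """Out-of-fold AUC when a separate logistic regression is fit per magic value."""
    oof = np.zeros(len(y))
    for m in np.unique(magic):
        idx = np.where(magic == m)[0]
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for tr, va in skf.split(X[idx], y[idx]):
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[idx[tr]], y[idx[tr]])
            oof[idx[va]] = clf.predict_proba(X[idx[va]])[:, 1]
    return roc_auc_score(y, oof)


X, y, magic = make_toy_data()

# Baseline: one logistic regression for all rows, ignoring the magic variable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
global_oof = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=cv, method="predict_proba")[:, 1]
global_auc = roc_auc_score(y, global_oof)

per_magic_auc = oof_auc_per_magic(X, y, magic)
print(f"one global model: {global_auc:.3f}, one model per magic value: {per_magic_auc:.3f}")
```

On data shaped like this, the single global model has to average several unrelated rules, which is why splitting by the magic variable helps.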
Feature Selection [cv scores = 0.804]
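- With independent models, each model sees only about 500 rows and 255 features, so it can fall into the curse of dimensionality, that is, overfitting. I needed a way to get similar performance with fewer features, and found that the features with a variance of 1.5 or more are the ones with predictive power.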
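A minimal sketch of this variance-based selection, using scikit-learn's `VarianceThreshold` with the 1.5 cutoff mentioned above. The toy matrix is an assumption for illustration only: 200 noise columns with variance near 1 and 40 "useful" columns with a much larger variance.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n_rows, n_noise, n_signal = 500, 200, 40

# Assumed toy layout: low-variance noise columns followed by high-variance columns.
noise = rng.normal(0.0, 1.0, size=(n_rows, n_noise))    # variance close to 1
signal = rng.normal(0.0, 3.5, size=(n_rows, n_signal))  # variance close to 12
X = np.hstack([noise, signal])

# Drop every column whose variance does not exceed the threshold.
selector = VarianceThreshold(threshold=1.5)
X_reduced = selector.fit_transform(X)
kept = selector.get_support(indices=True)

print(X.shape, "->", X_reduced.shape)
```

In a per-value setup the selector would be fit separately on each subset, since the set of high-variance columns can differ from subset to subset.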
Nonlinear Model (NuSVC) [cv scores = 0.943]
- Tried a variety of models (the nonlinear models scored higher)
  - LR [cv scores = 0.804]
  - KNN [cv scores = 0.907]
  - SVC [cv scores = 0.919]
  - NuSVC [cv scores = 0.943]
  - MLP [cv scores = 0.910]
StandardScaler [cv scores = 0.953]
- Applied StandardScaler to the nonlinear NuSVC model
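One way to combine the two steps is a scikit-learn pipeline, so that the scaler is re-fit inside every cross-validation fold. This is only a sketch: the toy data and the NuSVC settings (`nu=0.5`, RBF kernel) are assumptions, not the values used in the competition.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVC

# Toy data standing in for the rows of a single magic value (assumption).
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=10, n_redundant=0,
    n_clusters_per_class=3, random_state=0)

# Scaling happens inside the pipeline, so validation folds never leak into the scaler.
model = make_pipeline(StandardScaler(), NuSVC(nu=0.5, kernel="rbf", gamma="scale"))

scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"mean CV AUC: {scores.mean():.3f}")
```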
QDA [cv scores = 0.964]
- Vladislav published a high-scoring Quadratic Discriminant Analysis model, and Chris Deotte explained QDA.
- QDA scores well because the data follow a multivariate Gaussian distribution, and QDA is a very effective model under that assumption.
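A small sketch of why the Gaussian assumption matters. In the toy data below (an assumption for illustration), both classes share the same mean and differ only in covariance, so a linear model has nothing to work with while QDA, which fits one Gaussian per class, separates them. The `reg_param` value is illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_per_class, n_features = 500, 10

# Two Gaussian classes with identical means but different spreads.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per_class, n_features)),  # class 0
    rng.normal(0.0, 2.0, size=(n_per_class, n_features)),  # class 1
])
y = np.repeat([0, 1], n_per_class)

qda = QuadraticDiscriminantAnalysis(reg_param=0.1)
lr = LogisticRegression(max_iter=1000)

qda_auc = roc_auc_score(y, cross_val_predict(qda, X, y, cv=5, method="predict_proba")[:, 1])
lr_auc = roc_auc_score(y, cross_val_predict(lr, X, y, cv=5, method="predict_proba")[:, 1])
print(f"QDA AUC: {qda_auc:.3f}, LR AUC: {lr_auc:.3f}")
```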
Ensemble Models_XGBoost [cv scores = 0.967]
- Ensemble component models
  - KNN [cv scores = 0.902]
  - SVC [cv scores = 0.950]
  - NuSVC [cv scores = 0.960]
  - MLP [cv scores = 0.908]
  - QDA [cv scores = 0.965]
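The post does not say how the component models were combined. One plausible reading of the heading is stacking, with a boosted-tree model trained on the out-of-fold predictions of the base models. The sketch below follows that assumption and swaps in scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so that it runs without extra dependencies. The toy data and all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=600, n_features=12, n_informative=10, n_redundant=0,
    n_clusters_per_class=3, random_state=0)

base_models = {
    "knn": KNeighborsClassifier(n_neighbors=15),
    "svc": SVC(probability=True, random_state=0),
    "qda": QuadraticDiscriminantAnalysis(reg_param=0.5),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Level 1: out-of-fold probabilities, one column per base model.
oof = np.column_stack([
    cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for model in base_models.values()
])

# Level 2: a boosted-tree meta-model (stand-in for XGBoost) on the stacked columns.
meta = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
meta_oof = cross_val_predict(meta, oof, y, cv=cv, method="predict_proba")[:, 1]

for name, column in zip(base_models, oof.T):
    print(f"{name}: {roc_auc_score(y, column):.3f}")
stack_auc = roc_auc_score(y, meta_oof)
print(f"stacked: {stack_auc:.3f}")
```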
Pseudo Labeling [cv scores = 0.970]
- Pseudo labeling means adding the test rows that were predicted with high confidence to the train data and training again.
- Pseudo labeling raised the AUC by 0.003.
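The procedure can be sketched in a few lines. `pseudo_label_fit` is a hypothetical helper, and the confidence cutoffs (0.01 and 0.99) and `reg_param` are illustrative assumptions rather than the values used for the scores above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split


def pseudo_label_fit(X_train, y_train, X_test, low=0.01, high=0.99, reg_param=0.5):
    """Fit QDA, add confidently predicted test rows to the train set, and refit."""
    base = QuadraticDiscriminantAnalysis(reg_param=reg_param).fit(X_train, y_train)
    proba = base.predict_proba(X_test)[:, 1]

    # Keep only the test rows the first model is very sure about.
    confident = (proba <= low) | (proba >= high)
    X_aug = np.vstack([X_train, X_test[confident]])
    y_aug = np.concatenate([y_train, (proba[confident] >= 0.5).astype(int)])

    final = QuadraticDiscriminantAnalysis(reg_param=reg_param).fit(X_aug, y_aug)
    return final, int(confident.sum())


X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=10, n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

final_model, n_added = pseudo_label_fit(X_train, y_train, X_test)
test_proba = final_model.predict_proba(X_test)[:, 1]
print(f"pseudo-labeled rows added: {n_added} of {len(X_test)}")
```

The labels of `X_test` are never used for training here. Only the model's own confident predictions are added.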
Ensemble Models_XGBoost [cv scores = 0.9717] [private = 0.972]
- Ensemble component models
  - SVC [cv scores = 0.950]
  - NuSVC [cv scores = 0.960]
  - GMM [cv scores = 0.968]
  - QDA + Pseudo [cv scores = 0.970]
1% solution [private = 0.9744]
- The key model of this competition was GMM. You had to know that the data was generated with make_classification and that n_clusters_per_class=3.
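To make the idea concrete, here is a rough sketch of one simple way to exploit that knowledge: generate data with `make_classification(n_clusters_per_class=3)`, fit a 3-component `GaussianMixture` per class, and score rows by the difference in log-likelihood. This is not the actual top solution. `fit_class_gmms` and `class1_score` are hypothetical helpers, and the data sizes are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic data with the same cluster structure the competition data is said to have.
X, y = make_classification(
    n_samples=3000, n_features=16, n_informative=16, n_redundant=0,
    n_clusters_per_class=3, flip_y=0.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def fit_class_gmms(X, y, n_components=3, seed=0):
    """Fit one Gaussian mixture per class, with one component per cluster."""
    return {
        c: GaussianMixture(n_components=n_components, covariance_type="full",
                           n_init=3, random_state=seed).fit(X[y == c])
        for c in np.unique(y)
    }


def class1_score(gmms, X):
    """Log-likelihood ratio of class 1 vs class 0 (classes are roughly balanced)."""
    return gmms[1].score_samples(X) - gmms[0].score_samples(X)


gmms = fit_class_gmms(X_tr, y_tr)
gmm_auc = roc_auc_score(y_te, class1_score(gmms, X_te))
print(f"per-class GMM AUC: {gmm_auc:.3f}")
```

QDA is the special case of this with a single Gaussian per class, which is consistent with GMM scoring slightly above QDA in the ensemble list.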
try
- Rounding
- Created a unique_value_count variable
- categorical + NN, LightGBM, XGBoost
- VarianceThreshold
- RobustScaler + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
- StandardScaler + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
- StandardScaler + PCA + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
- RobustScaler + PCA + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
- PolynomialFeatures + StandardScaler + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
- PolynomialFeatures + RobustScaler + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
- StandardScaler + PolynomialFeatures + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
- RobustScaler + PolynomialFeatures + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
Learning
- Data Structure : Chris Deotte discovered that the variables are not Gaussian
- Adversarial Validation : confirmed that the test data and the train data come from the same distribution. (reference)
- make_classification : mhviraf generated synthetic data with make_classification and obtained an AUC similar to that of the real data
- VarianceThreshold : selects features by their variance.
- GMM : short for Gaussian Mixture Model, the key concept of this competition's winning solution.
- QLR (Quadratic logistic regression) : logistic regression combined with variables that give it a quadratic boundary like QDA's, e.g. PolynomialFeatures + LR
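The last item, PolynomialFeatures + LR, can be sketched as a pipeline. The concentric-circles toy data is an assumption chosen because its true boundary is quadratic: plain logistic regression cannot separate it, while the degree-2 expansion can.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Inner circle vs outer ring: no straight line separates the two classes.
X, y = make_circles(n_samples=600, noise=0.1, factor=0.4, random_state=0)

linear = LogisticRegression(max_iter=1000)
qlr = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # adds x1^2, x1*x2, x2^2
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

lin_auc = cross_val_score(linear, X, y, cv=5, scoring="roc_auc").mean()
qlr_auc = cross_val_score(qlr, X, y, cv=5, scoring="roc_auc").mean()
print(f"LR AUC: {lin_auc:.3f}, QLR AUC: {qlr_auc:.3f}")
```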
top10 kernel
please upvote after click