๋ฐ˜์‘ํ˜•

Summary of Instant Gratification

Kaggle Top 6% (95th of 1836) 🥉


useful

BaseLine [cv scores = 0.537]

  • Plotting LightGBM feature importance showed that 'wheezy-copper-turtle-magic' was by far the most important variable. On inspection, it was the only variable with integer values. I used EDA to explore how 'wheezy-copper-turtle-magic' interacts with the other variables.

LogisticRegression [cv scores = 0.803]

  • Building an independent model for each value of 'wheezy-copper-turtle-magic' (the variable found through EDA) captures its interaction with the other variables, and the score improved.
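The per-value modeling idea can be sketched roughly as below. This is a minimal illustration rather than the exact notebook code: the helper name `oof_per_magic`, the `target` column name, and the toy data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

MAGIC = "wheezy-copper-turtle-magic"


def oof_per_magic(train, feature_cols, n_splits=5, seed=42):
    """Out-of-fold predictions from one LogisticRegression per magic value.

    Hypothetical helper: assumes a binary 0/1 column named "target".
    """
    oof = np.zeros(len(train))
    # GroupBy.indices maps each magic value to the row positions of its group.
    for _, idx in train.groupby(MAGIC).indices.items():
        X = train.iloc[idx][feature_cols].to_numpy()
        y = train.iloc[idx]["target"].to_numpy()
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for tr, va in skf.split(X, y):
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[tr], y[tr])
            oof[idx[va]] = clf.predict_proba(X[va])[:, 1]
    return oof


# Toy data (assumption): the sign of the f0 -> target relation flips with the
# magic value, so a single global linear model cannot capture it.
rng = np.random.default_rng(0)
n = 400
features = ["f0", "f1", "f2", "f3"]
toy = pd.DataFrame(rng.normal(size=(n, 4)), columns=features)
toy[MAGIC] = rng.integers(0, 2, size=n)
sign = np.where(toy[MAGIC] == 0, 1.0, -1.0)
toy["target"] = (sign * toy["f0"] > 0).astype(int)

oof = oof_per_magic(toy, features)
auc = roc_auc_score(toy["target"], oof)
print(f"per-magic OOF AUC: {auc:.3f}")
```

On the toy data the interaction is built in by construction, so the per-value models separate the classes easily. The real data is far noisier.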

Feature Selection [cv scores = 0.804]

  • Each independent model has only about 500 rows and 255 features, so it is exposed to the curse of dimensionality, i.e. overfitting. I had to find a way to get similar performance with fewer features, and found that the features with variance of 1.5 or more are the ones with predictive power.
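A variance filter like this can be expressed with scikit-learn's `VarianceThreshold`. The sketch below runs on made-up data: the split into 40 high-variance and 215 unit-variance columns and the `scale=3.0` value are assumptions for illustration, and only the 1.5 threshold comes from the note above.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n_rows, n_useful, n_noise = 500, 40, 215  # illustrative sizes (255 columns total)

# Assumed toy setup: "useful" columns have a large variance, the rest ~1.
useful = rng.normal(scale=3.0, size=(n_rows, n_useful))
noise = rng.normal(scale=1.0, size=(n_rows, n_noise))
X = np.hstack([useful, noise])

# Keep only the columns whose variance is above 1.5.
selector = VarianceThreshold(threshold=1.5)
X_sel = selector.fit_transform(X)

print(X.shape, "->", X_sel.shape)
```

In the per-value setting the selector would be fit separately inside each 'wheezy-copper-turtle-magic' group, since different groups may keep different columns.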

Nonlinear Model (NuSVC) [cv scores = 0.943]

  • Tried a variety of models (the nonlinear models scored higher)
  1. LR [cv scores = 0.804]
  2. KNN [cv scores = 0.907]
  3. SVC [cv scores = 0.919]
  4. NuSVC [cv scores = 0.943]
  5. MLP [cv scores = 0.910]

StandardScaler [cv scores = 0.953]

  • Applied StandardScaler to the nonlinear NuSVC model
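One way to wire the scaler and NuSVC together is a scikit-learn pipeline, so the scaler is refit inside every CV fold. This is only a sketch on synthetic data. The `make_classification` settings and the NuSVC hyperparameters are placeholder assumptions, not the values used in the competition.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVC

# Synthetic stand-in for one 'wheezy-copper-turtle-magic' group (assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=10, n_redundant=0, random_state=0
)

# Scaling happens inside the pipeline, so each fold is scaled on its own train part.
model = make_pipeline(StandardScaler(), NuSVC(nu=0.5, kernel="rbf", gamma="scale"))

# roc_auc scoring uses NuSVC.decision_function, so probability=True is not needed.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"mean CV AUC: {scores.mean():.3f}")
```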

QDA [cv scores = 0.964]

  • ๋†’์€ ์Šค์ฝ”์–ด๋ฅผ ๊ธฐ๋กํ•˜๋Š” Quadratic Discriminant Analysis๋ชจ๋ธ์„ Vladislav๊ฐ€ ๊ณต๊ฐœ Chris Deotte๊ฐ€ QDA๋ฅผ ์„ค๋ช…

  • QDA๊ฐ€ ๋†’์€์Šค์ฝ”์–ด๋ฅผ ์–ป์€ ์ด์œ ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ๋‹ค๋ณ€๋Ÿ‰ ๊ฐ€์šฐ์Šค๋ถ„ํฌ๋ฅผ ๋”ฐ๋ฅด๋ฉฐ ์ด๋Ÿฌํ•œ ๊ฐ€์ •์—์„œ ๋งค์šฐ ํšจ๊ณผ์ ์ธ ๋ชจ๋ธ์ด๊ธฐ ๋•Œ๋ฌธ
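To make the Gaussian argument concrete, here is a small sketch (toy data, not the competition data) where the two classes share the same mean and differ only in covariance. A linear model has nothing to work with there, while QDA models each class covariance directly. The covariance matrices are assumed values chosen for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Two Gaussian classes with identical means but opposite correlation (assumption).
cov0 = np.array([[1.0, 0.9], [0.9, 1.0]])
cov1 = np.array([[1.0, -0.9], [-0.9, 1.0]])
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], cov0, size=n),
    rng.multivariate_normal([0.0, 0.0], cov1, size=n),
])
y = np.repeat([0, 1], n)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

qda = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)
lr = LogisticRegression().fit(X_tr, y_tr)

auc_qda = roc_auc_score(y_te, qda.predict_proba(X_te)[:, 1])
auc_lr = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
print(f"QDA AUC: {auc_qda:.3f}  LR AUC: {auc_lr:.3f}")
```

The logistic regression should land near chance here, because the class means coincide and only the quadratic terms carry signal.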

Ensemble Models_XGBoost [cv scores = 0.967]

  • Ensemble component models
  1. KNN [cv scores = 0.902]
  2. SVC [cv scores = 0.950]
  3. NuSVC [cv scores = 0.960]
  4. MLP [cv scores = 0.908]
  5. QDA [cv scores = 0.965]

Pseudo Labeling [cv scores = 0.970]

  • Pseudo labeling means adding confidently predicted test rows to the training data and training again.
  • Pseudo labeling raised AUC by 0.003
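A minimal sketch of one pseudo-labeling round with QDA is below. The helper name `pseudo_label_fit`, the 0.01 / 0.99 confidence cut-offs, and the toy data are assumptions for illustration. The note above does not state which thresholds were actually used.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis


def pseudo_label_fit(X_train, y_train, X_test, low=0.01, high=0.99):
    """Fit QDA, add confidently predicted test rows to train, and refit.

    Hypothetical helper; `low` and `high` are illustrative cut-offs.
    Returns the refit model and the number of pseudo-labeled rows added.
    """
    base = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
    proba = base.predict_proba(X_test)[:, 1]

    # Rows the base model is very sure about, in either direction.
    confident = (proba <= low) | (proba >= high)
    pseudo_y = (proba[confident] >= 0.5).astype(int)

    X_aug = np.vstack([X_train, X_test[confident]])
    y_aug = np.concatenate([y_train, pseudo_y])

    model = QuadraticDiscriminantAnalysis().fit(X_aug, y_aug)
    return model, int(confident.sum())


# Toy data (assumption): two well-separated Gaussian classes in 5 dimensions.
rng = np.random.default_rng(0)


def sample(n):
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 5)) + np.where(y[:, None] == 1, 2.0, -2.0)
    return X, y


X_train, y_train = sample(200)
X_test, y_test = sample(300)

model, n_added = pseudo_label_fit(X_train, y_train, X_test)
acc = (model.predict(X_test) == y_test).mean()
print(f"pseudo-labeled rows added: {n_added}, test accuracy: {acc:.3f}")
```

When measuring CV with pseudo labels, the pseudo-labeled rows should only ever be added to the training part of each fold, never to the validation part.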

Ensemble Models_XGBoost[cv scores = 0.9717][private = 0.972]

  • Ensemble component models
  1. SVC [cv scores = 0.950]
  2. NuSVC [cv scores = 0.960]
  3. GMM [cv scores = 0.968]
  4. QDA + Pseudo [cv scores = 0.970]

1% solution[private = 0.9744]

  • The key model in this competition was GMM. You had to realize that the data were generated with make_classification and that n_clusters_per_class=3.
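The connection between `n_clusters_per_class=3` and GMM can be sketched as follows: if each class is a mixture of three Gaussian clusters, fitting a 3-component `GaussianMixture` per class and comparing log-likelihoods is a natural classifier. This is a simplified supervised illustration under assumed settings (6 features, `class_sep=2.0`), not a reproduction of the top solutions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic data with 3 Gaussian clusters per class (sizes are assumptions).
X, y = make_classification(
    n_samples=2000,
    n_features=6,
    n_informative=6,
    n_redundant=0,
    n_clusters_per_class=3,
    class_sep=2.0,
    random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


def fit_class_gmm(X_class, seed=0):
    """Fit a 3-component full-covariance mixture to the rows of one class."""
    gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=seed)
    return gmm.fit(X_class)


gmm0 = fit_class_gmm(X_tr[y_tr == 0])
gmm1 = fit_class_gmm(X_tr[y_tr == 1])

# Log-likelihood ratio as the score; classes are balanced, so priors cancel.
log_ratio = gmm1.score_samples(X_te) - gmm0.score_samples(X_te)
auc = roc_auc_score(y_te, log_ratio)
print(f"GMM log-likelihood-ratio AUC: {auc:.3f}")
```

QDA is the special case of this with a single Gaussian per class, which may be why it already did well before the cluster structure was known.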

try

  • Rounding
  • Creating a unique_value_count variable
  • categorical + NN, LightGBM, XGBoost
  • VarianceThreshold
  • RobustScaler + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
  • StandardScaler + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
  • StandardScaler + PCA + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
  • RobustScaler + PCA + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
  • PolynomialFeatures + StandardScaler + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
  • PolynomialFeatures + RobustScaler + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
  • StandardScaler + PolynomialFeatures + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)
  • RobustScaler + PolynomialFeatures + VarianceThreshold + model(NuSVC, QDA, LR, MLP, KNN, SVC, LDA, GPC)

Learning

  • Data Structure : Chris Deotte discovered that the variables are not Gaussian

  • Adversarial Validation : confirmed that the test data and train data come from the same distribution. Reference

  • make_classification : mhviraf generated synthetic data with make_classification and obtained an AUC similar to that of the real data

  • QDA

  • VarianceThreshold : selects features by their variance.

  • GMM : short for Gaussian Mixture Model, the key concept of the winning solution.

  • QLR (Quadratic logistic regression) : logistic regression combined with features that give it a quadratic decision boundary like QDA's, e.g. PolynomialFeatures + LR
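The QLR idea (PolynomialFeatures + LR) can be sketched like this. The circular toy target and the 1.4 radius cut-off are assumptions chosen so that the true boundary is quadratic and the classes are roughly balanced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Toy data (assumption): class 1 lies inside a circle, so the boundary is quadratic.
X = rng.normal(size=(1000, 2))
y = (np.sum(X ** 2, axis=1) < 1.4).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Plain LR: linear boundary only.
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# QLR: degree-2 terms (squares and cross products) fed into the same LR.
qlr = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
).fit(X_tr, y_tr)

auc_lr = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
auc_qlr = roc_auc_score(y_te, qlr.predict_proba(X_te)[:, 1])
print(f"LR AUC: {auc_lr:.3f}  QLR AUC: {auc_qlr:.3f}")
```

With many input features the degree-2 expansion grows quadratically in width, so it is usually applied after a feature filter such as VarianceThreshold.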

top10 kernel

Please upvote after clicking.

๋ฐ˜์‘ํ˜•

'competition' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

Compare optimizer of efficientNet  (2) 2019.11.06
Frequency Encoding์ด๋ž€?  (0) 2019.10.17
kaggle Top8% (681th of 8802) ๐Ÿฅ‰  (0) 2019.10.17
[kaggle] Adversarial validation part1  (0) 2019.06.11
make_classification(๋ฐ์ดํ„ฐ ๋งŒ๋“ค๊ธฐ)  (0) 2019.06.11

+ Recent posts