
It has been a while since my last DACON review.

 

This was a competition, hosted at the request of the Korea Energy Agency, to forecast the electricity consumption of buildings.

 

I entered because it resembled the ASHRAE competition (https://www.kaggle.com/c/ashrae-energy-prediction), where I earned my first notebook gold medal on Kaggle.

 

The following ideas helped improve my score:

  • Built a separate model for each building ID.
  • Rather than interpolating missing values, filled them in a way consistent with the test set.
  • Identified each building's holiday pattern and created a holiday feature.
  • Removed missing values.
  • Created a day2 feature to improve predictions for the last week of each month.
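The per-building-model idea above can be sketched as follows. This is a minimal illustration only: the column names (`building_id`, `hour`, `power_usage`) are hypothetical, and plain linear regression stands in for whatever model was actually used.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy hourly-load data for two buildings (hypothetical columns).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "building_id": np.repeat([1, 2], 48),
    "hour": np.tile(np.arange(48) % 24, 2),
    "power_usage": rng.normal(100, 5, 96),
})

# Fit one model per building_id and keep them keyed by id.
models = {
    bid: LinearRegression().fit(grp[["hour"]], grp["power_usage"])
    for bid, grp in df.groupby("building_id")
}

# At prediction time, route each building to its own model.
pred = models[1].predict(pd.DataFrame({"hour": [12]}))
```

Fitting per building lets each model learn that building's own load profile instead of averaging across very different buildings.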

Regrets

  • I noticed the model tended to under-predict for certain buildings but could not fix it (some of the top teams solved this through the loss function).
  • I joined to learn classical time-series models and deep learning, but in the end I only tried the better-performing tree-based models.
  • My experiment tracking is still lacking, so I wasted submission attempts by submitting the same idea twice.

 

Code: https://dacon.io/competitions/official/235736/codeshare/2908?page=1&dtype=recent


The first competition in my Kaggle notebook transcription series is Cassava Leaf Disease Classification. The goal of this competition is to identify diseases of cassava, Africa's second-largest source of carbohydrates, so that infected plants can be burned to stop the spread and protect the food supply. It is an image competition, and I will start by transcribing the most-upvoted notebook.

 

Transcribed kernel: www.kaggle.com/ihelon/cassava-leaf-disease-exploratory-data-analysis

 

Cassava Leaf Disease - Exploratory Data Analysis


 

There are five labels in total: four diseases and one healthy class.

 

 

There are 21,397 images, each 600 × 800 pixels.

 

The proportion of each class is shown in a plot.

 

Visualization of random samples

 

Per-class visualization
albumentations image augmentation 1
albumentations image augmentation 2

 

Augmentations 1 and 2 applied simultaneously via Compose
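To illustrate what composing two augmentations into a single pass looks like, here is a minimal numpy-only sketch. The notebook itself uses albumentations' `Compose`; the two transform functions below are simple stand-ins, not the albumentations API.

```python
import numpy as np

# Stand-ins for the notebook's two albumentations transforms.
def hflip(img):
    return img[:, ::-1]

def brighten(img, delta=10):
    return np.clip(img.astype(int) + delta, 0, 255).astype(np.uint8)

def compose(*transforms):
    # Apply every transform in sequence, like albumentations.Compose.
    def apply(img):
        for t in transforms:
            img = t(img)
        return img
    return apply

aug = compose(hflip, brighten)
out = aug(np.zeros((4, 4, 3), dtype=np.uint8))
```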

 

The evaluation metric for this competition is accuracy, so with classes this imbalanced, simply predicting the most frequent class (3) for every image already achieves about 60%.
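The ~60% majority-class baseline can be verified directly. The per-class counts below are approximate, chosen to sum to the 21,397 training images mentioned above, with class 3 as the dominant class.

```python
import numpy as np

# Approximate per-class image counts (sum to 21,397; class 3 dominates).
counts = np.array([1087, 2189, 2386, 13158, 2577])
labels = np.repeat(np.arange(5), counts)

# Always predict the most frequent class.
majority = int(counts.argmax())
baseline_acc = float((labels == majority).mean())
```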


 

I think this is my first domestic data-analysis competition since the 2017 BigContest.

 

After putting it off with job hunting as an excuse, I finally entered the celestial object type classification competition hosted by DACON.

 

Climbing the leaderboard really is as thrilling as winning a promotion series in League of Legends.

 

Once again I keenly felt that I need to study the fundamentals. Since I always compete alone, I felt the limits of my own ideas, and above all it gets really lonely...

 

Also, as the competition drew to a close and the remaining submission attempts dwindled, I regretted not having tried more things earlier and more diligently.

 

A more detailed review and the code are linked below. Once again, thanks to DACON for hosting a great competition.

 

Review: https://dacon.io/more/interview/87

Code: https://dacon.io/competitions/official/235573/codeshare/694?page=1&dtype=recent

 


 

 

 

 
3rd ML Month - 14th solution


14th solution - CNN-Stacking

Model composition Notebooks

EfficientNetB4[Public Score = 0.95600] 
 - Model training(v143, v144, v145, v146, v147)
Xception[Public Score = 0.94787]
 - Model training(v230, v231, v233, v234, v239)
Resnet50[Public Score = 0.92682]
 - Model training(v247, 249, v249)

CNN structure

Input(shape=(number_of_models, number_of_classes, 1))
Conv2D(filters=8,  kernel_size=(1, 1))
Conv2D(filters=16, kernel_size=(1, 1))
Dropout(0.5)
Conv2D(filters=32, kernel_size=(2, 1))
Dropout(0.5)
Conv2D(filters=64, kernel_size=(2, 1))
Dropout(0.5)
Flatten()
Dense(1024)
Dense(196, activation='softmax')

5-fold

  • validation set shape - (14594, 3, 196, 1)
  • Trained this model with K-fold (k=5) on the validation set
  • Optimizer - AdamAccumulate [Public Score = 0.95791]
  • Optimizer - adam [Public Score = 0.96174]
 

Notebooks update

Version1[Public Score = 0.811]

  1. EfficientNet_basic

Version3[Public Score = 0.828]

  1. Data Augmentation

Version8[Public Score = 0.924]

  1. Callbacklist(EarlyStopping, ReduceLROnPlateau, ModelCheckpoint)
  2. layers.Dense(1024) -> layers.Dense(2048)

Version9[Public Score = 0.950] - Weighted average ensemble through cross validation

  1. fold1 : [Public 0.903]
  2. fold2 : [Public 0.923]
  3. fold3 : [Public 0.916]
  4. fold4 : [Public 0.926]
  5. fold5 : [Public 0.926]
  • Ensemble : [Public 0.950]

Version19[Public Score = 0.951]

  1. 5fold -> 6fold

Version25[Public Score = 0.954]

  1. Cutout Augmentation
  2. TTA

Version122[Public Score = 0.955]

  1. Model(EfficientNetB3 -> EfficientNetB4)
  2. imgsize(299 -> 380)
  3. Version119, Version120 Model training

Version135[Public Score = 0.956] - Semi-supervised Learning

  1. Pseudo Label dataset
  2. Version132, Version133 Model training
  3. Keras Semi-supervised Learning Notebooks

Version135[Public Score = 0.958]

  1. Transfer Learning
  2. Optimizer( adam -> AdamAccumulate)

Version314[Public Score = 0.961] - CNN-stacking

 

Try

1. xgboost-stacking[Public Score = 0.94882]

EfficientNetB4[Public Score = 0.95600] 
Xception[Public Score = 0.94787]
Resnet50[Public Score = 0.92682]

  1. Used the outputs of the three models above as features; train shape - (14594, 588)
  2. Trained 196 binary models, one per class (label 1 if class == c, else 0)
  3. Trained with 5-fold cross-validation
  4. Assigned each sample a class via argmax over the predictions
  • Training all 196 classes at once did not learn at all
  • Because of the 196 separate training runs, a class was often predicted two or three times, so the totals exceeded 196 (I never got around to trying to optimize this via the f1 score)
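The one-vs-rest stacking steps above can be sketched as follows. This is a toy-sized illustration: logistic regression stands in for xgboost, 4 classes stand in for 196, and the feature matrix stands in for the post's (14594, 588) stacked outputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stacking features: 3 base models x 4 classes = 12 columns
# (the post uses 3 models x 196 classes = 588 columns and xgboost).
rng = np.random.default_rng(0)
n_classes = 4
X = rng.random((200, 3 * n_classes))
y = rng.integers(0, n_classes, 200)

# Step 2: one binary "is it class c?" model per class.
scores = np.zeros((len(X), n_classes))
for c in range(n_classes):
    clf = LogisticRegression().fit(X, (y == c).astype(int))
    scores[:, c] = clf.predict_proba(X)[:, 1]

# Step 4: argmax resolves samples predicted positive by several class models.
pred = scores.argmax(axis=1)
```

The argmax step is exactly what prevents a sample from ending up with two or three classes: every sample gets the single class whose binary model is most confident.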

2. Mixup augmentation

  • I tried Mixup, but the f1 score dropped. I suspect this is because there is little training data per class.

3. Cutout augmentation

  • Cutout augmentation is a data augmentation technique that removes a region of the image.
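A minimal cutout sketch, zeroing one random square patch (the training notebook later uses a fuller random-erasing variant; the patch size here is arbitrary):

```python
import numpy as np

# Zero out one random square patch of the image.
def cutout(img, size=8, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    out = img.copy()
    out[top:top + size, left:left + size] = 0
    return out

img = np.full((32, 32, 3), 255, dtype=np.uint8)
aug = cutout(img)
```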

4. Ensemble

EfficientNetB4[Public Score = 0.95600] 
Xception[Public Score = 0.94787]
Resnet50[Public Score = 0.92682]

I tried a weighted-average ensemble of the three models above. The Public Score came out lower at 0.95695, but the final Private Score of 0.95663 was about 2% higher than my final score. It reminded me once again how powerful ensembling is.
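A weighted-average ensemble can be sketched like this; the probabilities are toy values, and the weights (proportional to each model's public score) are a hypothetical choice, not necessarily the one used in the post.

```python
import numpy as np

# Toy softmax outputs from three base models (2 samples, 3 classes).
p1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
p3 = np.array([[0.5, 0.3, 0.2], [0.3, 0.5, 0.2]])

# Hypothetical weights proportional to each model's public score.
w = np.array([0.956, 0.948, 0.927])
w = w / w.sum()

# Because the weights sum to 1, each ensembled row still sums to 1.
ensemble = w[0] * p1 + w[1] * p2 + w[2] * p3
pred = ensemble.argmax(axis=1)
```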

 

Result

  • I finished 14th out of 156. I felt the power of ensembling once again, and I regret that I tried CNN stacking too late to rein in its overfitting.
 
 


3rd ML Month - Keras Semi-supervised Learning

Background

  • This competition has a very large number of classes: 196.
  • Splitting the training set by class leaves very little data per class, so I figured that having more data might let the model train better.

Semi-supervised Learning

  • So I decided to use Pseudo Labelling, one of the semi-supervised learning techniques.
  • The Pseudo Labelling procedure is: 1) train a model on the training data, 2) predict the test data, 3) add confidently predicted test samples to the training data, 4) train a model on the combined data, 5) predict the test data.
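The five-step procedure can be sketched as follows. The toy data, the logistic-regression model, and the 0.9 confidence threshold are all stand-ins for the notebook's actual setup below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy labelled and unlabelled (test) data.
X_train = rng.normal(0, 1, (100, 5))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(0, 1, (50, 5))

# 1) train on the labelled data, 2) predict the test data
model = LogisticRegression().fit(X_train, y_train)
proba = model.predict_proba(X_test)

# 3) keep only confidently predicted test samples as pseudo-labels
confident = proba.max(axis=1) > 0.9
X_combined = np.vstack([X_train, X_test[confident]])
y_combined = np.concatenate([y_train, proba.argmax(axis=1)[confident]])

# 4) retrain on the combined data, 5) predict the test data again
model2 = LogisticRegression().fit(X_combined, y_combined)
final = model2.predict(X_test)
```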

Reference

Package

In [1]:
import gc
import os
import warnings
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from tqdm import tqdm_notebook
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Flatten, Activation, Conv2D, GlobalAveragePooling2D
from keras import layers
from keras.optimizers import SGD, RMSprop


import os
print(os.listdir("../input"))
Using TensorFlow backend.
['car-crop2', 'car-crop', 'semi-detaset', '2019-3rd-ml-month-with-kakr']
In [2]:
#efficientnet download
!pip install -U efficientnet==0.0.4
from efficientnet import EfficientNetB3
Collecting efficientnet==0.0.4
  Downloading https://files.pythonhosted.org/packages/a6/80/f2c098284f7c07491e66af18d9a5fea595d4b507d10c0845275b8d47dc6f/efficientnet-0.0.4.tar.gz
Building wheels for collected packages: efficientnet
  Building wheel for efficientnet (setup.py) ... - done
  Created wheel for efficientnet: filename=efficientnet-0.0.4-cp36-none-any.whl size=14289 sha256=3fe6c6f90f05f8f8cd0e747991e54841d15ef0180ef6f2dfd72d9942c10c6d72
  Stored in directory: /tmp/.cache/pip/wheels/5c/34/68/a611a699a28239e964ccf144c0e767cdb5439fee82ec5de6e0
Successfully built efficientnet
Installing collected packages: efficientnet
Successfully installed efficientnet-0.0.4

File Directory Setting

In [3]:
#crop data directory
DATA_PATH = '../input/car-crop'
os.listdir(DATA_PATH)
Out[3]:
['train_crop', 'test_crop']
In [4]:
#original data directory
DATA_PATH2 = '../input/2019-3rd-ml-month-with-kakr'
os.listdir(DATA_PATH2)
Out[4]:
['test.csv',
 'test',
 'train',
 'train.csv',
 'class.csv',
 'sample_submission.csv']
In [5]:
#semi_data directory
DATA_PATH3 = '../input/semi-detaset'
os.listdir(DATA_PATH3)
Out[5]:
['Pseudo Labelsing.csv']
In [6]:
#crop merge directory
DATA_PATH4 = '../input/car-crop2'
os.listdir(DATA_PATH4)
Out[6]:
['train2_crop']
In [7]:
# image folder paths
TRAIN_IMG_PATH = os.path.join(DATA_PATH, 'train')
TEST_IMG_PATH = os.path.join(DATA_PATH, 'test')

# CSV file paths
df_train = pd.read_csv(os.path.join(DATA_PATH2, 'train.csv'))
df_test = pd.read_csv(os.path.join(DATA_PATH2, 'test.csv'))
df_class = pd.read_csv(os.path.join(DATA_PATH2, 'class.csv'))

# load the submission from version 1
df_semi = pd.read_csv(os.path.join(DATA_PATH3, 'Pseudo Labelsing.csv'))

# version 1 saved 'test' as 'tset', so fix the filenames
name = list(map(lambda x: x.replace("tset", "test"), df_semi['img_file']))
df_semi['img_file'] = name
df_semi['img_file'] = df_semi['img_file'] + '.jpg'
df_semi.head(5)
Out[7]:
img_file class
0 test_00001.jpg 124
1 test_00002.jpg 98
2 test_00003.jpg 157
3 test_00004.jpg 94
4 test_00005.jpg 18

train/test data Split

In [8]:
df_train["class"] = df_train["class"].astype('str')
df_semi["class"] = df_semi["class"].astype('str')
df_train = df_train[['img_file', 'class']]
df_test = df_test[['img_file']]

# merge the train and pseudo-labelled data
df_train2 = pd.concat([df_train, df_semi],axis=0)


its = np.arange(df_train2.shape[0])
train_idx, val_idx = train_test_split(its, train_size = 0.8, random_state=42)

X_train = df_train2.iloc[train_idx, :]
X_val = df_train2.iloc[val_idx, :]

print(X_train.shape)
print(X_val.shape)
print(df_test.shape)
df_train2.head(5)
(11985, 2)
(2997, 2)
(6150, 1)
Out[8]:
img_file class
0 train_00001.jpg 108
1 train_00002.jpg 71
2 train_00003.jpg 76
3 train_00004.jpg 188
4 train_00005.jpg 44

Parameter

In [9]:
#ref: https://github.com/yu4u/cutout-random-erasing/blob/master/cifar10_resnet.py
def get_random_eraser(p=0.5, s_l=0.02, s_h=0.4, r_1=0.3, r_2=1/0.3, v_l=0, v_h=255, pixel_level=False):
    def eraser(input_img):
        img_h, img_w, img_c = input_img.shape
        p_1 = np.random.rand()

        if p_1 > p:
            return input_img

        while True:
            s = np.random.uniform(s_l, s_h) * img_h * img_w
            r = np.random.uniform(r_1, r_2)
            w = int(np.sqrt(s / r))
            h = int(np.sqrt(s * r))
            left = np.random.randint(0, img_w)
            top = np.random.randint(0, img_h)

            if left + w <= img_w and top + h <= img_h:
                break

        if pixel_level:
            c = np.random.uniform(v_l, v_h, (h, w, img_c))
        else:
            c = np.random.uniform(v_l, v_h)

        input_img[top:top + h, left:left + w, :] = c

        return input_img

    return eraser
In [10]:
import keras.backend as K
from keras.legacy import interfaces
from keras.optimizers import Optimizer


class AdamAccumulate(Optimizer):

    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=None, decay=0., amsgrad=False, accum_iters=1, **kwargs):
        if accum_iters < 1:
            raise ValueError('accum_iters must be >= 1')
        super(AdamAccumulate, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.beta_1 = K.variable(beta_1, name='beta_1')
            self.beta_2 = K.variable(beta_2, name='beta_2')
            self.decay = K.variable(decay, name='decay')
        if epsilon is None:
            epsilon = K.epsilon()
        self.epsilon = epsilon
        self.initial_decay = decay
        self.amsgrad = amsgrad
        self.accum_iters = K.variable(accum_iters, K.dtype(self.iterations))
        self.accum_iters_float = K.cast(self.accum_iters, K.floatx())

    @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr

        completed_updates = K.cast(K.tf.floordiv(self.iterations, self.accum_iters), K.floatx())

        if self.initial_decay > 0:
            lr = lr * (1. / (1. + self.decay * completed_updates))

        t = completed_updates + 1

        lr_t = lr * (K.sqrt(1. - K.pow(self.beta_2, t)) / (1. - K.pow(self.beta_1, t)))

        # self.iterations incremented after processing a batch
        # batch:              1 2 3 4 5 6 7 8 9
        # self.iterations:    0 1 2 3 4 5 6 7 8
        # update_switch = 1:        x       x    (if accum_iters=4)  
        update_switch = K.equal((self.iterations + 1) % self.accum_iters, 0)
        update_switch = K.cast(update_switch, K.floatx())

        ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        gs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]

        if self.amsgrad:
            vhats = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        else:
            vhats = [K.zeros(1) for _ in params]

        self.weights = [self.iterations] + ms + vs + vhats

        for p, g, m, v, vhat, tg in zip(params, grads, ms, vs, vhats, gs):

            sum_grad = tg + g
            avg_grad = sum_grad / self.accum_iters_float

            m_t = (self.beta_1 * m) + (1. - self.beta_1) * avg_grad
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(avg_grad)

            if self.amsgrad:
                vhat_t = K.maximum(vhat, v_t)
                p_t = p - lr_t * m_t / (K.sqrt(vhat_t) + self.epsilon)
                self.updates.append(K.update(vhat, (1 - update_switch) * vhat + update_switch * vhat_t))
            else:
                p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append(K.update(m, (1 - update_switch) * m + update_switch * m_t))
            self.updates.append(K.update(v, (1 - update_switch) * v + update_switch * v_t))
            self.updates.append(K.update(tg, (1 - update_switch) * sum_grad))
            new_p = p_t

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            self.updates.append(K.update(p, (1 - update_switch) * p + update_switch * new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon,
                  'amsgrad': self.amsgrad}
        base_config = super(AdamAccumulate, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
In [11]:
# Parameter
img_size = (300, 300)
image_size = 300
nb_train_samples = len(X_train)
nb_validation_samples = len(X_val)
nb_test_samples = len(df_test)
epochs = 30
batch_size = 32

# Define Generator config
train_datagen =ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=False,
    fill_mode='nearest',
    preprocessing_function = get_random_eraser(v_l=0, v_h=1),
    )

val_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
In [12]:
#generator
train_generator = train_datagen.flow_from_dataframe(
    dataframe=X_train, 
    directory='../input/car-crop2/train2_crop',
    x_col = 'img_file',
    y_col = 'class',
    target_size = img_size,
    color_mode='rgb',
    class_mode='categorical',
    batch_size=batch_size,
    seed=42
)

validation_generator = val_datagen.flow_from_dataframe(
    dataframe=X_val, 
    directory='../input/car-crop2/train2_crop',
    x_col = 'img_file',
    y_col = 'class',
    target_size = img_size,
    color_mode='rgb',
    class_mode='categorical',
    batch_size=batch_size
)

test_generator = test_datagen.flow_from_dataframe(
    dataframe=df_test,
    directory='../input/car-crop/test_crop',
    x_col='img_file',
    y_col=None,
    target_size= img_size,
    color_mode='rgb',
    class_mode=None,
    batch_size=batch_size,
    shuffle=False
)
Found 11985 validated image filenames belonging to 196 classes.
Found 2997 validated image filenames belonging to 196 classes.
Found 6150 validated image filenames.

Model

In [13]:
#model
opt = AdamAccumulate(lr=0.001, decay=1e-5, accum_iters=5)
EfficientNet_model = EfficientNetB3(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))


model = Sequential()
model.add(EfficientNet_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(2048, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(196, activation='softmax'))
model.summary()

#compile
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['acc'])
Downloading data from https://github.com/qubvel/efficientnet/releases/download/v0.0.1/efficientnet-b3_imagenet_1000_notop.h5
43974656/43966704 [==============================] - 1s 0us/step
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
efficientnet-b3 (Model)      (None, 10, 10, 1536)      10783528  
_________________________________________________________________
global_average_pooling2d_1 ( (None, 1536)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 2048)              3147776   
_________________________________________________________________
dropout_1 (Dropout)          (None, 2048)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 196)               401604    
=================================================================
Total params: 14,332,908
Trainable params: 14,245,612
Non-trainable params: 87,296
_________________________________________________________________
In [14]:
def get_steps(num_samples, batch_size):
    if (num_samples % batch_size) > 0 :
        return (num_samples // batch_size) + 1
    else :
        return num_samples // batch_size
In [15]:
%%time
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

#model path
MODEL_SAVE_FOLDER_PATH = './model/'
if not os.path.exists(MODEL_SAVE_FOLDER_PATH):
    os.mkdir(MODEL_SAVE_FOLDER_PATH)

model_path = MODEL_SAVE_FOLDER_PATH + '{epoch:02d}-{val_loss:.4f}.hdf5'

patient = 3
callbacks_list = [
    EarlyStopping(
        # monitor the validation loss
        monitor='val_loss',
        # stop training if val_loss does not improve for `patient` epochs
        patience=patient,
        # val_loss should decrease, so use 'min'
        mode='min',
        # verbosity level
        verbose=1
    ),
    ReduceLROnPlateau(
        monitor = 'val_loss',
        # halve the learning rate when triggered
        factor = 0.5,
        # same as above
        patience = patient / 2,
        # minimum learning rate
        min_lr=0.00001,
        verbose=1,
        mode='min'
    ),
    ModelCheckpoint(
        filepath=model_path,
        monitor ='val_loss',
        # only overwrite the saved model file when val_loss improves
        save_best_only = True,
        verbose=1,
        mode='min') ]

    

history = model.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
gc.collect()
Epoch 1/30
375/375 [==============================] - 457s 1s/step - loss: 3.7669 - acc: 0.1977 - val_loss: 1.3730 - val_acc: 0.6123

Epoch 00001: val_loss improved from inf to 1.37298, saving model to ./model/01-1.3730.hdf5
Epoch 2/30
375/375 [==============================] - 383s 1s/step - loss: 1.1834 - acc: 0.6701 - val_loss: 0.5710 - val_acc: 0.8282

Epoch 00002: val_loss improved from 1.37298 to 0.57099, saving model to ./model/02-0.5710.hdf5
Epoch 3/30
375/375 [==============================] - 384s 1s/step - loss: 0.6122 - acc: 0.8226 - val_loss: 0.3832 - val_acc: 0.8896

Epoch 00003: val_loss improved from 0.57099 to 0.38318, saving model to ./model/03-0.3832.hdf5
Epoch 4/30
375/375 [==============================] - 383s 1s/step - loss: 0.4298 - acc: 0.8751 - val_loss: 0.3178 - val_acc: 0.9072

Epoch 00004: val_loss improved from 0.38318 to 0.31778, saving model to ./model/04-0.3178.hdf5
Epoch 5/30
375/375 [==============================] - 384s 1s/step - loss: 0.3330 - acc: 0.8999 - val_loss: 0.3468 - val_acc: 0.9029

Epoch 00005: val_loss did not improve from 0.31778
Epoch 6/30
375/375 [==============================] - 382s 1s/step - loss: 0.2818 - acc: 0.9150 - val_loss: 0.3284 - val_acc: 0.9156

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.

Epoch 00006: val_loss did not improve from 0.31778
Epoch 7/30
375/375 [==============================] - 386s 1s/step - loss: 0.1712 - acc: 0.9480 - val_loss: 0.2203 - val_acc: 0.9466

Epoch 00007: val_loss improved from 0.31778 to 0.22033, saving model to ./model/07-0.2203.hdf5
Epoch 8/30
375/375 [==============================] - 386s 1s/step - loss: 0.1270 - acc: 0.9621 - val_loss: 0.2148 - val_acc: 0.9499

Epoch 00008: val_loss improved from 0.22033 to 0.21477, saving model to ./model/08-0.2148.hdf5
Epoch 9/30
375/375 [==============================] - 387s 1s/step - loss: 0.1175 - acc: 0.9638 - val_loss: 0.2108 - val_acc: 0.9533

Epoch 00009: val_loss improved from 0.21477 to 0.21076, saving model to ./model/09-0.2108.hdf5
Epoch 10/30
375/375 [==============================] - 386s 1s/step - loss: 0.0958 - acc: 0.9706 - val_loss: 0.2121 - val_acc: 0.9503

Epoch 00010: val_loss did not improve from 0.21076
Epoch 11/30
375/375 [==============================] - 387s 1s/step - loss: 0.0943 - acc: 0.9718 - val_loss: 0.2321 - val_acc: 0.9469

Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.

Epoch 00011: val_loss did not improve from 0.21076
Epoch 12/30
375/375 [==============================] - 386s 1s/step - loss: 0.0784 - acc: 0.9777 - val_loss: 0.2116 - val_acc: 0.9536

Epoch 00012: val_loss did not improve from 0.21076
Epoch 00012: early stopping
CPU times: user 1h 41min 55s, sys: 38min 26s, total: 2h 20min 22s
Wall time: 1h 18min 47s
Out[15]:
60

acc / loss Plot

In [16]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, label='Training acc')
plt.plot(epochs, val_acc, label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.ylim(0.9,1)
plt.show()
In [17]:
plt.plot(epochs, loss, label='Training loss')
plt.plot(epochs, val_loss, label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.ylim(0,0.5)
plt.show()

Predict

In [18]:
%%time
test_generator.reset()
prediction = model.predict_generator(
    generator = test_generator,
    steps = get_steps(nb_test_samples, batch_size),
    verbose=1
)
193/193 [==============================] - 64s 329ms/step
CPU times: user 50.7 s, sys: 31.1 s, total: 1min 21s
Wall time: 1min 3s

Submission

In [19]:
submission = pd.read_csv(os.path.join(DATA_PATH2, 'sample_submission.csv'))
predicted_class_indices=np.argmax(prediction, axis=1)

# Generator class dictionary mapping
labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]

submission["class"] = predictions
submission.to_csv("submission_all.csv", index=False)
submission.head()
Out[19]:
img_file class
0 test_00001.jpg 124
1 test_00002.jpg 98
2 test_00003.jpg 157
3 test_00004.jpg 94
4 test_00005.jpg 18

Result

  • The score rose from Public 0.930 to Public 0.941, a gain of about 0.011!



3rd ML Month - Compare optimizer of efficientNet

Reference

Package

In [1]:
import gc
import os
import warnings
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score
from keras import backend as K
# progress bars for loops
from tqdm import tqdm_notebook
# cross-validation utilities
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
# model libraries
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, Model
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from keras.layers import Dense, Dropout, Flatten, Activation, Conv2D, GlobalAveragePooling2D
from keras import layers
from keras.optimizers import Adam,RMSprop,SGD,Nadam
# ignore warning messages
import warnings
warnings.filterwarnings(action='ignore')
# list the input directory contents
import os
print(os.listdir("../input"))
Using TensorFlow backend.
['car-crop', '2019-3rd-ml-month-with-kakr']
In [2]:
#efficientnet download
!pip install git+https://github.com/qubvel/efficientnet
from efficientnet import EfficientNetB3
Collecting git+https://github.com/qubvel/efficientnet
  Cloning https://github.com/qubvel/efficientnet to /tmp/pip-req-build-4pdycl7v
  Running command git clone -q https://github.com/qubvel/efficientnet /tmp/pip-req-build-4pdycl7v
Building wheels for collected packages: efficientnet
  Building wheel for efficientnet (setup.py) ... - \ done
  Stored in directory: /tmp/pip-ephem-wheel-cache-npug02wz/wheels/64/60/2e/30ebaa76ed1626e86bfb0cc0579b737fdb7d9ff8cb9522663a
Successfully built efficientnet
Installing collected packages: efficientnet
Successfully installed efficientnet-0.0.4

File Directory Setting

In [3]:
#crop data directory
DATA_PATH = '../input/car-crop'
os.listdir(DATA_PATH)
Out[3]:
['train_crop', 'test_crop']
In [4]:
#original data directory
DATA_PATH2 = '../input/2019-3rd-ml-month-with-kakr'
os.listdir(DATA_PATH2)
Out[4]:
['test.csv',
 'test',
 'train',
 'train.csv',
 'class.csv',
 'sample_submission.csv']
In [5]:
# image folder paths
TRAIN_IMG_PATH = os.path.join(DATA_PATH, 'train')
TEST_IMG_PATH = os.path.join(DATA_PATH, 'test')

# CSV file paths
df_train = pd.read_csv(os.path.join(DATA_PATH2, 'train.csv'))
df_test = pd.read_csv(os.path.join(DATA_PATH2, 'test.csv'))
df_class = pd.read_csv(os.path.join(DATA_PATH2, 'class.csv'))

train/test data Split

In [6]:
df_train["class"] = df_train["class"].astype('str')

df_train = df_train[['img_file', 'class']]
df_test = df_test[['img_file']]

its = np.arange(df_train.shape[0])
train_idx, val_idx = train_test_split(its, train_size = 0.8, random_state=42)

X_train = df_train.iloc[train_idx, :]
X_val = df_train.iloc[val_idx, :]

print(X_train.shape)
print(X_val.shape)
print(df_test.shape)
(7992, 2)
(1998, 2)
(6150, 1)

Parameter

In [7]:
def recall_m(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

def precision_m(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))
In [8]:
# Parameter
img_size = (299, 299)
image_size = 299
nb_train_samples = len(X_train)
nb_validation_samples = len(X_val)
nb_test_samples = len(df_test)
epochs = 20
batch_size = 32

# Define Generator config
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=False,
    zoom_range=0.2,
    fill_mode='nearest')
val_datagen = ImageDataGenerator(rescale=1./255)
In [9]:
#generator
train_generator = train_datagen.flow_from_dataframe(
    dataframe=X_train, 
    directory='../input/car-crop/train_crop',
    x_col = 'img_file',
    y_col = 'class',
    target_size = img_size,
    color_mode='rgb',
    class_mode='categorical',
    batch_size=batch_size,
    seed=42
)

validation_generator = val_datagen.flow_from_dataframe(
    dataframe=X_val, 
    directory='../input/car-crop/train_crop',
    x_col = 'img_file',
    y_col = 'class',
    target_size = img_size,
    color_mode='rgb',
    class_mode='categorical',
    batch_size=batch_size,
    shuffle=False,
    seed=42
)
Found 7992 validated image filenames belonging to 196 classes.
Found 1998 validated image filenames belonging to 196 classes.

Model

In [10]:
def get_steps(num_samples, batch_size):
    if (num_samples % batch_size) > 0 :
        return (num_samples // batch_size) + 1
    else :
        return num_samples // batch_size
In [11]:
%%time
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

#model path
MODEL_SAVE_FOLDER_PATH = './model/'
if not os.path.exists(MODEL_SAVE_FOLDER_PATH):
    os.mkdir(MODEL_SAVE_FOLDER_PATH)

model_path = MODEL_SAVE_FOLDER_PATH + '{epoch:02d}-{val_loss:.4f}.hdf5'

patient = 2
callbacks_list = [
    EarlyStopping(
        # monitor the validation loss
        monitor='val_loss',
        # stop training if val_loss does not improve for `patient` epochs
        patience=patient,
        # val_loss should decrease, so use 'min'
        mode='min',
        # verbosity level
        verbose=1
    ),
    ReduceLROnPlateau(
        monitor = 'val_loss',
        # halve the learning rate when triggered
        factor = 0.5,
        # same as above
        patience = patient / 2,
        # minimum learning rate
        min_lr=0.00001,
        verbose=1,
        mode='min'
    ) ]
gc.collect()
CPU times: user 116 ms, sys: 4 ms, total: 120 ms
Wall time: 116 ms
Out[11]:
381
In [12]:
#model
def get_model():
    EfficientNet_model = base_model = EfficientNetB3(weights='imagenet', include_top=False, 
                                                     input_shape=(299, 299, 3))


    model = Sequential()
    model.add(EfficientNet_model)
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(2048, activation='relu'))
    model.add(layers.Dropout(0.25))
    model.add(layers.Dense(196, activation='softmax'))
    #model.summary()
    
    return model

Optimizer 1: RMSprop

In [13]:
#compile
model_rmsprop = get_model()
model_rmsprop.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc',f1_m])
hist_rmsprop = model_rmsprop.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Downloading data from https://github.com/qubvel/efficientnet/releases/download/v0.0.1/efficientnet-b3_imagenet_1000_notop.h5
43974656/43966704 [==============================] - 1s 0us/step
Epoch 1/20
250/250 [==============================] - 290s 1s/step - loss: 3.5051 - acc: 0.2196 - f1_m: 0.1509 - val_loss: 1.9829 - val_acc: 0.4630 - val_f1_m: 0.4226
Epoch 2/20
250/250 [==============================] - 248s 993ms/step - loss: 1.3776 - acc: 0.6072 - f1_m: 0.5945 - val_loss: 1.4108 - val_acc: 0.6371 - val_f1_m: 0.6503
Epoch 3/20
250/250 [==============================] - 249s 995ms/step - loss: 0.9035 - acc: 0.7302 - f1_m: 0.7368 - val_loss: 0.9871 - val_acc: 0.7472 - val_f1_m: 0.7539
Epoch 4/20
250/250 [==============================] - 248s 993ms/step - loss: 0.6861 - acc: 0.7971 - f1_m: 0.8006 - val_loss: 0.9284 - val_acc: 0.7658 - val_f1_m: 0.7778
Epoch 5/20
250/250 [==============================] - 250s 1s/step - loss: 0.5730 - acc: 0.8353 - f1_m: 0.8366 - val_loss: 0.7969 - val_acc: 0.8163 - val_f1_m: 0.8232
Epoch 6/20
250/250 [==============================] - 253s 1s/step - loss: 0.4692 - acc: 0.8576 - f1_m: 0.8580 - val_loss: 0.8536 - val_acc: 0.8198 - val_f1_m: 0.8241

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 7/20
250/250 [==============================] - 248s 991ms/step - loss: 0.2281 - acc: 0.9255 - f1_m: 0.9264 - val_loss: 0.5314 - val_acc: 0.8789 - val_f1_m: 0.8827
Epoch 8/20
250/250 [==============================] - 246s 985ms/step - loss: 0.1798 - acc: 0.9421 - f1_m: 0.9430 - val_loss: 0.5439 - val_acc: 0.8839 - val_f1_m: 0.8874

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 9/20
250/250 [==============================] - 246s 982ms/step - loss: 0.1038 - acc: 0.9646 - f1_m: 0.9656 - val_loss: 0.4406 - val_acc: 0.9079 - val_f1_m: 0.9102
Epoch 10/20
250/250 [==============================] - 245s 978ms/step - loss: 0.0694 - acc: 0.9764 - f1_m: 0.9770 - val_loss: 0.4560 - val_acc: 0.9049 - val_f1_m: 0.9065

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 11/20
250/250 [==============================] - 247s 990ms/step - loss: 0.0472 - acc: 0.9844 - f1_m: 0.9846 - val_loss: 0.4593 - val_acc: 0.9079 - val_f1_m: 0.9106

Epoch 00011: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 00011: early stopping

Optimizer 2: Adam

In [14]:
#compile
model_adam = get_model()
model_adam.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['acc',f1_m])
hist_adam = model_adam.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 281s 1s/step - loss: 3.7481 - acc: 0.1823 - f1_m: 0.1136 - val_loss: 2.7741 - val_acc: 0.3639 - val_f1_m: 0.3564
Epoch 2/20
250/250 [==============================] - 245s 981ms/step - loss: 1.6301 - acc: 0.5540 - f1_m: 0.5289 - val_loss: 1.2303 - val_acc: 0.6612 - val_f1_m: 0.6570
Epoch 3/20
250/250 [==============================] - 245s 980ms/step - loss: 0.9075 - acc: 0.7375 - f1_m: 0.7402 - val_loss: 0.8972 - val_acc: 0.7487 - val_f1_m: 0.7541
Epoch 4/20
250/250 [==============================] - 244s 977ms/step - loss: 0.6960 - acc: 0.7932 - f1_m: 0.7889 - val_loss: 0.8714 - val_acc: 0.7733 - val_f1_m: 0.7783
Epoch 5/20
250/250 [==============================] - 245s 980ms/step - loss: 0.5387 - acc: 0.8333 - f1_m: 0.8360 - val_loss: 0.7838 - val_acc: 0.7943 - val_f1_m: 0.7996
Epoch 6/20
250/250 [==============================] - 244s 978ms/step - loss: 0.4708 - acc: 0.8572 - f1_m: 0.8600 - val_loss: 0.7524 - val_acc: 0.8248 - val_f1_m: 0.8348
Epoch 7/20
250/250 [==============================] - 245s 979ms/step - loss: 0.3988 - acc: 0.8737 - f1_m: 0.8768 - val_loss: 0.7650 - val_acc: 0.8058 - val_f1_m: 0.8189

Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 8/20
250/250 [==============================] - 244s 975ms/step - loss: 0.1856 - acc: 0.9407 - f1_m: 0.9418 - val_loss: 0.4081 - val_acc: 0.8959 - val_f1_m: 0.8962
Epoch 9/20
250/250 [==============================] - 244s 976ms/step - loss: 0.1304 - acc: 0.9559 - f1_m: 0.9562 - val_loss: 0.4419 - val_acc: 0.8944 - val_f1_m: 0.8960

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 10/20
250/250 [==============================] - 246s 982ms/step - loss: 0.0780 - acc: 0.9755 - f1_m: 0.9751 - val_loss: 0.4020 - val_acc: 0.9104 - val_f1_m: 0.9131
Epoch 11/20
250/250 [==============================] - 245s 980ms/step - loss: 0.0489 - acc: 0.9842 - f1_m: 0.9840 - val_loss: 0.3835 - val_acc: 0.9134 - val_f1_m: 0.9190
Epoch 12/20
250/250 [==============================] - 244s 978ms/step - loss: 0.0464 - acc: 0.9843 - f1_m: 0.9842 - val_loss: 0.3998 - val_acc: 0.9129 - val_f1_m: 0.9157

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 13/20
250/250 [==============================] - 246s 986ms/step - loss: 0.0377 - acc: 0.9872 - f1_m: 0.9872 - val_loss: 0.3781 - val_acc: 0.9199 - val_f1_m: 0.9232
Epoch 14/20
250/250 [==============================] - 251s 1s/step - loss: 0.0297 - acc: 0.9906 - f1_m: 0.9910 - val_loss: 0.3760 - val_acc: 0.9219 - val_f1_m: 0.9257
Epoch 15/20
250/250 [==============================] - 251s 1s/step - loss: 0.0297 - acc: 0.9903 - f1_m: 0.9904 - val_loss: 0.3899 - val_acc: 0.9219 - val_f1_m: 0.9240

Epoch 00015: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 16/20
250/250 [==============================] - 252s 1s/step - loss: 0.0249 - acc: 0.9915 - f1_m: 0.9917 - val_loss: 0.3785 - val_acc: 0.9259 - val_f1_m: 0.9273

Epoch 00016: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 00016: early stopping

Optimizer 3: Nadam

In [15]:
#compile
model_nadam = get_model()
model_nadam.compile(loss='categorical_crossentropy', optimizer=Nadam(), metrics=['acc',f1_m])
hist_nadam = model_nadam.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 288s 1s/step - loss: 4.1294 - acc: 0.1234 - f1_m: 0.0572 - val_loss: 4.2228 - val_acc: 0.1732 - val_f1_m: 0.1424
Epoch 2/20
250/250 [==============================] - 242s 967ms/step - loss: 2.2012 - acc: 0.4195 - f1_m: 0.3679 - val_loss: 2.6808 - val_acc: 0.4960 - val_f1_m: 0.4767
Epoch 3/20
250/250 [==============================] - 242s 969ms/step - loss: 1.4434 - acc: 0.5975 - f1_m: 0.5878 - val_loss: 1.2144 - val_acc: 0.6787 - val_f1_m: 0.6876
Epoch 4/20
250/250 [==============================] - 242s 970ms/step - loss: 1.0976 - acc: 0.6841 - f1_m: 0.6825 - val_loss: 1.1604 - val_acc: 0.6827 - val_f1_m: 0.6945
Epoch 5/20
250/250 [==============================] - 241s 963ms/step - loss: 0.8958 - acc: 0.7365 - f1_m: 0.7412 - val_loss: 0.9532 - val_acc: 0.7412 - val_f1_m: 0.7506
Epoch 6/20
250/250 [==============================] - 241s 963ms/step - loss: 0.7421 - acc: 0.7752 - f1_m: 0.7774 - val_loss: 0.8828 - val_acc: 0.7548 - val_f1_m: 0.7629
Epoch 7/20
250/250 [==============================] - 242s 966ms/step - loss: 0.6870 - acc: 0.7912 - f1_m: 0.7934 - val_loss: 0.8472 - val_acc: 0.7723 - val_f1_m: 0.7799
Epoch 8/20
250/250 [==============================] - 241s 964ms/step - loss: 0.6115 - acc: 0.8132 - f1_m: 0.8144 - val_loss: 0.8297 - val_acc: 0.7783 - val_f1_m: 0.7875
Epoch 9/20
250/250 [==============================] - 240s 961ms/step - loss: 0.5497 - acc: 0.8315 - f1_m: 0.8336 - val_loss: 0.7878 - val_acc: 0.8023 - val_f1_m: 0.8098
Epoch 10/20
250/250 [==============================] - 241s 962ms/step - loss: 0.5024 - acc: 0.8471 - f1_m: 0.8473 - val_loss: 0.8300 - val_acc: 0.7803 - val_f1_m: 0.7876

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0010000000474974513.
Epoch 11/20
250/250 [==============================] - 240s 960ms/step - loss: 0.2669 - acc: 0.9133 - f1_m: 0.9142 - val_loss: 0.4663 - val_acc: 0.8869 - val_f1_m: 0.8897
Epoch 12/20
250/250 [==============================] - 241s 964ms/step - loss: 0.1816 - acc: 0.9398 - f1_m: 0.9399 - val_loss: 0.4990 - val_acc: 0.8709 - val_f1_m: 0.8717

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 13/20
250/250 [==============================] - 241s 965ms/step - loss: 0.1196 - acc: 0.9607 - f1_m: 0.9615 - val_loss: 0.4290 - val_acc: 0.8919 - val_f1_m: 0.8933
Epoch 14/20
250/250 [==============================] - 242s 967ms/step - loss: 0.0917 - acc: 0.9695 - f1_m: 0.9706 - val_loss: 0.4254 - val_acc: 0.8999 - val_f1_m: 0.9043
Epoch 15/20
250/250 [==============================] - 240s 960ms/step - loss: 0.0724 - acc: 0.9757 - f1_m: 0.9762 - val_loss: 0.4568 - val_acc: 0.9009 - val_f1_m: 0.9025

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 16/20
250/250 [==============================] - 240s 960ms/step - loss: 0.0701 - acc: 0.9778 - f1_m: 0.9771 - val_loss: 0.4242 - val_acc: 0.9124 - val_f1_m: 0.9149
Epoch 17/20
250/250 [==============================] - 239s 957ms/step - loss: 0.0569 - acc: 0.9799 - f1_m: 0.9799 - val_loss: 0.4275 - val_acc: 0.9054 - val_f1_m: 0.9080

Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 18/20
250/250 [==============================] - 239s 957ms/step - loss: 0.0563 - acc: 0.9806 - f1_m: 0.9807 - val_loss: 0.4138 - val_acc: 0.9129 - val_f1_m: 0.9157
Epoch 19/20
250/250 [==============================] - 240s 960ms/step - loss: 0.0394 - acc: 0.9864 - f1_m: 0.9864 - val_loss: 0.4088 - val_acc: 0.9119 - val_f1_m: 0.9135
Epoch 20/20
250/250 [==============================] - 239s 956ms/step - loss: 0.0407 - acc: 0.9867 - f1_m: 0.9867 - val_loss: 0.4164 - val_acc: 0.9134 - val_f1_m: 0.9143

Epoch 00020: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.

Optimizer 4: SGD

In [16]:
#compile
model_sgd = get_model()
model_sgd.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['acc',f1_m])
hist_sgd = model_sgd.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 272s 1s/step - loss: 5.2815 - acc: 0.0070 - f1_m: 0.0000e+00 - val_loss: 5.2558 - val_acc: 0.0100 - val_f1_m: 0.0000e+00
Epoch 2/20
250/250 [==============================] - 237s 947ms/step - loss: 5.2364 - acc: 0.0134 - f1_m: 0.0000e+00 - val_loss: 5.2147 - val_acc: 0.0195 - val_f1_m: 0.0000e+00
Epoch 3/20
250/250 [==============================] - 236s 945ms/step - loss: 5.1866 - acc: 0.0198 - f1_m: 0.0000e+00 - val_loss: 5.1626 - val_acc: 0.0425 - val_f1_m: 0.0000e+00
Epoch 4/20
250/250 [==============================] - 237s 948ms/step - loss: 5.1253 - acc: 0.0367 - f1_m: 0.0000e+00 - val_loss: 5.0876 - val_acc: 0.0571 - val_f1_m: 0.0000e+00
Epoch 5/20
250/250 [==============================] - 236s 945ms/step - loss: 5.0311 - acc: 0.0557 - f1_m: 0.0000e+00 - val_loss: 4.9584 - val_acc: 0.0771 - val_f1_m: 0.0000e+00
Epoch 6/20
250/250 [==============================] - 236s 945ms/step - loss: 4.8801 - acc: 0.0801 - f1_m: 0.0000e+00 - val_loss: 4.7433 - val_acc: 0.0946 - val_f1_m: 0.0000e+00
Epoch 7/20
250/250 [==============================] - 236s 944ms/step - loss: 4.6674 - acc: 0.1102 - f1_m: 0.0000e+00 - val_loss: 4.5005 - val_acc: 0.1296 - val_f1_m: 0.0000e+00
Epoch 8/20
250/250 [==============================] - 237s 949ms/step - loss: 4.4327 - acc: 0.1355 - f1_m: 7.2728e-04 - val_loss: 4.2099 - val_acc: 0.1702 - val_f1_m: 0.0000e+00
Epoch 9/20
250/250 [==============================] - 238s 952ms/step - loss: 4.1743 - acc: 0.1830 - f1_m: 4.8485e-04 - val_loss: 3.9078 - val_acc: 0.2157 - val_f1_m: 0.0067
Epoch 10/20
250/250 [==============================] - 237s 948ms/step - loss: 3.8609 - acc: 0.2226 - f1_m: 0.0057 - val_loss: 3.5655 - val_acc: 0.2688 - val_f1_m: 0.0125
Epoch 11/20
250/250 [==============================] - 265s 1s/step - loss: 3.5310 - acc: 0.2820 - f1_m: 0.0152 - val_loss: 3.2231 - val_acc: 0.3268 - val_f1_m: 0.0258
Epoch 12/20
250/250 [==============================] - 269s 1s/step - loss: 3.2108 - acc: 0.3311 - f1_m: 0.0350 - val_loss: 2.8546 - val_acc: 0.4124 - val_f1_m: 0.0495
Epoch 13/20
250/250 [==============================] - 265s 1s/step - loss: 2.8602 - acc: 0.3960 - f1_m: 0.0636 - val_loss: 2.5056 - val_acc: 0.4700 - val_f1_m: 0.1007
Epoch 14/20
250/250 [==============================] - 246s 984ms/step - loss: 2.5567 - acc: 0.4545 - f1_m: 0.1146 - val_loss: 2.2381 - val_acc: 0.5215 - val_f1_m: 0.1474
Epoch 15/20
250/250 [==============================] - 303s 1s/step - loss: 2.2699 - acc: 0.4998 - f1_m: 0.1789 - val_loss: 1.9367 - val_acc: 0.5701 - val_f1_m: 0.2627
Epoch 16/20
250/250 [==============================] - 245s 981ms/step - loss: 2.0120 - acc: 0.5417 - f1_m: 0.2576 - val_loss: 1.7187 - val_acc: 0.6081 - val_f1_m: 0.3514
Epoch 17/20
250/250 [==============================] - 244s 976ms/step - loss: 1.7865 - acc: 0.5907 - f1_m: 0.3439 - val_loss: 1.5097 - val_acc: 0.6411 - val_f1_m: 0.4685
Epoch 18/20
250/250 [==============================] - 241s 965ms/step - loss: 1.6084 - acc: 0.6243 - f1_m: 0.4144 - val_loss: 1.3484 - val_acc: 0.6792 - val_f1_m: 0.5331
Epoch 19/20
250/250 [==============================] - 245s 978ms/step - loss: 1.4419 - acc: 0.6577 - f1_m: 0.5014 - val_loss: 1.2364 - val_acc: 0.6937 - val_f1_m: 0.5887
Epoch 20/20
250/250 [==============================] - 244s 976ms/step - loss: 1.3128 - acc: 0.6885 - f1_m: 0.5429 - val_loss: 1.0948 - val_acc: 0.7192 - val_f1_m: 0.6461

Optimizer 5: SGD + Nesterov

In [17]:
#compile
model_sgdnes = get_model()
model_sgdnes.compile(loss='categorical_crossentropy', optimizer=SGD(nesterov=True), metrics=['acc',f1_m])
hist_sgdnes = model_sgdnes.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 287s 1s/step - loss: 5.2796 - acc: 0.0066 - f1_m: 0.0000e+00 - val_loss: 5.2519 - val_acc: 0.0060 - val_f1_m: 0.0000e+00
Epoch 2/20
250/250 [==============================] - 257s 1s/step - loss: 5.2392 - acc: 0.0112 - f1_m: 0.0000e+00 - val_loss: 5.2110 - val_acc: 0.0195 - val_f1_m: 0.0000e+00
Epoch 3/20
250/250 [==============================] - 253s 1s/step - loss: 5.1897 - acc: 0.0218 - f1_m: 0.0000e+00 - val_loss: 5.1554 - val_acc: 0.0385 - val_f1_m: 0.0000e+00
Epoch 4/20
250/250 [==============================] - 250s 1s/step - loss: 5.1149 - acc: 0.0385 - f1_m: 0.0000e+00 - val_loss: 5.0658 - val_acc: 0.0591 - val_f1_m: 0.0000e+00
Epoch 5/20
250/250 [==============================] - 246s 983ms/step - loss: 5.0154 - acc: 0.0540 - f1_m: 0.0000e+00 - val_loss: 4.9082 - val_acc: 0.0711 - val_f1_m: 0.0000e+00
Epoch 6/20
250/250 [==============================] - 245s 978ms/step - loss: 4.8579 - acc: 0.0828 - f1_m: 0.0000e+00 - val_loss: 4.7131 - val_acc: 0.1206 - val_f1_m: 0.0000e+00
Epoch 7/20
250/250 [==============================] - 246s 984ms/step - loss: 4.6758 - acc: 0.1168 - f1_m: 0.0000e+00 - val_loss: 4.4955 - val_acc: 0.1491 - val_f1_m: 9.7067e-04
Epoch 8/20
250/250 [==============================] - 248s 991ms/step - loss: 4.4442 - acc: 0.1498 - f1_m: 7.2728e-04 - val_loss: 4.2417 - val_acc: 0.1807 - val_f1_m: 0.0019
Epoch 9/20
250/250 [==============================] - 244s 977ms/step - loss: 4.1850 - acc: 0.1889 - f1_m: 0.0032 - val_loss: 3.9157 - val_acc: 0.2162 - val_f1_m: 0.0087
Epoch 10/20
250/250 [==============================] - 244s 978ms/step - loss: 3.8912 - acc: 0.2210 - f1_m: 0.0082 - val_loss: 3.5765 - val_acc: 0.2613 - val_f1_m: 0.0222
Epoch 11/20
250/250 [==============================] - 245s 979ms/step - loss: 3.5922 - acc: 0.2692 - f1_m: 0.0211 - val_loss: 3.2315 - val_acc: 0.3473 - val_f1_m: 0.0398
Epoch 12/20
250/250 [==============================] - 244s 975ms/step - loss: 3.2267 - acc: 0.3395 - f1_m: 0.0428 - val_loss: 2.8834 - val_acc: 0.3809 - val_f1_m: 0.0714
Epoch 13/20
250/250 [==============================] - 241s 962ms/step - loss: 2.9179 - acc: 0.3876 - f1_m: 0.0712 - val_loss: 2.5499 - val_acc: 0.4665 - val_f1_m: 0.1178
Epoch 14/20
250/250 [==============================] - 240s 962ms/step - loss: 2.6022 - acc: 0.4465 - f1_m: 0.1123 - val_loss: 2.2473 - val_acc: 0.4995 - val_f1_m: 0.1788
Epoch 15/20
250/250 [==============================] - 242s 967ms/step - loss: 2.2877 - acc: 0.5069 - f1_m: 0.1890 - val_loss: 1.9782 - val_acc: 0.5651 - val_f1_m: 0.2664
Epoch 16/20
250/250 [==============================] - 239s 956ms/step - loss: 2.0582 - acc: 0.5426 - f1_m: 0.2587 - val_loss: 1.7474 - val_acc: 0.6166 - val_f1_m: 0.3598
Epoch 17/20
250/250 [==============================] - 238s 953ms/step - loss: 1.8387 - acc: 0.5868 - f1_m: 0.3370 - val_loss: 1.5692 - val_acc: 0.6261 - val_f1_m: 0.4423
Epoch 18/20
250/250 [==============================] - 238s 953ms/step - loss: 1.6553 - acc: 0.6237 - f1_m: 0.4035 - val_loss: 1.3760 - val_acc: 0.6867 - val_f1_m: 0.5301
Epoch 19/20
250/250 [==============================] - 243s 973ms/step - loss: 1.4744 - acc: 0.6533 - f1_m: 0.4755 - val_loss: 1.2595 - val_acc: 0.6947 - val_f1_m: 0.5777
Epoch 20/20
250/250 [==============================] - 238s 950ms/step - loss: 1.3349 - acc: 0.6892 - f1_m: 0.5419 - val_loss: 1.1540 - val_acc: 0.7132 - val_f1_m: 0.6220

Optimizer 6: SGD with momentum=0.9

In [18]:
#compile
model_sgdmo = get_model()
model_sgdmo.compile(loss='categorical_crossentropy', optimizer=SGD(momentum=0.9), metrics=['acc',f1_m])
hist_sgdmo = model_sgdmo.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 283s 1s/step - loss: 5.0416 - acc: 0.0401 - f1_m: 2.4243e-04 - val_loss: 4.3300 - val_acc: 0.1086 - val_f1_m: 0.0048
Epoch 2/20
250/250 [==============================] - 239s 956ms/step - loss: 3.3459 - acc: 0.2551 - f1_m: 0.0784 - val_loss: 2.0053 - val_acc: 0.4870 - val_f1_m: 0.3179
Epoch 3/20
250/250 [==============================] - 239s 956ms/step - loss: 1.6440 - acc: 0.5638 - f1_m: 0.4653 - val_loss: 1.0241 - val_acc: 0.7127 - val_f1_m: 0.6995
Epoch 4/20
250/250 [==============================] - 240s 959ms/step - loss: 0.9140 - acc: 0.7432 - f1_m: 0.7190 - val_loss: 0.7338 - val_acc: 0.7863 - val_f1_m: 0.7882
Epoch 5/20
250/250 [==============================] - 240s 960ms/step - loss: 0.5982 - acc: 0.8249 - f1_m: 0.8143 - val_loss: 0.6174 - val_acc: 0.8153 - val_f1_m: 0.8172
Epoch 6/20
250/250 [==============================] - 241s 963ms/step - loss: 0.4158 - acc: 0.8783 - f1_m: 0.8722 - val_loss: 0.5371 - val_acc: 0.8393 - val_f1_m: 0.8425
Epoch 7/20
250/250 [==============================] - 243s 972ms/step - loss: 0.3157 - acc: 0.9051 - f1_m: 0.9025 - val_loss: 0.4755 - val_acc: 0.8624 - val_f1_m: 0.8650
Epoch 8/20
250/250 [==============================] - 247s 990ms/step - loss: 0.2449 - acc: 0.9260 - f1_m: 0.9221 - val_loss: 0.4640 - val_acc: 0.8654 - val_f1_m: 0.8708
Epoch 9/20
250/250 [==============================] - 246s 983ms/step - loss: 0.1910 - acc: 0.9423 - f1_m: 0.9406 - val_loss: 0.4344 - val_acc: 0.8689 - val_f1_m: 0.8755
Epoch 10/20
250/250 [==============================] - 246s 985ms/step - loss: 0.1569 - acc: 0.9554 - f1_m: 0.9538 - val_loss: 0.4055 - val_acc: 0.8849 - val_f1_m: 0.8893
Epoch 11/20
250/250 [==============================] - 248s 993ms/step - loss: 0.1352 - acc: 0.9590 - f1_m: 0.9587 - val_loss: 0.4028 - val_acc: 0.8854 - val_f1_m: 0.8894
Epoch 12/20
250/250 [==============================] - 248s 994ms/step - loss: 0.1191 - acc: 0.9658 - f1_m: 0.9635 - val_loss: 0.4213 - val_acc: 0.8949 - val_f1_m: 0.8963

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.
Epoch 13/20
250/250 [==============================] - 246s 984ms/step - loss: 0.0820 - acc: 0.9773 - f1_m: 0.9770 - val_loss: 0.3905 - val_acc: 0.8969 - val_f1_m: 0.8996
Epoch 14/20
250/250 [==============================] - 249s 996ms/step - loss: 0.0658 - acc: 0.9811 - f1_m: 0.9813 - val_loss: 0.3680 - val_acc: 0.9054 - val_f1_m: 0.9064
Epoch 15/20
250/250 [==============================] - 249s 995ms/step - loss: 0.0617 - acc: 0.9840 - f1_m: 0.9834 - val_loss: 0.3761 - val_acc: 0.9109 - val_f1_m: 0.9116

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 16/20
250/250 [==============================] - 249s 996ms/step - loss: 0.0509 - acc: 0.9876 - f1_m: 0.9875 - val_loss: 0.3563 - val_acc: 0.9084 - val_f1_m: 0.9099
Epoch 17/20
250/250 [==============================] - 249s 996ms/step - loss: 0.0487 - acc: 0.9862 - f1_m: 0.9865 - val_loss: 0.3637 - val_acc: 0.9119 - val_f1_m: 0.9157

Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 18/20
250/250 [==============================] - 249s 997ms/step - loss: 0.0405 - acc: 0.9903 - f1_m: 0.9903 - val_loss: 0.3531 - val_acc: 0.9084 - val_f1_m: 0.9126
Epoch 19/20
250/250 [==============================] - 246s 982ms/step - loss: 0.0383 - acc: 0.9907 - f1_m: 0.9902 - val_loss: 0.3529 - val_acc: 0.9119 - val_f1_m: 0.9143
Epoch 20/20
250/250 [==============================] - 248s 993ms/step - loss: 0.0402 - acc: 0.9912 - f1_m: 0.9902 - val_loss: 0.3530 - val_acc: 0.9129 - val_f1_m: 0.9152

Epoch 00020: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Optimizer 7: SGD + Nesterov with momentum=0.9

In [19]:
#compile
model_sgdmones = get_model()
model_sgdmones.compile(loss='categorical_crossentropy', optimizer=SGD(momentum=0.9, nesterov=True), metrics=['acc',f1_m])
hist_sgdmones  = model_sgdmones.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 295s 1s/step - loss: 5.0288 - acc: 0.0442 - f1_m: 0.0000e+00 - val_loss: 4.3071 - val_acc: 0.1191 - val_f1_m: 0.0029
Epoch 2/20
250/250 [==============================] - 243s 974ms/step - loss: 3.2497 - acc: 0.2771 - f1_m: 0.0977 - val_loss: 1.9375 - val_acc: 0.5000 - val_f1_m: 0.3884
Epoch 3/20
250/250 [==============================] - 243s 971ms/step - loss: 1.6001 - acc: 0.5817 - f1_m: 0.4869 - val_loss: 1.0060 - val_acc: 0.7187 - val_f1_m: 0.6957
Epoch 4/20
250/250 [==============================] - 242s 969ms/step - loss: 0.8932 - acc: 0.7499 - f1_m: 0.7255 - val_loss: 0.7660 - val_acc: 0.7633 - val_f1_m: 0.7622
Epoch 5/20
250/250 [==============================] - 242s 968ms/step - loss: 0.5922 - acc: 0.8276 - f1_m: 0.8198 - val_loss: 0.5647 - val_acc: 0.8388 - val_f1_m: 0.8374
Epoch 6/20
250/250 [==============================] - 244s 976ms/step - loss: 0.4195 - acc: 0.8742 - f1_m: 0.8700 - val_loss: 0.4960 - val_acc: 0.8504 - val_f1_m: 0.8542
Epoch 7/20
250/250 [==============================] - 246s 984ms/step - loss: 0.3148 - acc: 0.9049 - f1_m: 0.9016 - val_loss: 0.4507 - val_acc: 0.8654 - val_f1_m: 0.8686
Epoch 8/20
250/250 [==============================] - 246s 985ms/step - loss: 0.2435 - acc: 0.9255 - f1_m: 0.9200 - val_loss: 0.4225 - val_acc: 0.8759 - val_f1_m: 0.8775
Epoch 9/20
250/250 [==============================] - 241s 965ms/step - loss: 0.1873 - acc: 0.9448 - f1_m: 0.9420 - val_loss: 0.4541 - val_acc: 0.8724 - val_f1_m: 0.8738

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.
Epoch 10/20
250/250 [==============================] - 242s 966ms/step - loss: 0.1308 - acc: 0.9635 - f1_m: 0.9620 - val_loss: 0.3869 - val_acc: 0.8914 - val_f1_m: 0.8962
Epoch 11/20
250/250 [==============================] - 242s 966ms/step - loss: 0.1028 - acc: 0.9702 - f1_m: 0.9688 - val_loss: 0.3656 - val_acc: 0.8984 - val_f1_m: 0.8987
Epoch 12/20
250/250 [==============================] - 246s 983ms/step - loss: 0.0974 - acc: 0.9715 - f1_m: 0.9713 - val_loss: 0.3748 - val_acc: 0.8959 - val_f1_m: 0.8994

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 13/20
250/250 [==============================] - 244s 975ms/step - loss: 0.0729 - acc: 0.9827 - f1_m: 0.9819 - val_loss: 0.3537 - val_acc: 0.9109 - val_f1_m: 0.9140
Epoch 14/20
250/250 [==============================] - 243s 973ms/step - loss: 0.0718 - acc: 0.9801 - f1_m: 0.9798 - val_loss: 0.3420 - val_acc: 0.9064 - val_f1_m: 0.9116
Epoch 15/20
250/250 [==============================] - 243s 972ms/step - loss: 0.0595 - acc: 0.9857 - f1_m: 0.9845 - val_loss: 0.3502 - val_acc: 0.9094 - val_f1_m: 0.9162

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 16/20
250/250 [==============================] - 242s 968ms/step - loss: 0.0550 - acc: 0.9859 - f1_m: 0.9858 - val_loss: 0.3450 - val_acc: 0.9079 - val_f1_m: 0.9125

Epoch 00016: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.
Epoch 00016: early stopping

acc / loss Plot

train acc

In [20]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['acc'])  
plt.plot(hist_adam.history['acc'])  
plt.plot(hist_nadam.history['acc']) 
plt.plot(hist_sgd.history['acc']) 
plt.plot(hist_sgdnes.history['acc']) 
plt.plot(hist_sgdmo.history['acc'])
plt.plot(hist_sgdmones.history['acc'])
plt.title('train. accuracy')  
plt.ylabel('accuracy')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='lower right')  

plt.show()

train loss

In [21]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['loss'])  
plt.plot(hist_adam.history['loss'])  
plt.plot(hist_nadam.history['loss']) 
plt.plot(hist_sgd.history['loss']) 
plt.plot(hist_sgdnes.history['loss']) 
plt.plot(hist_sgdmo.history['loss'])
plt.plot(hist_sgdmones.history['loss'])
plt.title('train. loss')  
plt.ylabel('loss')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='upper right')  

plt.show()

valid acc

In [22]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['val_acc'])
plt.plot(hist_adam.history['val_acc'])
plt.plot(hist_nadam.history['val_acc'])
plt.plot(hist_sgd.history['val_acc'])
plt.plot(hist_sgdnes.history['val_acc'])
plt.plot(hist_sgdmo.history['val_acc'])
plt.plot(hist_sgdmones.history['val_acc'])

plt.title('valid. accuracy')  
plt.ylabel('accuracy')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='lower right')  

plt.show()

valid loss

In [23]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['val_loss'])  
plt.plot(hist_adam.history['val_loss'])  
plt.plot(hist_nadam.history['val_loss']) 
plt.plot(hist_sgd.history['val_loss']) 
plt.plot(hist_sgdnes.history['val_loss']) 
plt.plot(hist_sgdmo.history['val_loss'])
plt.plot(hist_sgdmones.history['val_loss'])
plt.title('valid. loss')  
plt.ylabel('loss')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='upper right')  

plt.show()

train f1 score

In [24]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['f1_m'])  
plt.plot(hist_adam.history['f1_m'])  
plt.plot(hist_nadam.history['f1_m']) 
plt.plot(hist_sgd.history['f1_m']) 
plt.plot(hist_sgdnes.history['f1_m']) 
plt.plot(hist_sgdmo.history['f1_m'])
plt.plot(hist_sgdmones.history['f1_m'])
plt.title('train. f1_score')  
plt.ylabel('f1_score')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='upper right')  

plt.show()

valid f1 score

In [25]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['val_f1_m'])  
plt.plot(hist_adam.history['val_f1_m'])  
plt.plot(hist_nadam.history['val_f1_m']) 
plt.plot(hist_sgd.history['val_f1_m']) 
plt.plot(hist_sgdnes.history['val_f1_m']) 
plt.plot(hist_sgdmo.history['val_f1_m'])
plt.plot(hist_sgdmones.history['val_f1_m'])
plt.title('valid. f1_score')  
plt.ylabel('f1_score')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='upper right')  

plt.show()

Conclusion

  • 'sgd' and 'sgd+nesterov' converge far too slowly, so they do not seem to be suitable optimizers for a competition.
  • 'rmsprop' and 'adam' reach high accuracy in a comparatively short time.
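Reading the best validation accuracy off each training log above gives a quick ranking. The numbers below are copied from the logs; in a live session you would compute them as `max(hist.history['val_acc'])` from each History object instead.

```python
# Best val_acc per optimizer, read off the training logs above.
best_val_acc = {
    "rmsprop": 0.9079,
    "adam": 0.9259,
    "nadam": 0.9134,
    "sgd": 0.7192,
    "sgd+nesterov": 0.7132,
    "sgd+momentum": 0.9129,
    "sgd+nesterov+momentum": 0.9109,
}

# Rank the optimizers from best to worst validation accuracy.
for name, acc in sorted(best_val_acc.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.4f}")
```

Note that 'sgd' and 'sgd+nesterov' never triggered early stopping within 20 epochs, so their numbers were still rising when training ended.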
What is Frequency Encoding?

Fundamentally, a boosted tree model can only look at a variable vertically. For example, when predicting the probability of diabetes from body weight, a boosted tree can only make vertical splits such as "diabetes is likely when weight > 100 kg" and "diabetes is unlikely when weight < 60 kg". So how can a boosting model see horizontally as well as vertically? With Frequency Encoding, a boosting model becomes able to split horizontally in addition to vertically. Consider the example below.

Reference: https://www.kaggle.com/cdeotte/200-magical-models-santander-0-920

Looking at the histogram of Var_198, LGBM learns that target = 1 is more likely when Var_198 < 13 and less likely when Var_198 > 13. By default, LGBM predicts target = 0.18 for samples with Var_198 < 13 and 0.10 for the rest.

Reference: https://www.kaggle.com/cdeotte/200-magical-models-santander-0-920

LGBM splits the histogram vertically, because it cannot see horizontal differences. In the histogram some values are unique while others occur many times; in Var_108, for example, certain values occur more than 300 times. The histogram on the left counts the values of Var_198 in the interval 11.0000 < x < 11.1000: 11.0712 occurs five times, while the nearby value 11.0720 occurs only once.

 

Reference: https://www.kaggle.com/cdeotte/200-magical-models-santander-0-920

Now add Frequency Encoding. After creating a counts feature for Var_198 and redrawing the figure above, LGBM with the counts feature can split the histogram horizontally as well as vertically. Using counts, LGBM now predicts target = 0.1 when Var_198 < 13 AND count = 1, and target = 0.36 when Var_198 < 13 AND count > 1.

This technique improves AUC.
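The counts feature itself is a one-liner in pandas. A minimal sketch (the column name matches the figures above, but the values are toy data):

```python
import pandas as pd

# Toy stand-in for one Santander column.
df = pd.DataFrame({"var_198": [11.0712, 11.0712, 11.0712, 11.0720, 12.9, 13.4]})

# Frequency encoding: replace each value with how often it occurs in the column.
df["var_198_counts"] = df["var_198"].map(df["var_198"].value_counts())

print(df["var_198_counts"].tolist())  # [3, 3, 3, 1, 1, 1]
```

With the counts column available, a single split such as counts == 1 separates unique values from repeated ones, which is exactly the "horizontal" split LGBM cannot make from Var_198 alone.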


Summary of Santander-Customer-Transaction-Prediction

 

kaggle Top8% (681st of 8802) 🥉

image

useful

Smote+XGboost [Public Score = 0.75161]

  • The positive rate of the target variable is only 0.1, so I applied SMOTE oversampling and then fit an XGBoost model

XGboost [Public Score = 0.88411]

  • SMOTE performance was poor, so I tried XGBoost modeling alone

XGboost_tuning [Public Score = 0.89692]

  • max_depth (maximum tree depth): 3 -> 2, colsample_bytree (fraction of features sampled per tree): 1 -> 0.3, learning_rate: 0.05 -> 0.02

Lgboost [Public Score = 0.89766]

  • LightGBM is much faster than XGBoost, which makes parameter tuning much easier

Ensemble Models(XGboost + Lgboost) [Public Score = 0.90043]

  • Applied StandardScaler to the non-linear NuSVC model

Lgboost_oof [Public Score = 0.90043]

  • Out-of-fold training raised the Public Score by 0.003

Lgboost_oof_augment [Public Score = 0.90060]

  • Augmentation raised the Public Score by 0.00017 and the CV score by 0.00061

Lgboost_oof_frequency [Public Score = 0.90119]

  • Adding frequency features raised the CV score by about 0.005, but the Public Score increased only slightly

1% solution[private = 0.92159]

  • remove fake from test
  • concat train and test
  • frequency encoding
  • train and predict 200 models (LGB)
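The "remove fake from test" step relies on an observation shared publicly during the competition: synthetic test rows contain no values that are unique within their column, while every real row has at least one. A minimal sketch of that check (`real_row_mask` is a hypothetical helper; the actual kernels run this over all 200 var_ columns of the test set):

```python
import numpy as np

def real_row_mask(X):
    """Return True for rows containing at least one column-wise unique value.

    Assumes the Santander characterization: fake (synthetic) rows are built
    only from values that also appear elsewhere in the same column.
    """
    unique_mask = np.zeros(X.shape, dtype=bool)
    for j in range(X.shape[1]):
        vals, counts = np.unique(X[:, j], return_counts=True)
        unique_vals = set(vals[counts == 1].tolist())
        unique_mask[:, j] = [v in unique_vals for v in X[:, j]]
    return unique_mask.any(axis=1)

X = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [3.0, 2.0]])
print(real_row_mask(X))  # only the last row has a unique value (3.0)
```

Frequency statistics are then computed on real test rows plus train only, which is why finding the fake rows mattered so much for the winning solutions.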

try

  • Sum, std, mean, min, max, skew, kurtosis, etc. per row
  • Augment
  • Random oversampling, SMOTE oversampling
  • Round1, Round2, Round3
  • Frequency encoding – this was the solution, but I failed to discover the fake test set
  • Searched for time-series variables by iterating over var_0 – (var_1, var_2, … var_199), var_1 – (var_0, var_2, … var_199), and so on
  • Applied categories using binning
  • Removed columns that looked useless from EDA
  • Interaction columns (+, ×, etc.) among the top-5 features by feature importance
  • PCA
  • Clustering
  • NN model ensamble

Learning

  • OOF(Out Of Fold)

  • Augment

    Shuffling works because there is no interaction between the variables (they are completely independent). Here is an example: if the variables did have interactions, you might need var_0=5 AND var_1=2.5 AND var_2=15 for target=1. If only one of those three occurred you would have target=0, but if all three occurred you would have target=1. That's interaction. If there were interactions, you could not train on shuffled columns, because shuffling would destroy the interactions present in the training data and your model could not learn them. But with this Santander data there are no interactions, so you can shuffle columns and make new data.

  • Fake data

    Removing fake samples is the key of this competition

  • Frequency Encoding

    Frequency Encoding works even for continuous variables

  • How did adding these 200 new/duplicated features improve the score in a tree model (LightGBM)?

    It turns out LGBM benefits from help in finding interactions too. Instead of waiting for a decision tree to send all the observations with count=1 to the left and count>1 to the right, you just give it a new feature with all the count=1 values removed (converted to NaN). Instead of waiting for the decision tree to send count<=2 to the left and count>2 to the right, you just give it a new feature with all the count<=2 values removed. (Additionally, adding new columns like var_x times var_x_FE also helps LGBM find interactions.)
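The "Augment" idea above can be sketched as independent per-column shuffling within one class, valid only because the variables carry no interactions. The helper name and toy data are illustrative:

```python
import numpy as np

def augment_positive(X: np.ndarray, y: np.ndarray, n_copies: int = 1, seed: int = 0):
    """Synthesize positive-class rows by shuffling each column independently.

    This only preserves the data distribution when features have no interactions.
    """
    rng = np.random.default_rng(seed)
    pos = X[y == 1]
    new_rows = []
    for _ in range(n_copies):
        shuffled = pos.copy()
        for j in range(shuffled.shape[1]):
            rng.shuffle(shuffled[:, j])  # permute column j within the positive class
        new_rows.append(shuffled)
    X_aug = np.vstack([X] + new_rows)
    y_aug = np.concatenate([y, np.ones(len(pos) * n_copies, dtype=y.dtype)])
    return X_aug, y_aug

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
X_aug, y_aug = augment_positive(X, y, n_copies=2)
print(X_aug.shape, y_aug.shape)  # → (18, 2) (18,)
```

Each synthetic row mixes values from different original positive rows, which is exactly what would break the data if interactions existed.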
