vggnet-using-keras

VGGNet

image

  • This time we will look at VGGNet. VGGNet took 2nd place in the 2014 ImageNet classification challenge and 1st place in that year's localization challenge. Its defining traits are that it is deeper and uses smaller filters: it uses 3x3 filters, the smallest size that can still cover neighboring pixels, keeps that filter size throughout, and applies pooling periodically.
  • Why use small filters? With a smaller filter there are fewer parameters, so the network can be made deeper. What is the effective receptive field of three stacked 3x3, stride-1 CONV layers? The answer is 7x7. Let's see why. After the first layer, each output pixel sees a 3x3 region of the input. In the second layer, each pixel combines 3x3 outputs that each already cover 3x3, so it now sees a 5x5 region. Stacking a third 3x3 CONV in the same way gives a 7x7 effective receptive field. This is why VGGNet could be built deeper than AlexNet. (A small sketch verifying this, along with the parameter comparison, follows below.)
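A quick sanity check of the receptive-field claim and of the parameter savings, written as a rough sketch (C below is an arbitrary channel count; the comparison assumes equal input and output channels and ignores biases):

def effective_rf(n_layers, kernel=3):
    # each additional stride-1 layer widens the field of view by (kernel - 1) pixels
    rf = 1
    for _ in range(n_layers):
        rf += kernel - 1
    return rf

print(effective_rf(3))         # 7 -> three stacked 3x3 convs see a 7x7 input region

C = 64
print(3 * (3 * 3 * C * C))     # three 3x3 conv layers: 27*C^2 = 110,592 weights
print(7 * 7 * C * C)           # one 7x7 conv layer:    49*C^2 = 200,704 weights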

Model - VGG16

  • The number after "VGG" indicates the number of weight layers. image

Loading VGG16 with ImageNet weights

In [1]:
from keras.applications.vgg16 import VGG16
import numpy as np
#include_top specifies whether to include the fully connected layers at the top of the network.
model = VGG16(weights='imagenet', include_top=True,  input_shape=(224, 224, 3))
model.summary()
Using TensorFlow backend.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5
553467904/553467096 [==============================] - 36s 0us/step
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________

Implementing VGG16

  • VGG16 is easy to implement.
  • It consists only of 3x3 CONV with stride 1, pad 1 and 2x2 MAX POOL with stride 2.
In [2]:
from keras.layers import Input, Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten
from keras.models import Model, Sequential

input_shape = (224, 224, 3)
model = Sequential()
#conv1-1,1-2
model.add(Conv2D(filters=64, kernel_size=(3,3), strides = 1, padding="same", activation="relu",input_shape=input_shape))
model.add(Conv2D(filters=64, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
#MAXPOOL1
model.add(MaxPooling2D(pool_size=(2, 2), strides = 2))

#conv2-1, 2-2
model.add(Conv2D(filters=128, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
#MAXPOOL2
model.add(MaxPooling2D(pool_size=(2, 2), strides = 2))

#conv3-1, 3-2, 3-3
model.add(Conv2D(filters=256, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
#MAXPOOL3
model.add(MaxPooling2D(pool_size=(2, 2), strides = 2))

#conv4-1, 4-2, 4-3
model.add(Conv2D(filters=512, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
#MAXPOOL4
model.add(MaxPooling2D(pool_size=(2, 2), strides = 2))

#conv5-1, 5-2, 5-3
model.add(Conv2D(filters=512, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), strides = 1, padding="same", activation="relu"))
#MAXPOOL5
model.add(MaxPooling2D(pool_size=(2, 2), strides = 2))

model.add(Flatten())
#FC6, 7, 8
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
model.add(Dense(1000, activation='softmax'))
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 224, 224, 64)      1792      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 112, 112, 64)      0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 112, 112, 128)     73856     
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 112, 112, 128)     147584    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 56, 56, 128)       0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 56, 56, 256)       295168    
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 56, 56, 256)       590080    
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 56, 56, 256)       590080    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 28, 28, 256)       0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 28, 28, 512)       1180160   
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 28, 28, 512)       2359808   
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 28, 28, 512)       2359808   
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 14, 14, 512)       0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 14, 14, 512)       2359808   
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 7, 7, 512)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 25088)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              102764544 
_________________________________________________________________
dense_2 (Dense)              (None, 4096)              16781312  
_________________________________________________________________
dense_3 (Dense)              (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________

Model - VGG19

  • For VGG19 we will only load it through keras.applications.
  • Compared with VGG16, you can see that one more CONV layer has been added between poolings (in blocks 3, 4 and 5).
  • Thanks to the small 3x3 filters (the effective receptive field idea above), the parameter count does not increase much compared with VGG16; a quick check follows below.
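A back-of-the-envelope check of that claim: the three extra 3x3 conv layers (one 256-channel, two 512-channel, biases included) account exactly for the difference between the two "Total params" lines in the summaries.

extra = (3*3*256*256 + 256) + 2 * (3*3*512*512 + 512)
print(extra)                          # 5,309,696
print(143_667_240 - 138_357_544)      # 5,309,696 -> VGG19 total minus VGG16 total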

Loading VGG19 with ImageNet weights

In [3]:
from keras.applications.vgg19 import VGG19
#include_top specifies whether to include the fully connected layers at the top of the network.
model = VGG19(weights='imagenet', include_top=True,  input_shape=(224, 224, 3))
model.summary()
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5
574717952/574710816 [==============================] - 37s 0us/step
Model: "vgg19"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
_________________________________________________________________
In [2]:
from IPython.core.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))

LRN(Local Response Normalization)

LRN (Local Response Normalization) is rarely used these days, but it was used by AlexNet, the first CNN to win the ImageNet challenge, so let's look at how it works.

 

If you look up LRN, you will find that the Local Response Normalization (LRN) layer implements lateral inhibition.

So what is lateral inhibition?

 

"측면 억제(lateral inhibition)는 신경생리학 용어로, 한 영역에 있는 신경 세포가 상호 간 연결되어 있을 때 한 그 자신의 축색이나 자신과 이웃 신경세포를 매개하는 중간신경세포(interneuron)를 통해 이웃에 있는 신경 세포를 억제하려는 경향이다".

That is still hard to grasp, so let's look at a picture first: the Hermann grid, a famous illustration of lateral inhibition.

hermann-grid

White lines run between the black squares. Curiously, gray dots appear at the intersections you are not focusing on, and this illusion is caused by lateral inhibition: because each intersection is surrounded by white on every side, the inhibition coming from that surrounding white makes the white at the intersection look dimmer.

 

Now let's return to the LRN (Local Response Normalization) in AlexNet.

So why did AlexNet use lateral inhibition? Because of ReLU. ReLU passes positive inputs through unchanged, so during CONV or POOLING a single very large activation can dominate its neighborhood. To prevent this, the pixels at the same position in different activation maps are normalized against one another; that is exactly the LRN used in AlexNet.

 

In TensorFlow you can apply LRN with tf.nn.local_response_normalization, but nowadays Batch Normalization is used instead. If you want to dig a little deeper, take a look at the formula below.

  LRN(Local Response Normalization)
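For reference, this is the LRN formula from the AlexNet paper (Krizhevsky et al., 2012), where $a^i_{x,y}$ is the activity of kernel $i$ at position $(x, y)$, $N$ is the number of kernels in the layer, and $k, n, \alpha, \beta$ are hyperparameters (the paper uses $k=2$, $n=5$, $\alpha=10^{-4}$, $\beta=0.75$):

$$ b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^{j}_{x,y} \right)^{2} \right)^{\beta}} $$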

 

3rd ML Month - 14th solution

image
 

14th solution - CNN-Stacking

Model composition Notebooks

EfficientNetB4[Public Score = 0.95600] 
 - Model training(v143, v144, v145, v146, v147)
Xception[Public Score = 0.94787]
 - Model training(v230, v231, v233, v234, v239)
Resnet50[Public Score = 0.92682]
 - Model training(v247, 249, v249)

CNN structure

Input(shape=(None, number of models, number of classes,1))    
Conv2D(filters=8,  kernel_size=(1, 1)) 
Conv2D(filters=16, kernel_size=(1, 1)) 
Dropout(0.5)
Conv2D(filters=32, kernel_size=(2, 1)) 
Dropout(0.5)
Conv2D(filters=64, kernel_size=(2, 1)) 
Dropout(0.5)
Flatten() 
Dense(1024) 
Dense(196, activation='softmax')
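For reference, a minimal runnable sketch of this stacking head in Keras. The (3, 196, 1) input layout follows the validation-set shape listed below; the ReLU activations and the compile settings are my assumptions, not stated in the original.

from keras.layers import Input, Conv2D, Dropout, Flatten, Dense
from keras.models import Model

n_models, n_classes = 3, 196                     # three base CNNs, 196 car classes
inp = Input(shape=(n_models, n_classes, 1))      # stacked per-class probabilities
x = Conv2D(8,  (1, 1), activation='relu')(inp)
x = Conv2D(16, (1, 1), activation='relu')(x)
x = Dropout(0.5)(x)
x = Conv2D(32, (2, 1), activation='relu')(x)     # mixes the predictions of two models per class
x = Dropout(0.5)(x)
x = Conv2D(64, (2, 1), activation='relu')(x)
x = Dropout(0.5)(x)
x = Flatten()(x)
x = Dense(1024, activation='relu')(x)
out = Dense(n_classes, activation='softmax')(x)

stacking_model = Model(inp, out)
stacking_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])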

5-fold

  • validation set shape - (14594, 3, 196, 1)
  • Training this model with Kfold (k=5) on the validation set,
  • Optimizer - AdamAccumulate[Public Score = 0.95791]
  • Optimizer = adam [Public Score = 0.96174]
 

Notebooks update

Version1[Public Score = 0.811]

  1. EfficientNet_basic

Version3[Public Score = 0.828]

  1. Data Augmentation

Version8[Public Score = 0.924]

  1. Callbacklist(EarlyStopping, ReduceLROnPlateau, ModelCheckpoint)
  2. layers.Dense(1024) -> layers.Dense(2048)

Version9[Public Score = 0.950] - Weighted average ensemble through cross validation

1. **fold1** : [Public 0.903]
2. **fold2** : [Public 0.923]
3. **fold3** : [Public 0.916]
4. **fold4** : [Public 0.926]
5. **fold5** : [Public 0.926]
- **Ensemble** [Public 0.950]

Version19[Public Score = 0.951]

  1. 5fold -> 6fold

Version25[Public Score = 0.954]

  1. Cutout Augmentation
  2. TTA

Version122[Public Score = 0.955]

  1. Model(EfficientNetB3 -> EfficientNetB4)
  2. imgsize(299 -> 380)
  3. Version119, Version120 Model training

Version135[Public Score = 0.956] - Semi-supervised Learning

  1. Pseudo Label dataset
  2. Version132, Version133 Model training
  3. Keras Semi-supervised Learning Notebooks

Version135[Public Score = 0.958]

  1. Transfer Learning,
  2. Optimizer( adam -> AdamAccumulate)

Version314[Public Score = 0.961] - CNN-stacking

 

Try

1. xgboost-stacking[Public Score = 0.94882]

EfficientNetB4[Public Score = 0.95600] 
Xception[Public Score = 0.94787]
Resnet50[Public Score = 0.92682]

  1. The outputs of the three models above are used as features; train_shape - (14594, 588)
  2. Trained 196 times in a one-vs-rest fashion: if class == k then 1 else 0 (a rough sketch of this setup follows after this list)
  3. Trained with 5-fold cross validation
  4. Classes assigned by taking the argmax over the predictions
  • Training all 196 classes at once (multi-class) did not learn at all
  • Because 196 separate models are trained and predicted, a sample is often predicted positive by two or three classifiers (I did not get to try optimizing the thresholds with the f1 score).
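A rough sketch of that one-vs-rest XGBoost stacking (the 5-fold loop is omitted; X, y, X_test and the XGBClassifier settings are illustrative assumptions, with X holding the (14594, 588) stacked base-model outputs and y the integer class labels):

import numpy as np
from xgboost import XGBClassifier

def one_vs_rest_stacking(X, y, X_test, n_classes=196):
    scores = np.zeros((X_test.shape[0], n_classes))
    for k in range(n_classes):
        clf = XGBClassifier(n_estimators=100, max_depth=3)
        clf.fit(X, (y == k).astype(int))              # binary target: class k vs. rest
        scores[:, k] = clf.predict_proba(X_test)[:, 1]
    return scores.argmax(axis=1)                      # assign each sample the highest-scoring class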

2. Mixup augmentation

  • I tried Mixup, but the f1 score dropped. I suspect this is because there is little training data per class. image

3. Cutout augmentation

  • Cutout augmentation is a data augmentation technique that erases a random region of the image. image

4. Ensemble

EfficientNetB4[Public Score = 0.95600] 
Xception[Public Score = 0.94787]
Resnet50[Public Score = 0.92682]

I tried a weighted-average ensemble of the three models above. The Public Score came out lower, at 0.95695, but the final Private Score of 0.95663 ended up about 2% higher than my final score. It reminded me once again how powerful ensembling is.
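A minimal sketch of such a weighted average. The prediction arrays are random stand-ins here, and the weights are hypothetical (the actual weights used are not stated above):

import numpy as np

# (n_samples, 196) softmax outputs of the three base models (random stand-ins for illustration)
n_samples, n_classes = 6150, 196
preds_effnet   = np.random.rand(n_samples, n_classes)
preds_xception = np.random.rand(n_samples, n_classes)
preds_resnet   = np.random.rand(n_samples, n_classes)

weights = np.array([0.5, 0.3, 0.2])     # hypothetical weights, roughly ordered by public score
ensemble = (weights[0] * preds_effnet
            + weights[1] * preds_xception
            + weights[2] * preds_resnet)
final_class = ensemble.argmax(axis=1)   # ensemble prediction per sample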

 

Result

  • In the end I finished 14th out of 156. I felt the power of ensembling once again, and my one regret is that I tried CNN stacking too late and could not get its overfitting under control. image
In [2]:
from IPython.core.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))
 
 

alexnet-using-keras
In [1]:
import gc
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt

# cross-validation libs
from sklearn.model_selection import StratifiedKFold,train_test_split
from tqdm import tqdm_notebook
from sklearn.metrics import accuracy_score, roc_auc_score
# model libs
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator, load_img
from keras.models import Sequential, Model
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from keras.layers import Dense, Dropout, Flatten, Activation, Conv2D, AveragePooling2D,BatchNormalization, MaxPooling2D
from keras import layers
from keras.optimizers import Adam,RMSprop, SGD

# pretrained models
from keras.applications import VGG16, VGG19, resnet50

# ignore warnings
import warnings
warnings.filterwarnings(action='ignore')

import os
import gc
Using TensorFlow backend.

AlexNet

  • This time we will look at AlexNet. AlexNet appeared in 2012 and revived deep learning, as it was the first CNN to win the ImageNet challenge. Its structure is similar to LeNet-5, just with more layers.

  • When AlexNet was developed, computing power was limited and a GPU had only 3GB of memory, which was not enough to train AlexNet on ImageNet. So the network was split across two GPUs and trained in parallel (model parallelism). That is why, if you look at the figure closely, the depth at conv1 is 48 per GPU rather than 96.

image.png

Data load

  • We will use AlexNet to solve a binary classification problem: classifying dogs vs. cats.
In [2]:
# https://www.kaggle.com/bulentsiyah/dogs-vs-cats-classification-vgg16-fine-tuning
filenames = os.listdir("../input/dogs-vs-cats/train/train")
categories = []
for filename in filenames:
    category = filename.split('.')[0]
    if category == 'dog':
        categories.append(1)
    else:
        categories.append(0)

train = pd.DataFrame({
    'filename': filenames,
    'category': categories
})
train.head()
Out[2]:
filename category
0 dog.3984.jpg 1
1 cat.4062.jpg 0
2 cat.4106.jpg 0
3 cat.12098.jpg 0
4 dog.4663.jpg 1

Visualization

In [3]:
#Visualizing the data
sample = filenames[2]
image = load_img("../input/dogs-vs-cats/train/train/"+sample)
plt.imshow(image)
plt.show()

train/test data Split

In [4]:
train["category"] = train["category"].astype('str')


its = np.arange(train.shape[0])
train_idx, test_idx = train_test_split(its, train_size = 0.8, random_state=42)

df_train = train.iloc[train_idx, :]
X_test = train.iloc[test_idx, :]

its = np.arange(df_train.shape[0])
train_idx, val_idx = train_test_split(its, train_size = 0.8, random_state=42)
X_train = df_train.iloc[train_idx, :]
X_val = df_train.iloc[val_idx, :]

print(X_train.shape)
print(X_val.shape)
print(X_test.shape)
X_train['category'].value_counts()
(16000, 2)
(4000, 2)
(5000, 2)
Out[4]:
1    8048
0    7952
Name: category, dtype: int64

AlexNet - Details / Retrospectives

  • First architecture to use ReLU
  • Used local response normalization, which is no longer used today; Batch Normalization is used instead
  • heavy data augmentation
  • dropout 0.5
  • batch size 128
  • SGD Momentum 0.9
  • Learning rate starts at 1e-2 and is divided by 10 manually whenever the val accuracy stops improving
  • L2 weight decay 5e-4
In [5]:
# Parameter
image_size = 227
img_size = (image_size, image_size)
nb_train_samples = len(X_train)
nb_validation_samples = len(X_val)
nb_test_samples = len(X_test)
epochs = 20
#batch size 128
batch_size =128

# Define Generator config
train_datagen =ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
    )

val_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
In [6]:
#generator
train_generator = train_datagen.flow_from_dataframe(
    dataframe=X_train, 
    directory="../input/dogs-vs-cats/train/train",
    x_col = 'filename',
    y_col = 'category',
    target_size = img_size,
    color_mode='rgb',
    class_mode='binary',
    batch_size=batch_size,
    seed=42
)

validation_generator = val_datagen.flow_from_dataframe(
    dataframe=X_val, 
    directory="../input/dogs-vs-cats/train/train",
    x_col = 'filename',
    y_col = 'category',
    target_size = img_size,
    color_mode='rgb',
    class_mode='binary',
    batch_size=batch_size,
)

test_generator = test_datagen.flow_from_dataframe(
    dataframe=X_test,
    directory="../input/dogs-vs-cats/train/train",
    x_col = 'filename',
    y_col=None,
    target_size= img_size,
    color_mode='rgb',
    class_mode=None,
    batch_size=batch_size,
    shuffle=False
)
Found 16000 validated image filenames belonging to 2 classes.
Found 4000 validated image filenames belonging to 2 classes.
Found 5000 validated image filenames.

image

Model - AlexNet

In [7]:
#INPUT
input_shape = (227, 227, 3)
model = Sequential()
#CONV1
model.add(Conv2D(96, (11, 11), strides=4,padding='valid', input_shape=input_shape))
#MAX POOL1
model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
#NORM1  The original used Local Response Normalization, which is rarely used today; Batch Normalization is used here instead.
model.add(BatchNormalization())
#CONV2
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
#MAX POOL2
model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
#NORM2
model.add(BatchNormalization())
#CONV3
model.add(Conv2D(384, (3, 3),strides=1, activation='relu', padding='same'))
#CONV4
model.add(Conv2D(384, (3, 3),strides=1, activation='relu', padding='same'))
#CONV5
model.add(Conv2D(256, (3, 3),strides=1, activation='relu', padding='same'))
#MAX POOL3
model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
model.add(Flatten())
#FC6  The FC layers are scaled down because there are only two classes to predict.
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
#FC7
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
#FC8  Sigmoid output because this is binary classification
model.add(Dense(1, activation='sigmoid'))
# SGD with momentum 0.9 (note: Keras' decay argument here is learning-rate decay per update, not L2 weight decay)
optimizer =  SGD(lr=0.01, decay=5e-4, momentum=0.9)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer, metrics=['accuracy'])
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 55, 55, 96)        34944     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 27, 27, 96)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 27, 27, 96)        384       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 27, 27, 256)       221440    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 256)       0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 13, 13, 256)       1024      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 13, 13, 384)       885120    
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 13, 13, 384)       1327488   
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 13, 13, 256)       884992    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 6, 6, 256)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              9438208   
_________________________________________________________________
dropout_1 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 512)               524800    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 513       
=================================================================
Total params: 13,318,913
Trainable params: 13,318,209
Non-trainable params: 704
_________________________________________________________________

Train / predict

Train

In [8]:
def get_steps(num_samples, batch_size):
    if (num_samples % batch_size) > 0 :
        return (num_samples // batch_size) + 1
    else :
        return num_samples // batch_size
In [9]:
%%time
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

#model path
MODEL_SAVE_FOLDER_PATH = './model/'
if not os.path.exists(MODEL_SAVE_FOLDER_PATH):
    os.mkdir(MODEL_SAVE_FOLDER_PATH)

model_path = MODEL_SAVE_FOLDER_PATH + 'AlexNet.hdf5'

patient = 5
callbacks_list = [
    # Learning rate 1e-2, reduced by 10  manually when val accuracy plateaus
    ReduceLROnPlateau(
        monitor = 'val_accuracy', 
        # when triggered, divide the learning rate (lr) by 10
        factor = 0.1, 
        # adjust the lr if val_accuracy has not improved for 5 epochs
        patience = patient, 
        # minimum learning rate
        min_lr=0.00001,
        verbose=1,
        # note: since val_accuracy is monitored, mode='max' would be the intended setting; 'min' is kept as in the original run
        mode='min'
    ),
    ModelCheckpoint(
        filepath=model_path,
        monitor ='val_accuracy',
        # do not overwrite the model file unless the monitored metric improves
        # note: with monitor='val_accuracy' and mode='min' the checkpoint never updates after epoch 1 (see the log below); mode='max' would be the usual choice
        save_best_only = True,
        verbose=1,
        mode='min') ]
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 534 µs
In [10]:
history = model.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
gc.collect()
Epoch 1/20
125/125 [==============================] - 231s 2s/step - loss: 0.6550 - accuracy: 0.6219 - val_loss: 0.6832 - val_accuracy: 0.5770

Epoch 00001: val_accuracy improved from inf to 0.57700, saving model to ./model/AlexNet.hdf5
Epoch 2/20
125/125 [==============================] - 213s 2s/step - loss: 0.5462 - accuracy: 0.7245 - val_loss: 0.6127 - val_accuracy: 0.7203

Epoch 00002: val_accuracy did not improve from 0.57700
Epoch 3/20
125/125 [==============================] - 216s 2s/step - loss: 0.4708 - accuracy: 0.7766 - val_loss: 0.4568 - val_accuracy: 0.7160

Epoch 00003: val_accuracy did not improve from 0.57700
Epoch 4/20
125/125 [==============================] - 213s 2s/step - loss: 0.3999 - accuracy: 0.8168 - val_loss: 0.3649 - val_accuracy: 0.7610

Epoch 00004: val_accuracy did not improve from 0.57700
Epoch 5/20
125/125 [==============================] - 212s 2s/step - loss: 0.3593 - accuracy: 0.8436 - val_loss: 0.5346 - val_accuracy: 0.7875

Epoch 00005: val_accuracy did not improve from 0.57700
Epoch 6/20
125/125 [==============================] - 212s 2s/step - loss: 0.3185 - accuracy: 0.8619 - val_loss: 0.3869 - val_accuracy: 0.8745

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0009999999776482583.

Epoch 00006: val_accuracy did not improve from 0.57700
Epoch 7/20
125/125 [==============================] - 211s 2s/step - loss: 0.2708 - accuracy: 0.8857 - val_loss: 0.3340 - val_accuracy: 0.8905

Epoch 00007: val_accuracy did not improve from 0.57700
Epoch 8/20
125/125 [==============================] - 211s 2s/step - loss: 0.2458 - accuracy: 0.8966 - val_loss: 0.1782 - val_accuracy: 0.9025

Epoch 00008: val_accuracy did not improve from 0.57700
Epoch 9/20
125/125 [==============================] - 212s 2s/step - loss: 0.2332 - accuracy: 0.9041 - val_loss: 0.1363 - val_accuracy: 0.9068

Epoch 00009: val_accuracy did not improve from 0.57700
Epoch 10/20
125/125 [==============================] - 210s 2s/step - loss: 0.2249 - accuracy: 0.9068 - val_loss: 0.1963 - val_accuracy: 0.9087

Epoch 00010: val_accuracy did not improve from 0.57700
Epoch 11/20
125/125 [==============================] - 212s 2s/step - loss: 0.2212 - accuracy: 0.9113 - val_loss: 0.1615 - val_accuracy: 0.8960

Epoch 00011: ReduceLROnPlateau reducing learning rate to 9.999999310821295e-05.

Epoch 00011: val_accuracy did not improve from 0.57700
Epoch 12/20
125/125 [==============================] - 214s 2s/step - loss: 0.2156 - accuracy: 0.9108 - val_loss: 0.3811 - val_accuracy: 0.9070

Epoch 00012: val_accuracy did not improve from 0.57700
Epoch 13/20
125/125 [==============================] - 211s 2s/step - loss: 0.2139 - accuracy: 0.9106 - val_loss: 0.3198 - val_accuracy: 0.9087

Epoch 00013: val_accuracy did not improve from 0.57700
Epoch 14/20
125/125 [==============================] - 210s 2s/step - loss: 0.2117 - accuracy: 0.9123 - val_loss: 0.1410 - val_accuracy: 0.9080

Epoch 00014: val_accuracy did not improve from 0.57700
Epoch 15/20
125/125 [==============================] - 208s 2s/step - loss: 0.2123 - accuracy: 0.9141 - val_loss: 0.2429 - val_accuracy: 0.9090

Epoch 00015: val_accuracy did not improve from 0.57700
Epoch 16/20
125/125 [==============================] - 208s 2s/step - loss: 0.2084 - accuracy: 0.9143 - val_loss: 0.2101 - val_accuracy: 0.9093

Epoch 00016: ReduceLROnPlateau reducing learning rate to 1e-05.

Epoch 00016: val_accuracy did not improve from 0.57700
Epoch 17/20
125/125 [==============================] - 206s 2s/step - loss: 0.2089 - accuracy: 0.9158 - val_loss: 0.0953 - val_accuracy: 0.9097

Epoch 00017: val_accuracy did not improve from 0.57700
Epoch 18/20
125/125 [==============================] - 206s 2s/step - loss: 0.2094 - accuracy: 0.9136 - val_loss: 0.3337 - val_accuracy: 0.9100

Epoch 00018: val_accuracy did not improve from 0.57700
Epoch 19/20
125/125 [==============================] - 205s 2s/step - loss: 0.2097 - accuracy: 0.9143 - val_loss: 0.1391 - val_accuracy: 0.9090

Epoch 00019: val_accuracy did not improve from 0.57700
Epoch 20/20
125/125 [==============================] - 211s 2s/step - loss: 0.2059 - accuracy: 0.9158 - val_loss: 0.2226 - val_accuracy: 0.9097

Epoch 00020: val_accuracy did not improve from 0.57700
Out[10]:
212

predict

In [11]:
%%time
test_generator.reset()
prediction = model.predict_generator(
    generator = test_generator,
    steps = get_steps(nb_test_samples, batch_size),
    verbose=1
)
print('Test accuracy : ', roc_auc_score(X_test['category'].astype('int'), prediction, average='macro'))
40/40 [==============================] - 21s 533ms/step
Test accuracy :  0.9668627839696784
CPU times: user 15.2 s, sys: 5.23 s, total: 20.4 s
Wall time: 21.4 s

acc / loss plot

In [12]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(len(acc))

plt.plot(epochs, acc, label='Training acc')
plt.plot(epochs, val_acc, label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.ylim(0.5,1)
plt.show()
In [13]:
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(epochs, loss, label='Training loss')
plt.plot(epochs, val_loss, label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.ylim(0,0.5)
plt.show()

Result

  • The score on the test set came out to about 0.967 (note that the cell above computes the ROC AUC via roc_auc_score, even though it is printed as 'Test accuracy').
  • The loss keeps decreasing, but it looks a bit unstable.
In [1]:
from IPython.core.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))
keras-semi-supervised-learning

3rd ML Month - Keras Semi-supervised Learning

Background

  • This competition has 196 classes, which is a lot.
  • If you split the training set by class there is very little data per class, so I thought that having more data might let the model learn better.

Semi-supervised Learning

  • So I decided to use Pseudo Labelling, one of the semi-supervised learning techniques.
  • The Pseudo Labelling procedure is: 1) train a model on the training data, 2) predict on the test data, 3) add confidently predicted test samples to the training data, 4) train a model on the combined data, 5) predict on the test data again. (A small sketch of step 3 follows below.)
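A minimal sketch of step 3, assuming prediction is the (n_test, 196) softmax output of a first model and df_test / df_train are the dataframes defined later in this notebook; the 0.9 confidence threshold is my assumption (in this notebook the pseudo labels are actually loaded from a previous submission file, 'Pseudo Labelsing.csv'):

import numpy as np
import pandas as pd

confidence = prediction.max(axis=1)          # highest class probability per test image
pseudo_class = prediction.argmax(axis=1)     # predicted class index per test image

df_pseudo = pd.DataFrame({
    'img_file': df_test['img_file'].values,
    'class': pseudo_class.astype(str)        # class indices would still need mapping back to labels
})
df_pseudo = df_pseudo[confidence > 0.9]      # step 3: keep only confidently labeled test samples
df_combined = pd.concat([df_train, df_pseudo], axis=0)   # step 4 input: train + pseudo-labeled data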

reference

Package

In [1]:
import gc
import os
import warnings
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from tqdm import tqdm_notebook
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Flatten, Activation, Conv2D, GlobalAveragePooling2D
from keras import layers
from keras.optimizers import SGD, RMSprop


import os
print(os.listdir("../input"))
Using TensorFlow backend.
['car-crop2', 'car-crop', 'semi-detaset', '2019-3rd-ml-month-with-kakr']
In [2]:
#efficientnet download
!pip install -U efficientnet==0.0.4
from efficientnet import EfficientNetB3
Collecting efficientnet==0.0.4
  Downloading https://files.pythonhosted.org/packages/a6/80/f2c098284f7c07491e66af18d9a5fea595d4b507d10c0845275b8d47dc6f/efficientnet-0.0.4.tar.gz
Building wheels for collected packages: efficientnet
  Building wheel for efficientnet (setup.py) ... - done
  Created wheel for efficientnet: filename=efficientnet-0.0.4-cp36-none-any.whl size=14289 sha256=3fe6c6f90f05f8f8cd0e747991e54841d15ef0180ef6f2dfd72d9942c10c6d72
  Stored in directory: /tmp/.cache/pip/wheels/5c/34/68/a611a699a28239e964ccf144c0e767cdb5439fee82ec5de6e0
Successfully built efficientnet
Installing collected packages: efficientnet
Successfully installed efficientnet-0.0.4

File Directory Setting

In [3]:
#crop data directory
DATA_PATH = '../input/car-crop'
os.listdir(DATA_PATH)
Out[3]:
['train_crop', 'test_crop']
In [4]:
#original data directory
DATA_PATH2 = '../input/2019-3rd-ml-month-with-kakr'
os.listdir(DATA_PATH2)
Out[4]:
['test.csv',
 'test',
 'train',
 'train.csv',
 'class.csv',
 'sample_submission.csv']
In [5]:
#semi_data directory
DATA_PATH3 = '../input/semi-detaset'
os.listdir(DATA_PATH3)
Out[5]:
['Pseudo Labelsing.csv']
In [6]:
#crop merge directory
DATA_PATH4 = '../input/car-crop2'
os.listdir(DATA_PATH4)
Out[6]:
['train2_crop']
In [7]:
# image folder paths
TRAIN_IMG_PATH = os.path.join(DATA_PATH, 'train')
TEST_IMG_PATH = os.path.join(DATA_PATH, 'test')

# CSV file paths
df_train = pd.read_csv(os.path.join(DATA_PATH2, 'train.csv'))
df_test = pd.read_csv(os.path.join(DATA_PATH2, 'test.csv'))
df_class = pd.read_csv(os.path.join(DATA_PATH2, 'class.csv'))

# load the submission file from version 1
df_semi = pd.read_csv(os.path.join(DATA_PATH3, 'Pseudo Labelsing.csv'))

# in version 1 the filenames were saved as 'tset' instead of 'test', so fix them here
name = list(map(lambda x:  x.replace("tset", "test"),df_semi['img_file']))
df_semi['img_file']=name
df_semi['img_file'] = df_semi['img_file']+'.jpg'
df_semi.head(5)
Out[7]:
img_file class
0 test_00001.jpg 124
1 test_00002.jpg 98
2 test_00003.jpg 157
3 test_00004.jpg 94
4 test_00005.jpg 18

train/test data Split

In [8]:
df_train["class"] = df_train["class"].astype('str')
df_semi["class"] = df_semi["class"].astype('str')
df_train = df_train[['img_file', 'class']]
df_test = df_test[['img_file']]

# merge the train data with the pseudo-labeled (semi) data
df_train2 = pd.concat([df_train, df_semi],axis=0)


its = np.arange(df_train2.shape[0])
train_idx, val_idx = train_test_split(its, train_size = 0.8, random_state=42)

X_train = df_train2.iloc[train_idx, :]
X_val = df_train2.iloc[val_idx, :]

print(X_train.shape)
print(X_val.shape)
print(df_test.shape)
df_train2.head(5)
(11985, 2)
(2997, 2)
(6150, 1)
Out[8]:
img_file class
0 train_00001.jpg 108
1 train_00002.jpg 71
2 train_00003.jpg 76
3 train_00004.jpg 188
4 train_00005.jpg 44

Parameter

In [9]:
#ref: https://github.com/yu4u/cutout-random-erasing/blob/master/cifar10_resnet.py
def get_random_eraser(p=0.5, s_l=0.02, s_h=0.4, r_1=0.3, r_2=1/0.3, v_l=0, v_h=255, pixel_level=False):
    def eraser(input_img):
        img_h, img_w, img_c = input_img.shape
        p_1 = np.random.rand()

        if p_1 > p:
            return input_img

        while True:
            s = np.random.uniform(s_l, s_h) * img_h * img_w
            r = np.random.uniform(r_1, r_2)
            w = int(np.sqrt(s / r))
            h = int(np.sqrt(s * r))
            left = np.random.randint(0, img_w)
            top = np.random.randint(0, img_h)

            if left + w <= img_w and top + h <= img_h:
                break

        if pixel_level:
            c = np.random.uniform(v_l, v_h, (h, w, img_c))
        else:
            c = np.random.uniform(v_l, v_h)

        input_img[top:top + h, left:left + w, :] = c

        return input_img

    return eraser
In [10]:
import keras.backend as K
from keras.legacy import interfaces
from keras.optimizers import Optimizer


class AdamAccumulate(Optimizer):

    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=None, decay=0., amsgrad=False, accum_iters=1, **kwargs):
        if accum_iters < 1:
            raise ValueError('accum_iters must be >= 1')
        super(AdamAccumulate, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.beta_1 = K.variable(beta_1, name='beta_1')
            self.beta_2 = K.variable(beta_2, name='beta_2')
            self.decay = K.variable(decay, name='decay')
        if epsilon is None:
            epsilon = K.epsilon()
        self.epsilon = epsilon
        self.initial_decay = decay
        self.amsgrad = amsgrad
        self.accum_iters = K.variable(accum_iters, K.dtype(self.iterations))
        self.accum_iters_float = K.cast(self.accum_iters, K.floatx())

    @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr

        completed_updates = K.cast(K.tf.floordiv(self.iterations, self.accum_iters), K.floatx())

        if self.initial_decay > 0:
            lr = lr * (1. / (1. + self.decay * completed_updates))

        t = completed_updates + 1

        lr_t = lr * (K.sqrt(1. - K.pow(self.beta_2, t)) / (1. - K.pow(self.beta_1, t)))

        # self.iterations incremented after processing a batch
        # batch:              1 2 3 4 5 6 7 8 9
        # self.iterations:    0 1 2 3 4 5 6 7 8
        # update_switch = 1:        x       x    (if accum_iters=4)  
        update_switch = K.equal((self.iterations + 1) % self.accum_iters, 0)
        update_switch = K.cast(update_switch, K.floatx())

        ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        gs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]

        if self.amsgrad:
            vhats = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        else:
            vhats = [K.zeros(1) for _ in params]

        self.weights = [self.iterations] + ms + vs + vhats

        for p, g, m, v, vhat, tg in zip(params, grads, ms, vs, vhats, gs):

            sum_grad = tg + g
            avg_grad = sum_grad / self.accum_iters_float

            m_t = (self.beta_1 * m) + (1. - self.beta_1) * avg_grad
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(avg_grad)

            if self.amsgrad:
                vhat_t = K.maximum(vhat, v_t)
                p_t = p - lr_t * m_t / (K.sqrt(vhat_t) + self.epsilon)
                self.updates.append(K.update(vhat, (1 - update_switch) * vhat + update_switch * vhat_t))
            else:
                p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append(K.update(m, (1 - update_switch) * m + update_switch * m_t))
            self.updates.append(K.update(v, (1 - update_switch) * v + update_switch * v_t))
            self.updates.append(K.update(tg, (1 - update_switch) * sum_grad))
            new_p = p_t

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            self.updates.append(K.update(p, (1 - update_switch) * p + update_switch * new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon,
                  'amsgrad': self.amsgrad}
        base_config = super(AdamAccumulate, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
In [11]:
# Parameter
img_size = (300, 300)
image_size = 300
nb_train_samples = len(X_train)
nb_validation_samples = len(X_val)
nb_test_samples = len(df_test)
epochs = 30
batch_size = 32

# Define Generator config
train_datagen =ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=False,
    fill_mode='nearest',
    preprocessing_function = get_random_eraser(v_l=0, v_h=1),
    )

val_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
In [12]:
#generator
train_generator = train_datagen.flow_from_dataframe(
    dataframe=X_train, 
    directory='../input/car-crop2/train2_crop',
    x_col = 'img_file',
    y_col = 'class',
    target_size = img_size,
    color_mode='rgb',
    class_mode='categorical',
    batch_size=batch_size,
    seed=42
)

validation_generator = val_datagen.flow_from_dataframe(
    dataframe=X_val, 
    directory='../input/car-crop2/train2_crop',
    x_col = 'img_file',
    y_col = 'class',
    target_size = img_size,
    color_mode='rgb',
    class_mode='categorical',
    batch_size=batch_size
)

test_generator = test_datagen.flow_from_dataframe(
    dataframe=df_test,
    directory='../input/car-crop/test_crop',
    x_col='img_file',
    y_col=None,
    target_size= img_size,
    color_mode='rgb',
    class_mode=None,
    batch_size=batch_size,
    shuffle=False
)
Found 11985 validated image filenames belonging to 196 classes.
Found 2997 validated image filenames belonging to 196 classes.
Found 6150 validated image filenames.

Model

In [13]:
#model
opt = AdamAccumulate(lr=0.001, decay=1e-5, accum_iters=5)
EfficientNet_model = EfficientNetB3(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))


model = Sequential()
model.add(EfficientNet_model)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(2048, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(196, activation='softmax'))
model.summary()

#compile
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['acc'])
Downloading data from https://github.com/qubvel/efficientnet/releases/download/v0.0.1/efficientnet-b3_imagenet_1000_notop.h5
43974656/43966704 [==============================] - 1s 0us/step
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
efficientnet-b3 (Model)      (None, 10, 10, 1536)      10783528  
_________________________________________________________________
global_average_pooling2d_1 ( (None, 1536)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 2048)              3147776   
_________________________________________________________________
dropout_1 (Dropout)          (None, 2048)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 196)               401604    
=================================================================
Total params: 14,332,908
Trainable params: 14,245,612
Non-trainable params: 87,296
_________________________________________________________________
In [14]:
def get_steps(num_samples, batch_size):
    if (num_samples % batch_size) > 0 :
        return (num_samples // batch_size) + 1
    else :
        return num_samples // batch_size
In [15]:
%%time
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

#model path
MODEL_SAVE_FOLDER_PATH = './model/'
if not os.path.exists(MODEL_SAVE_FOLDER_PATH):
    os.mkdir(MODEL_SAVE_FOLDER_PATH)

model_path = MODEL_SAVE_FOLDER_PATH + '{epoch:02d}-{val_loss:.4f}.hdf5'

patient = 3
callbacks_list = [
     EarlyStopping(
        # monitor the validation loss
        monitor='val_loss',
        # stop training if there is no improvement for `patient` epochs
        patience=patient, 
        # direction of improvement; val_loss should decrease, so 'min'
        mode='min', 
        # verbosity level
        verbose=1
                          
    ),
    ReduceLROnPlateau(
        monitor = 'val_loss', 
        # when triggered, halve the learning rate (lr)
        factor = 0.5, 
        # same as above
        patience = patient / 2, 
        # minimum learning rate
        min_lr=0.00001,
        verbose=1,
        mode='min'
    ),
    ModelCheckpoint(
        filepath=model_path,
        monitor ='val_loss',
        # do not overwrite the model file unless val_loss improves
        save_best_only = True,
        verbose=1,
        mode='min') ]

    

history = model.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
gc.collect()
Epoch 1/30
375/375 [==============================] - 457s 1s/step - loss: 3.7669 - acc: 0.1977 - val_loss: 1.3730 - val_acc: 0.6123

Epoch 00001: val_loss improved from inf to 1.37298, saving model to ./model/01-1.3730.hdf5
Epoch 2/30
375/375 [==============================] - 383s 1s/step - loss: 1.1834 - acc: 0.6701 - val_loss: 0.5710 - val_acc: 0.8282

Epoch 00002: val_loss improved from 1.37298 to 0.57099, saving model to ./model/02-0.5710.hdf5
Epoch 3/30
375/375 [==============================] - 384s 1s/step - loss: 0.6122 - acc: 0.8226 - val_loss: 0.3832 - val_acc: 0.8896

Epoch 00003: val_loss improved from 0.57099 to 0.38318, saving model to ./model/03-0.3832.hdf5
Epoch 4/30
375/375 [==============================] - 383s 1s/step - loss: 0.4298 - acc: 0.8751 - val_loss: 0.3178 - val_acc: 0.9072

Epoch 00004: val_loss improved from 0.38318 to 0.31778, saving model to ./model/04-0.3178.hdf5
Epoch 5/30
375/375 [==============================] - 384s 1s/step - loss: 0.3330 - acc: 0.8999 - val_loss: 0.3468 - val_acc: 0.9029

Epoch 00005: val_loss did not improve from 0.31778
Epoch 6/30
375/375 [==============================] - 382s 1s/step - loss: 0.2818 - acc: 0.9150 - val_loss: 0.3284 - val_acc: 0.9156

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.

Epoch 00006: val_loss did not improve from 0.31778
Epoch 7/30
375/375 [==============================] - 386s 1s/step - loss: 0.1712 - acc: 0.9480 - val_loss: 0.2203 - val_acc: 0.9466

Epoch 00007: val_loss improved from 0.31778 to 0.22033, saving model to ./model/07-0.2203.hdf5
Epoch 8/30
375/375 [==============================] - 386s 1s/step - loss: 0.1270 - acc: 0.9621 - val_loss: 0.2148 - val_acc: 0.9499

Epoch 00008: val_loss improved from 0.22033 to 0.21477, saving model to ./model/08-0.2148.hdf5
Epoch 9/30
375/375 [==============================] - 387s 1s/step - loss: 0.1175 - acc: 0.9638 - val_loss: 0.2108 - val_acc: 0.9533

Epoch 00009: val_loss improved from 0.21477 to 0.21076, saving model to ./model/09-0.2108.hdf5
Epoch 10/30
375/375 [==============================] - 386s 1s/step - loss: 0.0958 - acc: 0.9706 - val_loss: 0.2121 - val_acc: 0.9503

Epoch 00010: val_loss did not improve from 0.21076
Epoch 11/30
375/375 [==============================] - 387s 1s/step - loss: 0.0943 - acc: 0.9718 - val_loss: 0.2321 - val_acc: 0.9469

Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.

Epoch 00011: val_loss did not improve from 0.21076
Epoch 12/30
375/375 [==============================] - 386s 1s/step - loss: 0.0784 - acc: 0.9777 - val_loss: 0.2116 - val_acc: 0.9536

Epoch 00012: val_loss did not improve from 0.21076
Epoch 00012: early stopping
CPU times: user 1h 41min 55s, sys: 38min 26s, total: 2h 20min 22s
Wall time: 1h 18min 47s
Out[15]:
60

acc / loss Plot

In [16]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, label='Training acc')
plt.plot(epochs, val_acc, label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.ylim(0.9,1)
plt.show()
In [17]:
plt.plot(epochs, loss, label='Training loss')
plt.plot(epochs, val_loss, label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.ylim(0,0.5)
plt.show()

Predict

In [18]:
%%time
test_generator.reset()
prediction = model.predict_generator(
    generator = test_generator,
    steps = get_steps(nb_test_samples, batch_size),
    verbose=1
)
193/193 [==============================] - 64s 329ms/step
CPU times: user 50.7 s, sys: 31.1 s, total: 1min 21s
Wall time: 1min 3s

Submission

In [19]:
submission = pd.read_csv(os.path.join(DATA_PATH2, 'sample_submission.csv'))
predicted_class_indices=np.argmax(prediction, axis=1)

# Generator class dictionary mapping
labels = (train_generator.class_indices)
labels = dict((v,k) for k,v in labels.items())
predictions = [labels[k] for k in predicted_class_indices]

submission["class"] = predictions
submission.to_csv("submission_all.csv", index=False)
submission.head()
Out[19]:
img_file class
0 test_00001.jpg 124
1 test_00002.jpg 98
2 test_00003.jpg 157
3 test_00004.jpg 94
4 test_00005.jpg 18

Result

  • The Public Score went up from 0.930 to 0.941, an improvement of about 0.011!

image.png

In [1]:
from IPython.core.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))
compare-optimizer-of-efficientnet

3rd ML Month - Compare optimizer of efficientNet

Reference

Package

In [1]:
import gc
import os
import warnings
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score
from keras import backend as K
# progress bar for loops
from tqdm import tqdm_notebook
# cross-validation libs
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
# model libs
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, Model
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from keras.layers import Dense, Dropout, Flatten, Activation, Conv2D, GlobalAveragePooling2D
from keras import layers
from keras.optimizers import Adam,RMSprop,SGD,Nadam
# ignore warnings
import warnings
warnings.filterwarnings(action='ignore')
# list the folders under ../input
import os
print(os.listdir("../input"))
Using TensorFlow backend.
['car-crop', '2019-3rd-ml-month-with-kakr']
In [2]:
#efficientnet download
!pip install git+https://github.com/qubvel/efficientnet
from efficientnet import EfficientNetB3
Collecting git+https://github.com/qubvel/efficientnet
  Cloning https://github.com/qubvel/efficientnet to /tmp/pip-req-build-4pdycl7v
  Running command git clone -q https://github.com/qubvel/efficientnet /tmp/pip-req-build-4pdycl7v
Building wheels for collected packages: efficientnet
  Building wheel for efficientnet (setup.py) ... - \ done
  Stored in directory: /tmp/pip-ephem-wheel-cache-npug02wz/wheels/64/60/2e/30ebaa76ed1626e86bfb0cc0579b737fdb7d9ff8cb9522663a
Successfully built efficientnet
Installing collected packages: efficientnet
Successfully installed efficientnet-0.0.4

File Directory Setting

In [3]:
#crop data directory
DATA_PATH = '../input/car-crop'
os.listdir(DATA_PATH)
Out[3]:
['train_crop', 'test_crop']
In [4]:
#original data directory
DATA_PATH2 = '../input/2019-3rd-ml-month-with-kakr'
os.listdir(DATA_PATH2)
Out[4]:
['test.csv',
 'test',
 'train',
 'train.csv',
 'class.csv',
 'sample_submission.csv']
In [5]:
# image folder paths
TRAIN_IMG_PATH = os.path.join(DATA_PATH, 'train')
TEST_IMG_PATH = os.path.join(DATA_PATH, 'test')

# CSV file paths
df_train = pd.read_csv(os.path.join(DATA_PATH2, 'train.csv'))
df_test = pd.read_csv(os.path.join(DATA_PATH2, 'test.csv'))
df_class = pd.read_csv(os.path.join(DATA_PATH2, 'class.csv'))

train/test data Split

In [6]:
df_train["class"] = df_train["class"].astype('str')

df_train = df_train[['img_file', 'class']]
df_test = df_test[['img_file']]

its = np.arange(df_train.shape[0])
train_idx, val_idx = train_test_split(its, train_size = 0.8, random_state=42)

X_train = df_train.iloc[train_idx, :]
X_val = df_train.iloc[val_idx, :]

print(X_train.shape)
print(X_val.shape)
print(df_test.shape)
(7992, 2)
(1998, 2)
(6150, 1)

Parameter

In [7]:
def recall_m(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

def precision_m(y_true, y_pred):
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))
In [8]:
# Parameter
img_size = (299, 299)
image_size = 299
nb_train_samples = len(X_train)
nb_validation_samples = len(X_val)
nb_test_samples = len(df_test)
epochs = 20
batch_size = 32

# Define Generator config
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=False,
    zoom_range=0.2,
    fill_mode='nearest')
val_datagen = ImageDataGenerator(rescale=1./255)
In [9]:
#generator
train_generator = train_datagen.flow_from_dataframe(
    dataframe=X_train, 
    directory='../input/car-crop/train_crop',
    x_col = 'img_file',
    y_col = 'class',
    target_size = img_size,
    color_mode='rgb',
    class_mode='categorical',
    batch_size=batch_size,
    seed=42
)

validation_generator = val_datagen.flow_from_dataframe(
    dataframe=X_val, 
    directory='../input/car-crop/train_crop',
    x_col = 'img_file',
    y_col = 'class',
    target_size = img_size,
    color_mode='rgb',
    class_mode='categorical',
    batch_size=batch_size,
    shuffle=False,
    seed=42
)
Found 7992 validated image filenames belonging to 196 classes.
Found 1998 validated image filenames belonging to 196 classes.

Model

In [10]:
def get_steps(num_samples, batch_size):
    if (num_samples % batch_size) > 0 :
        return (num_samples // batch_size) + 1
    else :
        return num_samples // batch_size
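
For reference, get_steps is just the ceiling of num_samples / batch_size:

get_steps(nb_train_samples, batch_size)       # 7992 / 32 -> 250 steps, matching the "250/250" in the logs below
get_steps(nb_validation_samples, batch_size)  # 1998 / 32 -> 63 validation steps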
In [11]:
%%time
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

#model path
MODEL_SAVE_FOLDER_PATH = './model/'
if not os.path.exists(MODEL_SAVE_FOLDER_PATH):
    os.mkdir(MODEL_SAVE_FOLDER_PATH)

model_path = MODEL_SAVE_FOLDER_PATH + '{epoch:02d}-{val_loss:.4f}.hdf5'

patient = 2
callbacks_list = [
    EarlyStopping(
        # monitor the validation loss
        monitor='val_loss', 
        # stop training if val_loss has not improved for `patient` epochs
        patience=patient, 
        # val_loss should decrease, so the mode for "improvement" is min
        mode='min', 
        # verbosity of the callback messages
        verbose=1
    ),
    ReduceLROnPlateau(
        monitor = 'val_loss', 
        # halve the learning rate when triggered
        factor = 0.5, 
        # wait half as many epochs as EarlyStopping before reducing the lr
        patience = patient / 2, 
        # lower bound on the learning rate
        min_lr=0.00001,
        verbose=1,
        mode='min'
    ) ]
gc.collect()
CPU times: user 116 ms, sys: 4 ms, total: 120 ms
Wall time: 116 ms
Out[11]:
381
In [12]:
#model
def get_model():
    EfficientNet_model = EfficientNetB3(weights='imagenet', include_top=False,
                                        input_shape=(299, 299, 3))


    model = Sequential()
    model.add(EfficientNet_model)
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(2048, activation='relu'))
    model.add(layers.Dropout(0.25))
    model.add(layers.Dense(196, activation='softmax'))
    #model.summary()
    
    return model

Optimizer 1: RMSprop

In [13]:
#compile
model_rmsprop = get_model()
model_rmsprop.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc',f1_m])
hist_rmsprop = model_rmsprop.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Downloading data from https://github.com/qubvel/efficientnet/releases/download/v0.0.1/efficientnet-b3_imagenet_1000_notop.h5
43974656/43966704 [==============================] - 1s 0us/step
Epoch 1/20
250/250 [==============================] - 290s 1s/step - loss: 3.5051 - acc: 0.2196 - f1_m: 0.1509 - val_loss: 1.9829 - val_acc: 0.4630 - val_f1_m: 0.4226
Epoch 2/20
250/250 [==============================] - 248s 993ms/step - loss: 1.3776 - acc: 0.6072 - f1_m: 0.5945 - val_loss: 1.4108 - val_acc: 0.6371 - val_f1_m: 0.6503
Epoch 3/20
250/250 [==============================] - 249s 995ms/step - loss: 0.9035 - acc: 0.7302 - f1_m: 0.7368 - val_loss: 0.9871 - val_acc: 0.7472 - val_f1_m: 0.7539
Epoch 4/20
250/250 [==============================] - 248s 993ms/step - loss: 0.6861 - acc: 0.7971 - f1_m: 0.8006 - val_loss: 0.9284 - val_acc: 0.7658 - val_f1_m: 0.7778
Epoch 5/20
250/250 [==============================] - 250s 1s/step - loss: 0.5730 - acc: 0.8353 - f1_m: 0.8366 - val_loss: 0.7969 - val_acc: 0.8163 - val_f1_m: 0.8232
Epoch 6/20
250/250 [==============================] - 253s 1s/step - loss: 0.4692 - acc: 0.8576 - f1_m: 0.8580 - val_loss: 0.8536 - val_acc: 0.8198 - val_f1_m: 0.8241

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 7/20
250/250 [==============================] - 248s 991ms/step - loss: 0.2281 - acc: 0.9255 - f1_m: 0.9264 - val_loss: 0.5314 - val_acc: 0.8789 - val_f1_m: 0.8827
Epoch 8/20
250/250 [==============================] - 246s 985ms/step - loss: 0.1798 - acc: 0.9421 - f1_m: 0.9430 - val_loss: 0.5439 - val_acc: 0.8839 - val_f1_m: 0.8874

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 9/20
250/250 [==============================] - 246s 982ms/step - loss: 0.1038 - acc: 0.9646 - f1_m: 0.9656 - val_loss: 0.4406 - val_acc: 0.9079 - val_f1_m: 0.9102
Epoch 10/20
250/250 [==============================] - 245s 978ms/step - loss: 0.0694 - acc: 0.9764 - f1_m: 0.9770 - val_loss: 0.4560 - val_acc: 0.9049 - val_f1_m: 0.9065

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 11/20
250/250 [==============================] - 247s 990ms/step - loss: 0.0472 - acc: 0.9844 - f1_m: 0.9846 - val_loss: 0.4593 - val_acc: 0.9079 - val_f1_m: 0.9106

Epoch 00011: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 00011: early stopping

Optimizer 2: Adam

In [14]:
#compile
model_adam = get_model()
model_adam.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['acc',f1_m])
hist_adam = model_adam.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 281s 1s/step - loss: 3.7481 - acc: 0.1823 - f1_m: 0.1136 - val_loss: 2.7741 - val_acc: 0.3639 - val_f1_m: 0.3564
Epoch 2/20
250/250 [==============================] - 245s 981ms/step - loss: 1.6301 - acc: 0.5540 - f1_m: 0.5289 - val_loss: 1.2303 - val_acc: 0.6612 - val_f1_m: 0.6570
Epoch 3/20
250/250 [==============================] - 245s 980ms/step - loss: 0.9075 - acc: 0.7375 - f1_m: 0.7402 - val_loss: 0.8972 - val_acc: 0.7487 - val_f1_m: 0.7541
Epoch 4/20
250/250 [==============================] - 244s 977ms/step - loss: 0.6960 - acc: 0.7932 - f1_m: 0.7889 - val_loss: 0.8714 - val_acc: 0.7733 - val_f1_m: 0.7783
Epoch 5/20
250/250 [==============================] - 245s 980ms/step - loss: 0.5387 - acc: 0.8333 - f1_m: 0.8360 - val_loss: 0.7838 - val_acc: 0.7943 - val_f1_m: 0.7996
Epoch 6/20
250/250 [==============================] - 244s 978ms/step - loss: 0.4708 - acc: 0.8572 - f1_m: 0.8600 - val_loss: 0.7524 - val_acc: 0.8248 - val_f1_m: 0.8348
Epoch 7/20
250/250 [==============================] - 245s 979ms/step - loss: 0.3988 - acc: 0.8737 - f1_m: 0.8768 - val_loss: 0.7650 - val_acc: 0.8058 - val_f1_m: 0.8189

Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 8/20
250/250 [==============================] - 244s 975ms/step - loss: 0.1856 - acc: 0.9407 - f1_m: 0.9418 - val_loss: 0.4081 - val_acc: 0.8959 - val_f1_m: 0.8962
Epoch 9/20
250/250 [==============================] - 244s 976ms/step - loss: 0.1304 - acc: 0.9559 - f1_m: 0.9562 - val_loss: 0.4419 - val_acc: 0.8944 - val_f1_m: 0.8960

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 10/20
250/250 [==============================] - 246s 982ms/step - loss: 0.0780 - acc: 0.9755 - f1_m: 0.9751 - val_loss: 0.4020 - val_acc: 0.9104 - val_f1_m: 0.9131
Epoch 11/20
250/250 [==============================] - 245s 980ms/step - loss: 0.0489 - acc: 0.9842 - f1_m: 0.9840 - val_loss: 0.3835 - val_acc: 0.9134 - val_f1_m: 0.9190
Epoch 12/20
250/250 [==============================] - 244s 978ms/step - loss: 0.0464 - acc: 0.9843 - f1_m: 0.9842 - val_loss: 0.3998 - val_acc: 0.9129 - val_f1_m: 0.9157

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 13/20
250/250 [==============================] - 246s 986ms/step - loss: 0.0377 - acc: 0.9872 - f1_m: 0.9872 - val_loss: 0.3781 - val_acc: 0.9199 - val_f1_m: 0.9232
Epoch 14/20
250/250 [==============================] - 251s 1s/step - loss: 0.0297 - acc: 0.9906 - f1_m: 0.9910 - val_loss: 0.3760 - val_acc: 0.9219 - val_f1_m: 0.9257
Epoch 15/20
250/250 [==============================] - 251s 1s/step - loss: 0.0297 - acc: 0.9903 - f1_m: 0.9904 - val_loss: 0.3899 - val_acc: 0.9219 - val_f1_m: 0.9240

Epoch 00015: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 16/20
250/250 [==============================] - 252s 1s/step - loss: 0.0249 - acc: 0.9915 - f1_m: 0.9917 - val_loss: 0.3785 - val_acc: 0.9259 - val_f1_m: 0.9273

Epoch 00016: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 00016: early stopping

Optimizer 3: Nadam

In [15]:
#compile
model_nadam = get_model()
model_nadam.compile(loss='categorical_crossentropy', optimizer=Nadam(), metrics=['acc',f1_m])
hist_nadam = model_nadam.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 288s 1s/step - loss: 4.1294 - acc: 0.1234 - f1_m: 0.0572 - val_loss: 4.2228 - val_acc: 0.1732 - val_f1_m: 0.1424
Epoch 2/20
250/250 [==============================] - 242s 967ms/step - loss: 2.2012 - acc: 0.4195 - f1_m: 0.3679 - val_loss: 2.6808 - val_acc: 0.4960 - val_f1_m: 0.4767
Epoch 3/20
250/250 [==============================] - 242s 969ms/step - loss: 1.4434 - acc: 0.5975 - f1_m: 0.5878 - val_loss: 1.2144 - val_acc: 0.6787 - val_f1_m: 0.6876
Epoch 4/20
250/250 [==============================] - 242s 970ms/step - loss: 1.0976 - acc: 0.6841 - f1_m: 0.6825 - val_loss: 1.1604 - val_acc: 0.6827 - val_f1_m: 0.6945
Epoch 5/20
250/250 [==============================] - 241s 963ms/step - loss: 0.8958 - acc: 0.7365 - f1_m: 0.7412 - val_loss: 0.9532 - val_acc: 0.7412 - val_f1_m: 0.7506
Epoch 6/20
250/250 [==============================] - 241s 963ms/step - loss: 0.7421 - acc: 0.7752 - f1_m: 0.7774 - val_loss: 0.8828 - val_acc: 0.7548 - val_f1_m: 0.7629
Epoch 7/20
250/250 [==============================] - 242s 966ms/step - loss: 0.6870 - acc: 0.7912 - f1_m: 0.7934 - val_loss: 0.8472 - val_acc: 0.7723 - val_f1_m: 0.7799
Epoch 8/20
250/250 [==============================] - 241s 964ms/step - loss: 0.6115 - acc: 0.8132 - f1_m: 0.8144 - val_loss: 0.8297 - val_acc: 0.7783 - val_f1_m: 0.7875
Epoch 9/20
250/250 [==============================] - 240s 961ms/step - loss: 0.5497 - acc: 0.8315 - f1_m: 0.8336 - val_loss: 0.7878 - val_acc: 0.8023 - val_f1_m: 0.8098
Epoch 10/20
250/250 [==============================] - 241s 962ms/step - loss: 0.5024 - acc: 0.8471 - f1_m: 0.8473 - val_loss: 0.8300 - val_acc: 0.7803 - val_f1_m: 0.7876

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0010000000474974513.
Epoch 11/20
250/250 [==============================] - 240s 960ms/step - loss: 0.2669 - acc: 0.9133 - f1_m: 0.9142 - val_loss: 0.4663 - val_acc: 0.8869 - val_f1_m: 0.8897
Epoch 12/20
250/250 [==============================] - 241s 964ms/step - loss: 0.1816 - acc: 0.9398 - f1_m: 0.9399 - val_loss: 0.4990 - val_acc: 0.8709 - val_f1_m: 0.8717

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 13/20
250/250 [==============================] - 241s 965ms/step - loss: 0.1196 - acc: 0.9607 - f1_m: 0.9615 - val_loss: 0.4290 - val_acc: 0.8919 - val_f1_m: 0.8933
Epoch 14/20
250/250 [==============================] - 242s 967ms/step - loss: 0.0917 - acc: 0.9695 - f1_m: 0.9706 - val_loss: 0.4254 - val_acc: 0.8999 - val_f1_m: 0.9043
Epoch 15/20
250/250 [==============================] - 240s 960ms/step - loss: 0.0724 - acc: 0.9757 - f1_m: 0.9762 - val_loss: 0.4568 - val_acc: 0.9009 - val_f1_m: 0.9025

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 16/20
250/250 [==============================] - 240s 960ms/step - loss: 0.0701 - acc: 0.9778 - f1_m: 0.9771 - val_loss: 0.4242 - val_acc: 0.9124 - val_f1_m: 0.9149
Epoch 17/20
250/250 [==============================] - 239s 957ms/step - loss: 0.0569 - acc: 0.9799 - f1_m: 0.9799 - val_loss: 0.4275 - val_acc: 0.9054 - val_f1_m: 0.9080

Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 18/20
250/250 [==============================] - 239s 957ms/step - loss: 0.0563 - acc: 0.9806 - f1_m: 0.9807 - val_loss: 0.4138 - val_acc: 0.9129 - val_f1_m: 0.9157
Epoch 19/20
250/250 [==============================] - 240s 960ms/step - loss: 0.0394 - acc: 0.9864 - f1_m: 0.9864 - val_loss: 0.4088 - val_acc: 0.9119 - val_f1_m: 0.9135
Epoch 20/20
250/250 [==============================] - 239s 956ms/step - loss: 0.0407 - acc: 0.9867 - f1_m: 0.9867 - val_loss: 0.4164 - val_acc: 0.9134 - val_f1_m: 0.9143

Epoch 00020: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.

Optimizer 4: SGD

In [16]:
#compile
model_sgd = get_model()
model_sgd.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['acc',f1_m])
hist_sgd = model_sgd.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 272s 1s/step - loss: 5.2815 - acc: 0.0070 - f1_m: 0.0000e+00 - val_loss: 5.2558 - val_acc: 0.0100 - val_f1_m: 0.0000e+00
Epoch 2/20
250/250 [==============================] - 237s 947ms/step - loss: 5.2364 - acc: 0.0134 - f1_m: 0.0000e+00 - val_loss: 5.2147 - val_acc: 0.0195 - val_f1_m: 0.0000e+00
Epoch 3/20
250/250 [==============================] - 236s 945ms/step - loss: 5.1866 - acc: 0.0198 - f1_m: 0.0000e+00 - val_loss: 5.1626 - val_acc: 0.0425 - val_f1_m: 0.0000e+00
Epoch 4/20
250/250 [==============================] - 237s 948ms/step - loss: 5.1253 - acc: 0.0367 - f1_m: 0.0000e+00 - val_loss: 5.0876 - val_acc: 0.0571 - val_f1_m: 0.0000e+00
Epoch 5/20
250/250 [==============================] - 236s 945ms/step - loss: 5.0311 - acc: 0.0557 - f1_m: 0.0000e+00 - val_loss: 4.9584 - val_acc: 0.0771 - val_f1_m: 0.0000e+00
Epoch 6/20
250/250 [==============================] - 236s 945ms/step - loss: 4.8801 - acc: 0.0801 - f1_m: 0.0000e+00 - val_loss: 4.7433 - val_acc: 0.0946 - val_f1_m: 0.0000e+00
Epoch 7/20
250/250 [==============================] - 236s 944ms/step - loss: 4.6674 - acc: 0.1102 - f1_m: 0.0000e+00 - val_loss: 4.5005 - val_acc: 0.1296 - val_f1_m: 0.0000e+00
Epoch 8/20
250/250 [==============================] - 237s 949ms/step - loss: 4.4327 - acc: 0.1355 - f1_m: 7.2728e-04 - val_loss: 4.2099 - val_acc: 0.1702 - val_f1_m: 0.0000e+00
Epoch 9/20
250/250 [==============================] - 238s 952ms/step - loss: 4.1743 - acc: 0.1830 - f1_m: 4.8485e-04 - val_loss: 3.9078 - val_acc: 0.2157 - val_f1_m: 0.0067
Epoch 10/20
250/250 [==============================] - 237s 948ms/step - loss: 3.8609 - acc: 0.2226 - f1_m: 0.0057 - val_loss: 3.5655 - val_acc: 0.2688 - val_f1_m: 0.0125
Epoch 11/20
250/250 [==============================] - 265s 1s/step - loss: 3.5310 - acc: 0.2820 - f1_m: 0.0152 - val_loss: 3.2231 - val_acc: 0.3268 - val_f1_m: 0.0258
Epoch 12/20
250/250 [==============================] - 269s 1s/step - loss: 3.2108 - acc: 0.3311 - f1_m: 0.0350 - val_loss: 2.8546 - val_acc: 0.4124 - val_f1_m: 0.0495
Epoch 13/20
250/250 [==============================] - 265s 1s/step - loss: 2.8602 - acc: 0.3960 - f1_m: 0.0636 - val_loss: 2.5056 - val_acc: 0.4700 - val_f1_m: 0.1007
Epoch 14/20
250/250 [==============================] - 246s 984ms/step - loss: 2.5567 - acc: 0.4545 - f1_m: 0.1146 - val_loss: 2.2381 - val_acc: 0.5215 - val_f1_m: 0.1474
Epoch 15/20
250/250 [==============================] - 303s 1s/step - loss: 2.2699 - acc: 0.4998 - f1_m: 0.1789 - val_loss: 1.9367 - val_acc: 0.5701 - val_f1_m: 0.2627
Epoch 16/20
250/250 [==============================] - 245s 981ms/step - loss: 2.0120 - acc: 0.5417 - f1_m: 0.2576 - val_loss: 1.7187 - val_acc: 0.6081 - val_f1_m: 0.3514
Epoch 17/20
250/250 [==============================] - 244s 976ms/step - loss: 1.7865 - acc: 0.5907 - f1_m: 0.3439 - val_loss: 1.5097 - val_acc: 0.6411 - val_f1_m: 0.4685
Epoch 18/20
250/250 [==============================] - 241s 965ms/step - loss: 1.6084 - acc: 0.6243 - f1_m: 0.4144 - val_loss: 1.3484 - val_acc: 0.6792 - val_f1_m: 0.5331
Epoch 19/20
250/250 [==============================] - 245s 978ms/step - loss: 1.4419 - acc: 0.6577 - f1_m: 0.5014 - val_loss: 1.2364 - val_acc: 0.6937 - val_f1_m: 0.5887
Epoch 20/20
250/250 [==============================] - 244s 976ms/step - loss: 1.3128 - acc: 0.6885 - f1_m: 0.5429 - val_loss: 1.0948 - val_acc: 0.7192 - val_f1_m: 0.6461

Optimizer 5: SGD + Nesterov

In [17]:
#compile
model_sgdnes = get_model()
model_sgdnes.compile(loss='categorical_crossentropy', optimizer=SGD(nesterov=True), metrics=['acc',f1_m])
hist_sgdnes = model_sgdnes.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 287s 1s/step - loss: 5.2796 - acc: 0.0066 - f1_m: 0.0000e+00 - val_loss: 5.2519 - val_acc: 0.0060 - val_f1_m: 0.0000e+00
Epoch 2/20
250/250 [==============================] - 257s 1s/step - loss: 5.2392 - acc: 0.0112 - f1_m: 0.0000e+00 - val_loss: 5.2110 - val_acc: 0.0195 - val_f1_m: 0.0000e+00
Epoch 3/20
250/250 [==============================] - 253s 1s/step - loss: 5.1897 - acc: 0.0218 - f1_m: 0.0000e+00 - val_loss: 5.1554 - val_acc: 0.0385 - val_f1_m: 0.0000e+00
Epoch 4/20
250/250 [==============================] - 250s 1s/step - loss: 5.1149 - acc: 0.0385 - f1_m: 0.0000e+00 - val_loss: 5.0658 - val_acc: 0.0591 - val_f1_m: 0.0000e+00
Epoch 5/20
250/250 [==============================] - 246s 983ms/step - loss: 5.0154 - acc: 0.0540 - f1_m: 0.0000e+00 - val_loss: 4.9082 - val_acc: 0.0711 - val_f1_m: 0.0000e+00
Epoch 6/20
250/250 [==============================] - 245s 978ms/step - loss: 4.8579 - acc: 0.0828 - f1_m: 0.0000e+00 - val_loss: 4.7131 - val_acc: 0.1206 - val_f1_m: 0.0000e+00
Epoch 7/20
250/250 [==============================] - 246s 984ms/step - loss: 4.6758 - acc: 0.1168 - f1_m: 0.0000e+00 - val_loss: 4.4955 - val_acc: 0.1491 - val_f1_m: 9.7067e-04
Epoch 8/20
250/250 [==============================] - 248s 991ms/step - loss: 4.4442 - acc: 0.1498 - f1_m: 7.2728e-04 - val_loss: 4.2417 - val_acc: 0.1807 - val_f1_m: 0.0019
Epoch 9/20
250/250 [==============================] - 244s 977ms/step - loss: 4.1850 - acc: 0.1889 - f1_m: 0.0032 - val_loss: 3.9157 - val_acc: 0.2162 - val_f1_m: 0.0087
Epoch 10/20
250/250 [==============================] - 244s 978ms/step - loss: 3.8912 - acc: 0.2210 - f1_m: 0.0082 - val_loss: 3.5765 - val_acc: 0.2613 - val_f1_m: 0.0222
Epoch 11/20
250/250 [==============================] - 245s 979ms/step - loss: 3.5922 - acc: 0.2692 - f1_m: 0.0211 - val_loss: 3.2315 - val_acc: 0.3473 - val_f1_m: 0.0398
Epoch 12/20
250/250 [==============================] - 244s 975ms/step - loss: 3.2267 - acc: 0.3395 - f1_m: 0.0428 - val_loss: 2.8834 - val_acc: 0.3809 - val_f1_m: 0.0714
Epoch 13/20
250/250 [==============================] - 241s 962ms/step - loss: 2.9179 - acc: 0.3876 - f1_m: 0.0712 - val_loss: 2.5499 - val_acc: 0.4665 - val_f1_m: 0.1178
Epoch 14/20
250/250 [==============================] - 240s 962ms/step - loss: 2.6022 - acc: 0.4465 - f1_m: 0.1123 - val_loss: 2.2473 - val_acc: 0.4995 - val_f1_m: 0.1788
Epoch 15/20
250/250 [==============================] - 242s 967ms/step - loss: 2.2877 - acc: 0.5069 - f1_m: 0.1890 - val_loss: 1.9782 - val_acc: 0.5651 - val_f1_m: 0.2664
Epoch 16/20
250/250 [==============================] - 239s 956ms/step - loss: 2.0582 - acc: 0.5426 - f1_m: 0.2587 - val_loss: 1.7474 - val_acc: 0.6166 - val_f1_m: 0.3598
Epoch 17/20
250/250 [==============================] - 238s 953ms/step - loss: 1.8387 - acc: 0.5868 - f1_m: 0.3370 - val_loss: 1.5692 - val_acc: 0.6261 - val_f1_m: 0.4423
Epoch 18/20
250/250 [==============================] - 238s 953ms/step - loss: 1.6553 - acc: 0.6237 - f1_m: 0.4035 - val_loss: 1.3760 - val_acc: 0.6867 - val_f1_m: 0.5301
Epoch 19/20
250/250 [==============================] - 243s 973ms/step - loss: 1.4744 - acc: 0.6533 - f1_m: 0.4755 - val_loss: 1.2595 - val_acc: 0.6947 - val_f1_m: 0.5777
Epoch 20/20
250/250 [==============================] - 238s 950ms/step - loss: 1.3349 - acc: 0.6892 - f1_m: 0.5419 - val_loss: 1.1540 - val_acc: 0.7132 - val_f1_m: 0.6220

Optimizer 6: SGD with momentum=0.9

In [18]:
#compile
model_sgdmo = get_model()
model_sgdmo.compile(loss='categorical_crossentropy', optimizer=SGD(momentum=0.9), metrics=['acc',f1_m])
hist_sgdmo = model_sgdmo.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 283s 1s/step - loss: 5.0416 - acc: 0.0401 - f1_m: 2.4243e-04 - val_loss: 4.3300 - val_acc: 0.1086 - val_f1_m: 0.0048
Epoch 2/20
250/250 [==============================] - 239s 956ms/step - loss: 3.3459 - acc: 0.2551 - f1_m: 0.0784 - val_loss: 2.0053 - val_acc: 0.4870 - val_f1_m: 0.3179
Epoch 3/20
250/250 [==============================] - 239s 956ms/step - loss: 1.6440 - acc: 0.5638 - f1_m: 0.4653 - val_loss: 1.0241 - val_acc: 0.7127 - val_f1_m: 0.6995
Epoch 4/20
250/250 [==============================] - 240s 959ms/step - loss: 0.9140 - acc: 0.7432 - f1_m: 0.7190 - val_loss: 0.7338 - val_acc: 0.7863 - val_f1_m: 0.7882
Epoch 5/20
250/250 [==============================] - 240s 960ms/step - loss: 0.5982 - acc: 0.8249 - f1_m: 0.8143 - val_loss: 0.6174 - val_acc: 0.8153 - val_f1_m: 0.8172
Epoch 6/20
250/250 [==============================] - 241s 963ms/step - loss: 0.4158 - acc: 0.8783 - f1_m: 0.8722 - val_loss: 0.5371 - val_acc: 0.8393 - val_f1_m: 0.8425
Epoch 7/20
250/250 [==============================] - 243s 972ms/step - loss: 0.3157 - acc: 0.9051 - f1_m: 0.9025 - val_loss: 0.4755 - val_acc: 0.8624 - val_f1_m: 0.8650
Epoch 8/20
250/250 [==============================] - 247s 990ms/step - loss: 0.2449 - acc: 0.9260 - f1_m: 0.9221 - val_loss: 0.4640 - val_acc: 0.8654 - val_f1_m: 0.8708
Epoch 9/20
250/250 [==============================] - 246s 983ms/step - loss: 0.1910 - acc: 0.9423 - f1_m: 0.9406 - val_loss: 0.4344 - val_acc: 0.8689 - val_f1_m: 0.8755
Epoch 10/20
250/250 [==============================] - 246s 985ms/step - loss: 0.1569 - acc: 0.9554 - f1_m: 0.9538 - val_loss: 0.4055 - val_acc: 0.8849 - val_f1_m: 0.8893
Epoch 11/20
250/250 [==============================] - 248s 993ms/step - loss: 0.1352 - acc: 0.9590 - f1_m: 0.9587 - val_loss: 0.4028 - val_acc: 0.8854 - val_f1_m: 0.8894
Epoch 12/20
250/250 [==============================] - 248s 994ms/step - loss: 0.1191 - acc: 0.9658 - f1_m: 0.9635 - val_loss: 0.4213 - val_acc: 0.8949 - val_f1_m: 0.8963

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.
Epoch 13/20
250/250 [==============================] - 246s 984ms/step - loss: 0.0820 - acc: 0.9773 - f1_m: 0.9770 - val_loss: 0.3905 - val_acc: 0.8969 - val_f1_m: 0.8996
Epoch 14/20
250/250 [==============================] - 249s 996ms/step - loss: 0.0658 - acc: 0.9811 - f1_m: 0.9813 - val_loss: 0.3680 - val_acc: 0.9054 - val_f1_m: 0.9064
Epoch 15/20
250/250 [==============================] - 249s 995ms/step - loss: 0.0617 - acc: 0.9840 - f1_m: 0.9834 - val_loss: 0.3761 - val_acc: 0.9109 - val_f1_m: 0.9116

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 16/20
250/250 [==============================] - 249s 996ms/step - loss: 0.0509 - acc: 0.9876 - f1_m: 0.9875 - val_loss: 0.3563 - val_acc: 0.9084 - val_f1_m: 0.9099
Epoch 17/20
250/250 [==============================] - 249s 996ms/step - loss: 0.0487 - acc: 0.9862 - f1_m: 0.9865 - val_loss: 0.3637 - val_acc: 0.9119 - val_f1_m: 0.9157

Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 18/20
250/250 [==============================] - 249s 997ms/step - loss: 0.0405 - acc: 0.9903 - f1_m: 0.9903 - val_loss: 0.3531 - val_acc: 0.9084 - val_f1_m: 0.9126
Epoch 19/20
250/250 [==============================] - 246s 982ms/step - loss: 0.0383 - acc: 0.9907 - f1_m: 0.9902 - val_loss: 0.3529 - val_acc: 0.9119 - val_f1_m: 0.9143
Epoch 20/20
250/250 [==============================] - 248s 993ms/step - loss: 0.0402 - acc: 0.9912 - f1_m: 0.9902 - val_loss: 0.3530 - val_acc: 0.9129 - val_f1_m: 0.9152

Epoch 00020: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.

Optimizer 7: SGD + Nesterov with momentum=0.9

In [19]:
#compile
model_sgdmones = get_model()
model_sgdmones.compile(loss='categorical_crossentropy', optimizer=SGD(momentum=0.9, nesterov=True), metrics=['acc',f1_m])
hist_sgdmones  = model_sgdmones.fit_generator(
    train_generator,
    steps_per_epoch = get_steps(nb_train_samples, batch_size),
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = get_steps(nb_validation_samples, batch_size),
    callbacks = callbacks_list
)
Epoch 1/20
250/250 [==============================] - 295s 1s/step - loss: 5.0288 - acc: 0.0442 - f1_m: 0.0000e+00 - val_loss: 4.3071 - val_acc: 0.1191 - val_f1_m: 0.0029
Epoch 2/20
250/250 [==============================] - 243s 974ms/step - loss: 3.2497 - acc: 0.2771 - f1_m: 0.0977 - val_loss: 1.9375 - val_acc: 0.5000 - val_f1_m: 0.3884
Epoch 3/20
250/250 [==============================] - 243s 971ms/step - loss: 1.6001 - acc: 0.5817 - f1_m: 0.4869 - val_loss: 1.0060 - val_acc: 0.7187 - val_f1_m: 0.6957
Epoch 4/20
250/250 [==============================] - 242s 969ms/step - loss: 0.8932 - acc: 0.7499 - f1_m: 0.7255 - val_loss: 0.7660 - val_acc: 0.7633 - val_f1_m: 0.7622
Epoch 5/20
250/250 [==============================] - 242s 968ms/step - loss: 0.5922 - acc: 0.8276 - f1_m: 0.8198 - val_loss: 0.5647 - val_acc: 0.8388 - val_f1_m: 0.8374
Epoch 6/20
250/250 [==============================] - 244s 976ms/step - loss: 0.4195 - acc: 0.8742 - f1_m: 0.8700 - val_loss: 0.4960 - val_acc: 0.8504 - val_f1_m: 0.8542
Epoch 7/20
250/250 [==============================] - 246s 984ms/step - loss: 0.3148 - acc: 0.9049 - f1_m: 0.9016 - val_loss: 0.4507 - val_acc: 0.8654 - val_f1_m: 0.8686
Epoch 8/20
250/250 [==============================] - 246s 985ms/step - loss: 0.2435 - acc: 0.9255 - f1_m: 0.9200 - val_loss: 0.4225 - val_acc: 0.8759 - val_f1_m: 0.8775
Epoch 9/20
250/250 [==============================] - 241s 965ms/step - loss: 0.1873 - acc: 0.9448 - f1_m: 0.9420 - val_loss: 0.4541 - val_acc: 0.8724 - val_f1_m: 0.8738

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.004999999888241291.
Epoch 10/20
250/250 [==============================] - 242s 966ms/step - loss: 0.1308 - acc: 0.9635 - f1_m: 0.9620 - val_loss: 0.3869 - val_acc: 0.8914 - val_f1_m: 0.8962
Epoch 11/20
250/250 [==============================] - 242s 966ms/step - loss: 0.1028 - acc: 0.9702 - f1_m: 0.9688 - val_loss: 0.3656 - val_acc: 0.8984 - val_f1_m: 0.8987
Epoch 12/20
250/250 [==============================] - 246s 983ms/step - loss: 0.0974 - acc: 0.9715 - f1_m: 0.9713 - val_loss: 0.3748 - val_acc: 0.8959 - val_f1_m: 0.8994

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0024999999441206455.
Epoch 13/20
250/250 [==============================] - 244s 975ms/step - loss: 0.0729 - acc: 0.9827 - f1_m: 0.9819 - val_loss: 0.3537 - val_acc: 0.9109 - val_f1_m: 0.9140
Epoch 14/20
250/250 [==============================] - 243s 973ms/step - loss: 0.0718 - acc: 0.9801 - f1_m: 0.9798 - val_loss: 0.3420 - val_acc: 0.9064 - val_f1_m: 0.9116
Epoch 15/20
250/250 [==============================] - 243s 972ms/step - loss: 0.0595 - acc: 0.9857 - f1_m: 0.9845 - val_loss: 0.3502 - val_acc: 0.9094 - val_f1_m: 0.9162

Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0012499999720603228.
Epoch 16/20
250/250 [==============================] - 242s 968ms/step - loss: 0.0550 - acc: 0.9859 - f1_m: 0.9858 - val_loss: 0.3450 - val_acc: 0.9079 - val_f1_m: 0.9125

Epoch 00016: ReduceLROnPlateau reducing learning rate to 0.0006249999860301614.
Epoch 00016: early stopping

acc / loss Plot

train acc

In [20]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['acc'])  
plt.plot(hist_adam.history['acc'])  
plt.plot(hist_nadam.history['acc']) 
plt.plot(hist_sgd.history['acc']) 
plt.plot(hist_sgdnes.history['acc']) 
plt.plot(hist_sgdmo.history['acc'])
plt.plot(hist_sgdmones.history['acc'])
plt.title('train. accuracy')  
plt.ylabel('accuracy')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='lower right')  

plt.show()

train loss

In [21]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['loss'])  
plt.plot(hist_adam.history['loss'])  
plt.plot(hist_nadam.history['loss']) 
plt.plot(hist_sgd.history['loss']) 
plt.plot(hist_sgdnes.history['loss']) 
plt.plot(hist_sgdmo.history['loss'])
plt.plot(hist_sgdmones.history['loss'])
plt.title('train. loss')  
plt.ylabel('loss')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='upper right')  

plt.show()

valid acc

In [22]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['val_acc'])
plt.plot(hist_adam.history['val_acc'])
plt.plot(hist_nadam.history['val_acc'])
plt.plot(hist_sgd.history['val_acc'])
plt.plot(hist_sgdnes.history['val_acc'])
plt.plot(hist_sgdmo.history['val_acc'])
plt.plot(hist_sgdmones.history['val_acc'])

plt.title('valid. accuracy')  
plt.ylabel('accuracy')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='lower right')  

plt.show()

valid loss

In [23]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['val_loss'])  
plt.plot(hist_adam.history['val_loss'])  
plt.plot(hist_nadam.history['val_loss']) 
plt.plot(hist_sgd.history['val_loss']) 
plt.plot(hist_sgdnes.history['val_loss']) 
plt.plot(hist_sgdmo.history['val_loss'])
plt.plot(hist_sgdmones.history['val_loss'])
plt.title('valid. loss')  
plt.ylabel('loss')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='upper right')  

plt.show()

train f1 score

In [24]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['f1_m'])  
plt.plot(hist_adam.history['f1_m'])  
plt.plot(hist_nadam.history['f1_m']) 
plt.plot(hist_sgd.history['f1_m']) 
plt.plot(hist_sgdnes.history['f1_m']) 
plt.plot(hist_sgdmo.history['f1_m'])
plt.plot(hist_sgdmones.history['f1_m'])
plt.title('train. f1_score')  
plt.ylabel('f1_score')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='upper right')  

plt.show()

valid f1 score

In [25]:
plt.figure(figsize=(10, 6))  

plt.plot(hist_rmsprop.history['val_f1_m'])  
plt.plot(hist_adam.history['val_f1_m'])  
plt.plot(hist_nadam.history['val_f1_m']) 
plt.plot(hist_sgd.history['val_f1_m']) 
plt.plot(hist_sgdnes.history['val_f1_m']) 
plt.plot(hist_sgdmo.history['val_f1_m'])
plt.plot(hist_sgdmones.history['val_f1_m'])
plt.title('valid. f1_score')  
plt.ylabel('f1_score')  
plt.xlabel('epoch')  
plt.legend(['rmsprop', 'adam', 'nadam', 'sgd', 'sgd+nesterov', 'sgd+momentum', 'sgd+nesterov+momentum'], loc='upper right')  

plt.show()

Conclusion

  • Plain 'sgd' and 'sgd+nesterov' converge far too slowly to be suitable optimizers for this competition (although, as the logs above show, SGD becomes competitive once momentum=0.9 is added).
  • 'rmsprop' and 'adam' reach high accuracy in a comparatively short time.
lenet-5-using-keras-99-2
In [1]:
import gc
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt

# cross-validation libs
from sklearn.model_selection import StratifiedKFold,train_test_split
from tqdm import tqdm_notebook

# model libs
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, Model
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from keras.layers import Dense, Dropout, Flatten, Activation, Conv2D, AveragePooling2D
from keras import layers
from keras.optimizers import Adam,RMSprop

# pretrained application models
from keras.applications import VGG16, VGG19, resnet50

# ignore warning messages
import warnings
warnings.filterwarnings(action='ignore')
Using TensorFlow backend.

LeNet-5

  • LeNet-5 was the first CNN model to be successfully deployed in industry. image

data load

In [2]:
import os
os.listdir('../input/digit-recognizer')
datapath = '../input/digit-recognizer'
In [3]:
train =pd.read_csv(datapath+'/train.csv')
print(train.shape)
train.head()
(42000, 785)
Out[3]:
label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 4 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 785 columns

In [4]:
test =pd.read_csv(datapath+'/test.csv')
print(test.shape)
test.head()
(28000, 784)
Out[4]:
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 784 columns

In [5]:
train_labels = train['label']
train = (train.iloc[:,1:].values).astype('float32')
test = test.values.astype('float32')
In [6]:
#Visualizing the data
sample = train[10, :].reshape(28,28)
plt.imshow(sample, cmap='gray')
plt.show()
print('label : ', train_labels[10])
label :  8

Preprocessing

  • The LeNet-1 model used 28x28 input images.
  • LeNet-5 instead places the 28x28 MNIST image at the center of a 32x32 input. With the larger input, small details near the image border sit well inside the receptive fields of the higher-level features, which improved performance; the padding code below reproduces this.
In [7]:
train = train.reshape(42000, 28, 28, 1)
test= test.reshape(28000, 28, 28, 1)
# change shape using pad
train = np.pad(train, ((0,0),(2,2),(2,2),(0,0)), 'constant')
test = np.pad(test, ((0,0),(2,2),(2,2),(0,0)), 'constant')

print('train shape : ', train.shape)
print('test shape : ', test.shape)
train shape :  (42000, 32, 32, 1)
test shape :  (28000, 32, 32, 1)
In [8]:
# int64 -> float32 ,  scaling
train = train.astype('float32')/255
test = test.astype('float32')/255
X_train, X_val, y_train, y_val = train_test_split(train, train_labels, test_size=0.20, random_state=42)

#One-hot encoding the labels
print('X_train shape : ', X_train.shape)
print('X_val shape : ', X_val.shape)
print('y_train : ', y_train.shape)
print('y_val : ', y_val.shape)
y_train = to_categorical(y_train)
y_val = to_categorical(y_val)
print('y_train_to_categorical : ', y_train.shape)
print('y_val_to_categorical : ', y_val.shape)
X_train shape :  (33600, 32, 32, 1)
X_val shape :  (8400, 32, 32, 1)
y_train :  (33600,)
y_val :  (8400,)
y_train_to_categorical :  (33600, 10)
y_val_to_categorical :  (8400, 10)

Model

  • [32x32x1] INPUT
  • [28x28x6] CONV1: 6 5x5 filters at stride 1, pad 0
  • [14x14x6] Average POOL1: 2x2 filters at stride 2
  • [10x10x16] CONV2: 16 5x5 filters at stride 1, pad 0
  • [5x5x16] Average POOL2: 2x2 filters at stride 2
  • [120] FC3: 120 neurons
  • [84] FC4: 84 neurons
  • [10] FC5: 10 neurons (class scores)
  • LeNet-5 uses non-overlapping pooling (pool size equals stride). image
In [9]:
#lenet-5 model
model = Sequential()
#Conv layer 1
model.add(layers.Conv2D(filters=6, kernel_size=(5, 5),strides=1, activation='relu', input_shape=(32,32,1)))
#Pooling layer 1
model.add(AveragePooling2D(pool_size = 2, strides = 2))
#Conv Layer2
model.add(layers.Conv2D(filters=16, kernel_size=(5, 5),strides=1, activation='relu'))
#Pooling layer 2
model.add(AveragePooling2D(pool_size = 2, strides = 2))
model.add(layers.Flatten())
#FC Layer 3
model.add(layers.Dense(120, activation='relu'))
#FC Layer 4
model.add(layers.Dense(84, activation='relu'))
#FC Layer 5
model.add(layers.Dense(10, activation = 'softmax'))


# compile
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 28, 28, 6)         156       
_________________________________________________________________
average_pooling2d_1 (Average (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 10, 10, 16)        2416      
_________________________________________________________________
average_pooling2d_2 (Average (None, 5, 5, 16)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 400)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 120)               48120     
_________________________________________________________________
dense_2 (Dense)              (None, 84)                10164     
_________________________________________________________________
dense_3 (Dense)              (None, 10)                850       
=================================================================
Total params: 61,706
Trainable params: 61,706
Non-trainable params: 0
_________________________________________________________________

Train and predict

In [10]:
datagen = ImageDataGenerator(
        rotation_range=10,  
        zoom_range = 0.10,  
        width_shift_range=0.1, 
        height_shift_range=0.1)
In [11]:
patient = 4
callbacks_list = [
    ReduceLROnPlateau(
        monitor = 'val_loss', 
        # halve the learning rate when triggered
        factor = 0.5, 
        # reduce the lr if val_loss has not improved for `patience` epochs
        patience = patient / 2, 
        # lower bound on the learning rate
        min_lr=0.00001,
        verbose=1,
        mode='min'
    )]
In [12]:
%%time
epochs =30
batch_size = 64
history = model.fit_generator(datagen.flow(X_train,y_train, batch_size=batch_size),
                              epochs = epochs, validation_data = (X_val,y_val),
                              steps_per_epoch=X_train.shape[0] // batch_size
                              ,callbacks=callbacks_list,verbose = 1)
Epoch 1/30
525/525 [==============================] - 18s 35ms/step - loss: 0.6216 - accuracy: 0.8020 - val_loss: 0.1816 - val_accuracy: 0.9450
Epoch 2/30
525/525 [==============================] - 15s 29ms/step - loss: 0.2233 - accuracy: 0.9317 - val_loss: 0.1542 - val_accuracy: 0.9527
Epoch 3/30
525/525 [==============================] - 15s 29ms/step - loss: 0.1758 - accuracy: 0.9464 - val_loss: 0.0894 - val_accuracy: 0.9735
Epoch 4/30
525/525 [==============================] - 15s 29ms/step - loss: 0.1458 - accuracy: 0.9562 - val_loss: 0.0741 - val_accuracy: 0.9777
Epoch 5/30
525/525 [==============================] - 15s 29ms/step - loss: 0.1208 - accuracy: 0.9624 - val_loss: 0.0798 - val_accuracy: 0.9755
Epoch 6/30
525/525 [==============================] - 16s 31ms/step - loss: 0.1092 - accuracy: 0.9667 - val_loss: 0.0671 - val_accuracy: 0.9792
Epoch 7/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0936 - accuracy: 0.9712 - val_loss: 0.0548 - val_accuracy: 0.9832
Epoch 8/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0918 - accuracy: 0.9715 - val_loss: 0.0622 - val_accuracy: 0.9814
Epoch 9/30
525/525 [==============================] - 15s 28ms/step - loss: 0.0811 - accuracy: 0.9744 - val_loss: 0.0531 - val_accuracy: 0.9832
Epoch 10/30
525/525 [==============================] - 15s 28ms/step - loss: 0.0761 - accuracy: 0.9767 - val_loss: 0.0439 - val_accuracy: 0.9862
Epoch 11/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0712 - accuracy: 0.9774 - val_loss: 0.0421 - val_accuracy: 0.9880
Epoch 12/30
525/525 [==============================] - 16s 30ms/step - loss: 0.0662 - accuracy: 0.9788 - val_loss: 0.0393 - val_accuracy: 0.9882
Epoch 13/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0635 - accuracy: 0.9799 - val_loss: 0.0487 - val_accuracy: 0.9852
Epoch 14/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0624 - accuracy: 0.9806 - val_loss: 0.0409 - val_accuracy: 0.9879

Epoch 00014: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 15/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0490 - accuracy: 0.9850 - val_loss: 0.0346 - val_accuracy: 0.9899
Epoch 16/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0445 - accuracy: 0.9861 - val_loss: 0.0357 - val_accuracy: 0.9887
Epoch 17/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0413 - accuracy: 0.9871 - val_loss: 0.0367 - val_accuracy: 0.9890

Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 18/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0405 - accuracy: 0.9870 - val_loss: 0.0279 - val_accuracy: 0.9913
Epoch 19/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0370 - accuracy: 0.9885 - val_loss: 0.0314 - val_accuracy: 0.9907
Epoch 20/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0355 - accuracy: 0.9885 - val_loss: 0.0278 - val_accuracy: 0.9914

Epoch 00020: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 21/30
525/525 [==============================] - 15s 28ms/step - loss: 0.0331 - accuracy: 0.9895 - val_loss: 0.0283 - val_accuracy: 0.9923
Epoch 22/30
525/525 [==============================] - 15s 28ms/step - loss: 0.0321 - accuracy: 0.9897 - val_loss: 0.0282 - val_accuracy: 0.9915

Epoch 00022: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 23/30
525/525 [==============================] - 16s 30ms/step - loss: 0.0310 - accuracy: 0.9901 - val_loss: 0.0264 - val_accuracy: 0.9924
Epoch 24/30
525/525 [==============================] - 15s 29ms/step - loss: 0.0274 - accuracy: 0.9917 - val_loss: 0.0263 - val_accuracy: 0.9919
CPU times: user 9min 34s, sys: 49.4 s, total: 10min 23s
Wall time: 7min 42s
In [13]:
#predict
submission =pd.read_csv(datapath+'/sample_submission.csv')
pred = model.predict(test)
pred = np.argmax(pred,axis = 1)
submission['Label'] = pred
submission.to_csv('submission.csv',index=False)
submission.head()
Out[13]:
ImageId Label
0 1 2
1 2 0
2 3 9
3 4 0
4 5 3

Acc/Loss plot

In [14]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

epochs = range(len(acc))

plt.plot(epochs, acc, label='Training acc')
plt.plot(epochs, val_acc, label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.ylim(0.9,1)
plt.show()
In [15]:
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(epochs, loss, label='Training loss')
plt.plot(epochs, val_loss, label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.ylim(0,0.5)
plt.show()

conclusion

image

  • A simple LeNet-5 model reaches over 99% accuracy.
  • Accuracy could likely be pushed a little higher by ensembling models trained on cross-validation folds.

9-1 Today we will look at CNN architectures.
9-2 We will cover AlexNet, VGG, GoogLeNet and ResNet, and then briefly look at some other CNN architectures.
9-3 LeNet-5 was the first conv model to be successfully applied in industry. It was very successful at digit recognition and was deployed in the postal service.
9-4 For subsampling it used average pooling, applied as non-overlapping pooling: the filter size equals the stride, so the pooling windows never overlap. Please refer to the LeNet-5 implementation for details.


9-5 Now let's look at AlexNet. AlexNet appeared in 2012 and revived deep learning, because it was the first CNN to win the ImageNet competition. Its structure is similar to LeNet-5, just with more layers.
9-6 If we apply 96 filters of size 11x11 with stride 4 to a 227x227x3 image, what is the output volume size? The formula for the output height and width is (input size - filter size + 2*padding size) / stride + 1.
9-7 The answer is 55x55x96 (96 being the number of filters). Then how many parameters does this layer have?
9-8 The number of parameters is filter size x input depth x number of filters, so (11*11*3)*96 ≈ 35,000 (about 34,944 including biases).
9-9 What about the second layer? The second layer is a pooling layer; AlexNet uses max pooling. What is its output volume size?
9-10 Applying the same formula gives 27x27x96. And how many parameters does it have?
9-11 Zero, of course: a pooling layer is pure downsampling and has no parameters. (The small helper below checks both of these calculations.)
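
A tiny helper makes both answers easy to verify (a sketch; the 3x3 stride-2 window used for the pooling layer is AlexNet's overlapping max pooling, which the text above does not spell out):

def conv_output_size(input_size, filter_size, stride, pad=0):
    # (W - F + 2P) / S + 1
    return (input_size - filter_size + 2 * pad) // stride + 1

def conv_params(filter_size, input_depth, num_filters, bias=True):
    # filter size x input depth x number of filters (+ one bias per filter)
    return filter_size * filter_size * input_depth * num_filters + (num_filters if bias else 0)

print(conv_output_size(227, 11, 4), conv_params(11, 3, 96))  # 55, 34944 -> 55x55x96 output, ~35K params
print(conv_output_size(55, 3, 2))                            # 27        -> 27x27x96 output, 0 params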
9-12 Conv and pooling layers are repeated like this to build up AlexNet.
9-13 This is the full AlexNet architecture. The figure shows an input size of 224, but it is actually 227. In total it consists of 5 conv layers, 3 max-pool layers and 3 FC layers (FC6, FC7 and the FC8 classifier).
9-14 Looking at AlexNet in more detail: it was the first to use ReLU, and for normalization it used local response normalization, which is no longer used today. It relied on heavy data augmentation and dropout of 0.5. The learning rate started at 0.01 and was divided by 10 whenever the validation accuracy plateaued. The other settings are as listed in the table.
9-15 When AlexNet was developed, computing power was limited and the GPUs had only 3GB of memory, so ImageNet could not be trained on a single GPU. The model was therefore split across two GPUs (model parallelism). If you look closely at the figure, the depth in conv1 is 48 per GPU, not 96.
9-16 In CONV1, CONV2, CONV4 and CONV5, feature maps are connected only within the same GPU,
9-17 while in CONV3, FC6, FC7 and FC8 both GPUs are connected to all feature maps.


9-19 AlexNet, which we just saw, won ImageNet in 2012. It was followed by ZFNet, which won ImageNet in 2013.
9-20 ZFNet is similar to AlexNet, but it changed the 11x11 stride-4 filters to 7x7 with stride 2, and changed the number of filters in CONV3, 4 and 5 from 384, 384, 256 to 512, 1024, 512.
9-21 Next, the 2014 ImageNet winners. First, let's look at the architecture of VGGNet, which took 2nd place in classification and 1st place in the localization challenge. The localization task is not just classifying "is there a cat in this image?" but also drawing a box around exactly where the cat is.
9-22 The key characteristics of VGGNet are that it is deeper and uses smaller filters. It uses 3x3 filters, the smallest filter that still covers the neighboring pixels, keeps this small filter size throughout, and pools periodically.
9-23 Why use small filters? With smaller filters there are fewer parameters, so the depth can be increased.
9-24 If we stack three 3x3 stride-1 CONV layers, what is the effective receptive field?
9-25 The answer is 7x7. Let's see why.
Note: after the first layer, one pixel carries information from a 3x3 region. In the second layer, a pixel that already sees 3x3 passes through another 3x3 CONV and therefore sees 5x5. Stacking 3x3 CONVs three times in this way gives a 7x7 effective receptive field.
9-26 Given the same effective receptive field, why do it this way? Because we can stack deeper layers, and stacking more non-linearities makes the function more expressive. The number of parameters also shrinks: we said the parameter count is filter size x input depth x number of filters, so three stacked 3x3xCxC layers give 3*(3*3*C*C) = 27*C*C, while a single 7x7 layer gives 7*7*C*C = 49*C*C, nearly twice as many. Why must the input depth and the number of filters both be C? Because the input and output must keep the same shape for the receptive-field comparison to hold; this will be covered again in another post. (The small check below makes the comparison concrete.)
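
A quick numeric check of that comparison (C = 64 is an arbitrary illustrative depth; any C gives the same ratio):

C = 64                                    # arbitrary example depth
params_three_3x3 = 3 * (3 * 3 * C * C)    # three stacked 3x3 conv layers: 27*C*C
params_one_7x7 = 7 * 7 * C * C            # a single 7x7 conv layer: 49*C*C
print(params_three_3x3, params_one_7x7)   # 110592 vs 200704, roughly a 2x difference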
9-27 This is the full VGGNet architecture.
9-28 The total memory is what a single forward pass needs. Each activation takes 4 bytes, so it comes to roughly 100MB per image. If the GPU has 5GB in total, at 100MB per image we can only process about 50 images at once. The total number of parameters is about 138 million, versus about 60 million for AlexNet (see the quick check below).
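
As a cross-check, assuming keras.applications is available in this environment, the Keras VGG16 reports essentially the same count:

from keras.applications.vgg16 import VGG16
print(VGG16(weights=None).count_params())  # 138,357,544 parameters, i.e. ~138M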
9-29 The early layers use most of the memory and the FC layers hold most of the parameters, which is why recent architectures tend to drop the FC layers.
9-30 This is the naming convention for the CONV layers; keeping it in mind will help when reading papers.
9-31 FC7, the last FC layer of VGG, is a very good feature representation: it extracts features well and generalizes well to other tasks. The training procedure is similar to AlexNet, LRN is not used, and the best final models were ensembled.

GoogLeNet Part1

9-32 Now let's look at GoogLeNet, the other 2014 ImageNet winner.
9-33 GoogLeNet is a deep model designed for computational efficiency. It has 22 layers and an efficient module called the Inception module. It has no FC layers (too many parameters) and only 5 million parameters, 12x fewer than AlexNet.
9-34 The Inception module started from the idea of building a network within the network, i.e. a good local network topology.
9-35 This local network is called the Inception module. Inside an Inception module there are several different filters in parallel, all receiving the same input: 1x1 / 3x3 / 5x5 CONVs plus a pooling operation, here 3x3 pooling. Each of these layers produces its own output, and all the outputs are concatenated along the depth dimension. The result is a single tensor that is passed on to the next layer. (A minimal sketch of such a module follows.)
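
A minimal Keras sketch of this naive module; the 1x1 branch uses the 128 filters mentioned below, and the 192/96 filter counts for the 3x3/5x5 branches are taken from the lecture slide (together with the 256 pooled channels they produce the 28x28x672 output discussed below):

from keras.layers import Input, Conv2D, MaxPooling2D, concatenate
from keras.models import Model

x_in = Input(shape=(28, 28, 256))                                   # example input from the slides
b1 = Conv2D(128, (1, 1), padding='same', activation='relu')(x_in)   # 28x28x128
b2 = Conv2D(192, (3, 3), padding='same', activation='relu')(x_in)   # 28x28x192
b3 = Conv2D(96, (5, 5), padding='same', activation='relu')(x_in)    # 28x28x96
b4 = MaxPooling2D((3, 3), strides=1, padding='same')(x_in)          # 28x28x256 (pooling preserves depth)
out = concatenate([b1, b2, b3, b4])                                  # 28x28x672, concatenated along depth
naive_inception = Model(x_in, out)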
9-36 So far this is a very naive way of doing it: run a variety of operations and concatenate the results. So what is the problem with this network?
9-37 The problem is the computational cost.
9-38 What is the output size of the 1x1 conv with 128 filters?
9-39 28x28x128.
9-40 What about the output sizes of the other filters?
9-41 Each keeps the 28x28 spatial size of the input, with a depth equal to its number of filters.
9-42 And what is the output size once all the filters are combined?
9-43 Since we concatenate along the depth dimension, it is 28x28x672.
9-44 Now let's look at how much computation these layers need, starting with the 1x1 conv. At each pixel it performs a 1x1x256 dot product, so 256 multiplications per pixel (the trailing 256 in the Conv Ops). There are 28x28 pixels (the leading "28 x 28"), and 128 filters perform this computation, so the total for the 1x1 conv is 28 x 28 x 128 x 256. Doing the same for the remaining layers gives about 854 million FLOPs in total, which is very large. (The arithmetic is spelled out below.)
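
Spelling that arithmetic out (again assuming 192 and 96 filters for the 3x3 and 5x5 branches, as on the lecture slide):

ops_1x1 = 28 * 28 * 128 * (1 * 1 * 256)   # ~ 25.7M
ops_3x3 = 28 * 28 * 192 * (3 * 3 * 256)   # ~346.8M
ops_5x5 = 28 * 28 * 96 * (5 * 5 * 256)    # ~481.7M
print(ops_1x1 + ops_3x3 + ops_5x5)        # 854,196,224 -> the ~854M ops quoted above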
9-45 The pooling layer is also a problem: since the pooling branch preserves the input depth, the concatenated output can never be shallower than the input, so the depth keeps growing module after module.
9-46 The bottleneck layer was introduced to deal with this huge amount of computation. A bottleneck layer reduces the depth of the features; let's look at it in detail.

GoogLeNet Part2

9-47 Let's look at the 1x1 conv once more.
9-48 A 1x1 conv performs a dot product at each spatial location only, and in doing so it can reduce the depth: it projects the input depth onto a lower dimension. You can think of it as a linear combination of the input feature maps. The key idea is simply to reduce the depth of the input.
9-49 The left figure is the original Inception module, and the right one is the Inception module with bottleneck layers applied.
9-50 Bottleneck layers are added before the 3x3 and 5x5 CONVs, and after the 3x3 POOL.
9-51 Now let's work out the output sizes and the computation of these filters. The bottom line is that the computation drops to well under half of the naive Inception module. (A rough version of the calculation follows.)
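
A rough version of that calculation, assuming 64-filter 1x1 reductions in front of the 3x3 and 5x5 branches and after the pool (these reduce sizes come from the lecture slide, not from the text above); the other filter counts are as before:

reduce  = 28 * 28 * 64 * (1 * 1 * 256)    # one 1x1x64 bottleneck on the 256-deep input, ~12.8M
ops_1x1 = 28 * 28 * 128 * (1 * 1 * 256)   # direct 1x1 branch, ~25.7M
ops_3x3 = 28 * 28 * 192 * (3 * 3 * 64)    # the 3x3 branch now runs on depth 64, ~86.7M
ops_5x5 = 28 * 28 * 96 * (5 * 5 * 64)     # the 5x5 branch now runs on depth 64, ~120.4M
print(3 * reduce + ops_1x1 + ops_3x3 + ops_5x5)   # ~271M ops, versus ~854M for the naive module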
9-52 GoogLeNet is built by stacking these Inception modules.
9-53 Let's look at the whole of GoogLeNet. It starts with a CONV-POOL stem, similar to earlier models.
9-54 Then the Inception modules are stacked in the middle.
9-55 At the end there is the classifier output.
9-56 GoogLeNet removed most of the expensive FC layers and confirmed that the model still works well with far fewer parameters.
9-57 You can also see extra stems branching off here: these are auxiliary classifiers. They are just small mini-networks: average pooling, a 1x1 conv, a few FC layers, and a softmax over the 1000 ImageNet classes. The ImageNet training loss is computed at these points as well. The reason the loss is computed at these two extra places, and not only at the end of the network, is that the network is deep: attaching auxiliary classifiers to intermediate layers provides additional gradient and helps the middle of the network train.
9-58 This is the full architecture: 22 layers with weights in total, with each Inception module containing 1x1/3x3/5x5 conv layers in parallel.
9-59 That was GoogLeNet, the 2014 ImageNet winner.

ResNet Part1

9-60 Next is ResNet, the 2015 ImageNet model. ResNet is far deeper than earlier models, to the point that it was called a revolution in depth.
9-61 ResNet is a very deep model built on residual connections. Let's see how it works.
9-62 What happens if we just stack CONV layers deeper? For instance, how does performance change if we massively increase the number of CONV layers in VGGNet?
9-63 The figure compares a 20-layer and a 56-layer model. On the right, the test error of the 56-layer model is worse than that of the 20-layer one, so you might think that a deeper network can simply be worse.
9-64 But the training error looks odd. If the deeper network were just overfitting because of its many parameters, you would expect its training error to be lower; instead, the 56-layer training error is also worse than the 20-layer one, so this is not overfitting.
9-65 The ResNet authors' hypothesis was that this is an optimization problem: the deeper the model, the harder it is to optimize.
9-66 They reasoned that a deeper model should be able to perform at least as well as a shallower one. For example, consider this construction: copy the weights of the shallower model into some layers of the deep model, and let the remaining layers be identity mappings (just pass the input through to the output). Built this way, the deep model should match the shallower model's performance, so even if the deeper model does not train well, at least the shallow model's performance is guaranteed. How, then, should the architecture be designed so that this motivation is built into the model?

ResNet Part2

9-67 Their idea was to not simply stack layers directly.
9-68 Instead of having a layer learn "H(x)" directly, it is made to learn "H(x) - x". To do this, a skip connection is introduced: the loop on the right. The skip connection has no weights and passes the input straight through to the output as an identity mapping, so the actual layers only need to learn the change (the delta), i.e. the residual with respect to the input x. Blocks are stacked so that they perform residual mapping instead of direct mapping, and the final output is "input x + residual". This makes learning easier: if the output should simply equal the input, the layer output F(x) just has to be 0 (residual = 0), which the network can achieve by driving all the weights to 0. In short, since H(x) is hard to learn directly, it is more effective to learn the change in x and output x + F(x). This also delivers the earlier guarantee of matching the shallow model: if the shallower model's weights are loaded and the extra layers do not help, their weights can go to 0 so each block reduces to the identity, and the shallow model's performance is preserved. (A minimal Keras residual block is sketched below.)
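
A minimal Keras sketch of such a basic residual block (two 3x3 convs plus an identity skip; the batch-norm placement and relu ordering here are one common convention, not necessarily the paper's exact block, and n_filters is just an illustrative parameter):

from keras.layers import Input, Conv2D, BatchNormalization, Activation, add
from keras.models import Model

def residual_block(x, n_filters):
    shortcut = x                                        # identity skip connection (no weights)
    y = Conv2D(n_filters, (3, 3), padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(n_filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    y = add([y, shortcut])                              # H(x) = F(x) + x
    return Activation('relu')(y)

inp = Input(shape=(56, 56, 64))                         # example shape
model = Model(inp, residual_block(inp, 64))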
9-69 Here you can see the full ResNet architecture. Each residual block consists of two 3x3 conv layers; this arrangement is known to work well.
9-70 Periodically the number of filters is doubled and downsampling is performed with stride 2.
9-71 The network starts with a CONV layer.
9-72 There are additional conv layers at the start of the network, and there is no FC layer at the end. Instead, a Global Average Pooling layer is used; GAP averages each feature map over its entire spatial extent.
9-73 Depending on the total depth there are 34-, 50-, 101- and 152-layer models.
9-74 For depths of 50 and above, ResNet introduces bottleneck layers, similar to what GoogLeNet did. The bottleneck layer uses a 1x1 conv to reduce the depth first.
9-75 For example, with a 28x28x256 input, a 1x1 conv reduces the depth to give 28x28x64, which cuts the computation of the following 3x3 conv. A second 1x1 conv afterwards restores the depth to 256. Deeper ResNets use this structure. (A sketch of this bottleneck block follows.)
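
And a minimal sketch of that bottleneck variant for a 28x28x256 input (again illustrative rather than the exact published block):

from keras.layers import Input, Conv2D, BatchNormalization, Activation, add
from keras.models import Model

def bottleneck_block(x):
    shortcut = x
    y = Conv2D(64, (1, 1), padding='same', activation='relu')(x)   # 28x28x256 -> 28x28x64
    y = Conv2D(64, (3, 3), padding='same', activation='relu')(y)   # cheap 3x3 on depth 64
    y = Conv2D(256, (1, 1), padding='same')(y)                     # back up to depth 256
    y = BatchNormalization()(y)
    y = add([y, shortcut])                                         # residual connection
    return Activation('relu')(y)

inp = Input(shape=(28, 28, 256))
model = Model(inp, bottleneck_block(inp))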
9-76 For training, ResNet uses batch normalization after every CONV layer and Xavier/2 weight initialization; the remaining settings are as shown in the figure.
9-77 ResNet won every competition that year by a wide margin over the other models.
9-78 It also became the first model to beat human performance on ImageNet.
9-79 Around this point the ImageNet competition was wrapped up.


9-80 Now let's quickly compare the complexity of these models. The left graph sorts the models by performance, using top-1 accuracy (higher is better). The right graph adds computation and memory: the y-axis is top-1 accuracy (higher is better), the x-axis is the amount of computation (further right means more operations), and the size of each circle is the memory usage (a bigger circle means a heavier model).
9-81 Almost all of these are models we have covered, or slight variants of them. The Inception family, for example, has versions V2, V3 and so on, and the best one here is V4, which combines ResNet and Inception.
9-82 VGGNet has both a large memory footprint and a large amount of computation, so it is rarely used these days.
9-83 GoogLeNet is the most efficient network: it sits almost at the left end of the x-axis and its memory usage is small as well.
9-84 AlexNet needs relatively little computation, but its memory usage is large and its accuracy is low, so like VGG it is no longer used.
9-85 ResNet has moderate computation and memory with high accuracy, so it is still used today and continues to be actively developed.
9-86 The left graph shows the forward-pass time in ms: VGG takes the longest at around 200ms, so it can only process about 5 images per second.


9-87 Now let's look at some other CNN architectures.
9-88 Network in Network, a paper from 2014. The basic idea is to insert a small network inside the network: an MLP (multi-layer perceptron), i.e. a few FC layers, is stacked inside each conv layer. There is an ordinary conv layer first, and the FC layers on top of it help extract more abstract features; the point is to build a slightly more complex hierarchy to produce the activation maps instead of using plain conv filters only. NIN basically uses FC layers for this, which are also called 1x1 conv layers. Network in Network is a meaningful idea because it established the bottleneck concept before GoogLeNet and ResNet, and GoogLeNet, while structurally somewhat different, took its "philosophical inspiration" from NIN.
9-89 In 2016 the ResNet authors published a paper that improved the ResNet block design. It adjusts the paths inside the ResNet block: the new structure widens the direct path so that information flows forward better and backpropagation also works better. This new block design gave better performance.
9-90 Wide Residual Networks. The original ResNet paper focused on going deeper, but this paper argued that what really matters is the residuals, not the depth: with residual connections the network does not necessarily have to be deeper. So they made the residual blocks wider, i.e. added more conv filters. Where the original ResNet had F filters per block, they used F * K filters. With wider layers they showed that a 50-layer network can outperform the original 152-layer ResNet. Increasing width instead of depth also has a computational advantage: it parallelizes better. Increasing depth is an inherently sequential increase, so widening the conv layers is more efficient.
9-91 Around the same time another paper appeared: ResNeXt, also from the ResNet authors, continuing to push the ResNet design. Here too they work on the width of the residual block by increasing the number of filters, adding "multiple parallel pathways" inside each residual block; they call the total number of pathways the cardinality. Each bottleneck ResNet block is relatively thin, but many of these thin blocks are bundled in parallel. Bundling several layers in parallel also relates this idea to the Inception module.
9-92 There is also a paper called Stochastic Depth, and the theme is again depth. As networks get deeper, the vanishing gradient problem appears: gradients get smaller and smaller as they are propagated backwards through a deep network. The basic idea is to drop a subset of layers at training time, because a shorter network trains more easily: some layers are randomly turned into identity connections. Training with these shorter networks lets the gradient propagate better, which makes it a very effective method, similar in spirit to dropout. At test time the full deep network is used.
9-93 The methods introduced so far try to improve the ResNet architecture, but there are also methods that aim to go "beyond ResNet", and some non-ResNets match ResNet's performance. One of them is FractalNet, which appeared quite recently. Its authors argue that residual connections are unnecessary: the FractalNet architecture has no residual connections at all. They believed that what matters is passing information well through both shallow and deep paths, so FractalNet connects both shallow and deep paths to the output. There are many paths in FractalNet, and at training time only a subset of paths is used, like dropout; at test time the full network is used. They demonstrated that FractalNet performs well.
9-94 DenseNet has Dense Blocks. Here, each layer is connected to every layer below it: the network's input image feeds into every layer, and the outputs of all previous layers are concatenated with each layer's output and fed into the next conv layer, with a dimension-reduction step included along the way. The authors argue that these dense connections alleviate the vanishing gradient problem, and that they propagate and reuse features better, since each layer's output is used many times by later layers.
9-95 In SqueezeNet, a "squeeze layer" made of 1x1 filters feeds an "expand layer" made of 1x1 and 3x3 filters. SqueezeNet matches AlexNet's accuracy on ImageNet with 50x fewer parameters, and compressing it further makes it 500x smaller than AlexNet: a SqueezeNet model can be as small as about 0.5MB.
9-96,97 That wraps up CNN architectures. Many newer models have since appeared, such as the EfficientNet series, so reading those papers should be helpful.




