TigerCow.Door

4_Overfitting_and_Underfitting


Hello, this is 문범우.

In this post we cover the fourth TensorFlow tutorial: overfitting and underfitting.


In [41]:
# TensorFlow and tf.keras
# Import TensorFlow and keras; TensorFlow is imported under the alias tf.
import tensorflow as tf
from tensorflow import keras

# Helper libraries
# numpy and matplotlib are used as helper libraries.
import numpy as np
import matplotlib.pyplot as plt
# Magic command for displaying matplotlib plots inside the Jupyter notebook
%matplotlib inline

print("사용되는 tensorflow의 버전:",tf.__version__)
사용되는 tensorflow의 버전: 1.9.0

A. Preparing the data

The data used this time is the IMDB movie review dataset that we also used in the previous text classification post.

In [43]:
NUM_WORDS = 10000

(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=NUM_WORDS)
In [44]:
# Check the values of the 0th sample
print(train_data[0][0:5],". . .",train_data[0][-5:])
[1, 14, 22, 16, 43] . . . [16, 5345, 19, 178, 32]
In [45]:
def multi_hot_sequences(sequences, dimension):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    # i.e. one row for each sequence and one column for each of the `dimension` word indices
    results = np.zeros((len(sequences), dimension))
    # Mark which words each sequence contains
    for i, word_indices in enumerate(sequences):
        # For the i-th sample, set the positions given by its word indices to 1
        results[i, word_indices] = 1.0  # set specific indices of results[i] to 1s
    return results


train_data = multi_hot_sequences(train_data, dimension=NUM_WORDS)
test_data = multi_hot_sequences(test_data, dimension=NUM_WORDS)

This time we are going to study overfitting.

The multi_hot_sequences function above makes the model overfit the training dataset more quickly.
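For intuition, here is a tiny hypothetical usage of the function defined above, with dimension=5 instead of 10,000: every word index that appears in a sequence becomes a 1 in a fixed-length vector.

sample = multi_hot_sequences([[1, 3, 3]], dimension=5)
print(sample)  # [[0. 1. 0. 1. 0.]] -- repeated indices still just set that position to 1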

In [46]:
# Check the values of the 0th sample after encoding
print(train_data[0][0:5],". . .",train_data[0][-5:])
[0. 1. 1. 0. 1.] . . . [0. 0. 0. 0. 0.]
In [48]:
# Plot the 0th sample
plt.plot(train_data[0])
Out[48]:
[<matplotlib.lines.Line2D at 0xb28f28550>]

B. Building a baseline model

To observe overfitting, we first build a baseline model.

In [49]:
baseline_model = keras.Sequential([
    # `input_shape` is only required here so that `.summary` works. 
    keras.layers.Dense(16, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

baseline_model.compile(optimizer='adam',
                       loss='binary_crossentropy',
                       metrics=['accuracy', 'binary_crossentropy'])

baseline_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 16)                160016    
_________________________________________________________________
dense_1 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 17        
=================================================================
Total params: 160,305
Trainable params: 160,305
Non-trainable params: 0
_________________________________________________________________
In [50]:
# Train the baseline model and evaluate it.
baseline_history = baseline_model.fit(train_data,
                                      train_labels,
                                      epochs=20,
                                      batch_size=512,
                                      validation_data=(test_data, test_labels),
                                      verbose=2)
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 9s - loss: 0.4707 - acc: 0.8128 - binary_crossentropy: 0.4707 - val_loss: 0.3284 - val_acc: 0.8775 - val_binary_crossentropy: 0.3284
Epoch 2/20
 - 7s - loss: 0.2428 - acc: 0.9134 - binary_crossentropy: 0.2428 - val_loss: 0.2843 - val_acc: 0.8873 - val_binary_crossentropy: 0.2843
Epoch 3/20
 - 6s - loss: 0.1789 - acc: 0.9366 - binary_crossentropy: 0.1789 - val_loss: 0.2911 - val_acc: 0.8858 - val_binary_crossentropy: 0.2911
Epoch 4/20
 - 6s - loss: 0.1431 - acc: 0.9512 - binary_crossentropy: 0.1431 - val_loss: 0.3177 - val_acc: 0.8778 - val_binary_crossentropy: 0.3177
Epoch 5/20
 - 7s - loss: 0.1188 - acc: 0.9606 - binary_crossentropy: 0.1188 - val_loss: 0.3433 - val_acc: 0.8732 - val_binary_crossentropy: 0.3433
Epoch 6/20
 - 8s - loss: 0.0975 - acc: 0.9696 - binary_crossentropy: 0.0975 - val_loss: 0.3757 - val_acc: 0.8680 - val_binary_crossentropy: 0.3757
Epoch 7/20
 - 7s - loss: 0.0786 - acc: 0.9776 - binary_crossentropy: 0.0786 - val_loss: 0.4244 - val_acc: 0.8616 - val_binary_crossentropy: 0.4244
Epoch 8/20
 - 8s - loss: 0.0623 - acc: 0.9832 - binary_crossentropy: 0.0623 - val_loss: 0.4540 - val_acc: 0.8631 - val_binary_crossentropy: 0.4540
Epoch 9/20
 - 7s - loss: 0.0478 - acc: 0.9895 - binary_crossentropy: 0.0478 - val_loss: 0.4929 - val_acc: 0.8604 - val_binary_crossentropy: 0.4929
Epoch 10/20
 - 8s - loss: 0.0356 - acc: 0.9938 - binary_crossentropy: 0.0356 - val_loss: 0.5390 - val_acc: 0.8580 - val_binary_crossentropy: 0.5390
Epoch 11/20
 - 6s - loss: 0.0261 - acc: 0.9962 - binary_crossentropy: 0.0261 - val_loss: 0.5758 - val_acc: 0.8578 - val_binary_crossentropy: 0.5758
Epoch 12/20
 - 4s - loss: 0.0186 - acc: 0.9983 - binary_crossentropy: 0.0186 - val_loss: 0.6208 - val_acc: 0.8558 - val_binary_crossentropy: 0.6208
Epoch 13/20
 - 6s - loss: 0.0127 - acc: 0.9989 - binary_crossentropy: 0.0127 - val_loss: 0.6513 - val_acc: 0.8558 - val_binary_crossentropy: 0.6513
Epoch 14/20
 - 7s - loss: 0.0090 - acc: 0.9997 - binary_crossentropy: 0.0090 - val_loss: 0.6821 - val_acc: 0.8548 - val_binary_crossentropy: 0.6821
Epoch 15/20
 - 7s - loss: 0.0065 - acc: 1.0000 - binary_crossentropy: 0.0065 - val_loss: 0.7090 - val_acc: 0.8548 - val_binary_crossentropy: 0.7090
Epoch 16/20
 - 6s - loss: 0.0050 - acc: 1.0000 - binary_crossentropy: 0.0050 - val_loss: 0.7358 - val_acc: 0.8552 - val_binary_crossentropy: 0.7358
Epoch 17/20
 - 7s - loss: 0.0040 - acc: 1.0000 - binary_crossentropy: 0.0040 - val_loss: 0.7585 - val_acc: 0.8551 - val_binary_crossentropy: 0.7585
Epoch 18/20
 - 4s - loss: 0.0032 - acc: 1.0000 - binary_crossentropy: 0.0032 - val_loss: 0.7811 - val_acc: 0.8550 - val_binary_crossentropy: 0.7811
Epoch 19/20
 - 4s - loss: 0.0026 - acc: 1.0000 - binary_crossentropy: 0.0026 - val_loss: 0.8007 - val_acc: 0.8552 - val_binary_crossentropy: 0.8007
Epoch 20/20
 - 4s - loss: 0.0022 - acc: 1.0000 - binary_crossentropy: 0.0022 - val_loss: 0.8192 - val_acc: 0.8548 - val_binary_crossentropy: 0.8192

Next, we build a model with fewer hidden units than the baseline model.

In [52]:
smaller_model = keras.Sequential([
    keras.layers.Dense(4, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dense(4, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

smaller_model.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=['accuracy', 'binary_crossentropy'])

smaller_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_6 (Dense)              (None, 4)                 40004     
_________________________________________________________________
dense_7 (Dense)              (None, 4)                 20        
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 5         
=================================================================
Total params: 40,029
Trainable params: 40,029
Non-trainable params: 0
_________________________________________________________________
In [53]:
smaller_history = smaller_model.fit(train_data,
                                    train_labels,
                                    epochs=20,
                                    batch_size=512,
                                    validation_data=(test_data, test_labels),
                                    verbose=2)
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 5s - loss: 0.6317 - acc: 0.6360 - binary_crossentropy: 0.6317 - val_loss: 0.5714 - val_acc: 0.7279 - val_binary_crossentropy: 0.5714
Epoch 2/20
 - 6s - loss: 0.5187 - acc: 0.8149 - binary_crossentropy: 0.5187 - val_loss: 0.5109 - val_acc: 0.8128 - val_binary_crossentropy: 0.5109
Epoch 3/20
 - 6s - loss: 0.4623 - acc: 0.8725 - binary_crossentropy: 0.4623 - val_loss: 0.4787 - val_acc: 0.8519 - val_binary_crossentropy: 0.4787
Epoch 4/20
 - 7s - loss: 0.4248 - acc: 0.9022 - binary_crossentropy: 0.4248 - val_loss: 0.4587 - val_acc: 0.8713 - val_binary_crossentropy: 0.4587
Epoch 5/20
 - 5s - loss: 0.3962 - acc: 0.9184 - binary_crossentropy: 0.3962 - val_loss: 0.4449 - val_acc: 0.8781 - val_binary_crossentropy: 0.4449
Epoch 6/20
 - 4s - loss: 0.3721 - acc: 0.9321 - binary_crossentropy: 0.3721 - val_loss: 0.4394 - val_acc: 0.8686 - val_binary_crossentropy: 0.4394
Epoch 7/20
 - 5s - loss: 0.3505 - acc: 0.9414 - binary_crossentropy: 0.3505 - val_loss: 0.4345 - val_acc: 0.8696 - val_binary_crossentropy: 0.4345
Epoch 8/20
 - 7s - loss: 0.3317 - acc: 0.9494 - binary_crossentropy: 0.3317 - val_loss: 0.4253 - val_acc: 0.8758 - val_binary_crossentropy: 0.4253
Epoch 9/20
 - 7s - loss: 0.3147 - acc: 0.9567 - binary_crossentropy: 0.3147 - val_loss: 0.4255 - val_acc: 0.8738 - val_binary_crossentropy: 0.4255
Epoch 10/20
 - 7s - loss: 0.2993 - acc: 0.9617 - binary_crossentropy: 0.2993 - val_loss: 0.4202 - val_acc: 0.8758 - val_binary_crossentropy: 0.4202
Epoch 11/20
 - 7s - loss: 0.2854 - acc: 0.9659 - binary_crossentropy: 0.2854 - val_loss: 0.4210 - val_acc: 0.8738 - val_binary_crossentropy: 0.4210
Epoch 12/20
 - 6s - loss: 0.2714 - acc: 0.9697 - binary_crossentropy: 0.2714 - val_loss: 0.4225 - val_acc: 0.8729 - val_binary_crossentropy: 0.4225
Epoch 13/20
 - 4s - loss: 0.2589 - acc: 0.9732 - binary_crossentropy: 0.2589 - val_loss: 0.4269 - val_acc: 0.8699 - val_binary_crossentropy: 0.4269
Epoch 14/20
 - 4s - loss: 0.2474 - acc: 0.9754 - binary_crossentropy: 0.2474 - val_loss: 0.4230 - val_acc: 0.8698 - val_binary_crossentropy: 0.4230
Epoch 15/20
 - 4s - loss: 0.2368 - acc: 0.9781 - binary_crossentropy: 0.2368 - val_loss: 0.4355 - val_acc: 0.8676 - val_binary_crossentropy: 0.4355
Epoch 16/20
 - 4s - loss: 0.2266 - acc: 0.9802 - binary_crossentropy: 0.2266 - val_loss: 0.4397 - val_acc: 0.8671 - val_binary_crossentropy: 0.4397
Epoch 17/20
 - 3s - loss: 0.2175 - acc: 0.9816 - binary_crossentropy: 0.2175 - val_loss: 0.4456 - val_acc: 0.8663 - val_binary_crossentropy: 0.4456
Epoch 18/20
 - 4s - loss: 0.2084 - acc: 0.9832 - binary_crossentropy: 0.2084 - val_loss: 0.4333 - val_acc: 0.8686 - val_binary_crossentropy: 0.4333
Epoch 19/20
 - 3s - loss: 0.2002 - acc: 0.9843 - binary_crossentropy: 0.2002 - val_loss: 0.4555 - val_acc: 0.8657 - val_binary_crossentropy: 0.4555
Epoch 20/20
 - 3s - loss: 0.1927 - acc: 0.9848 - binary_crossentropy: 0.1927 - val_loss: 0.4654 - val_acc: 0.8643 - val_binary_crossentropy: 0.4654

Next, we build a model with more hidden units than the baseline model.

In [54]:
bigger_model = keras.models.Sequential([
    keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dense(512, activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

bigger_model.compile(optimizer='adam',
                     loss='binary_crossentropy',
                     metrics=['accuracy','binary_crossentropy'])

bigger_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_9 (Dense)              (None, 512)               5120512   
_________________________________________________________________
dense_10 (Dense)             (None, 512)               262656    
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 513       
=================================================================
Total params: 5,383,681
Trainable params: 5,383,681
Non-trainable params: 0
_________________________________________________________________
In [55]:
bigger_history = bigger_model.fit(train_data, train_labels,
                                  epochs=20,
                                  batch_size=512,
                                  validation_data=(test_data, test_labels),
                                  verbose=2)
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 18s - loss: 0.3478 - acc: 0.8466 - binary_crossentropy: 0.3478 - val_loss: 0.2992 - val_acc: 0.8776 - val_binary_crossentropy: 0.2992
Epoch 2/20
 - 18s - loss: 0.1441 - acc: 0.9471 - binary_crossentropy: 0.1441 - val_loss: 0.3556 - val_acc: 0.8651 - val_binary_crossentropy: 0.3556
Epoch 3/20
 - 19s - loss: 0.0532 - acc: 0.9839 - binary_crossentropy: 0.0532 - val_loss: 0.4296 - val_acc: 0.8650 - val_binary_crossentropy: 0.4296
Epoch 4/20
 - 18s - loss: 0.0100 - acc: 0.9985 - binary_crossentropy: 0.0100 - val_loss: 0.5852 - val_acc: 0.8694 - val_binary_crossentropy: 0.5852
Epoch 5/20
 - 19s - loss: 0.0011 - acc: 1.0000 - binary_crossentropy: 0.0011 - val_loss: 0.6643 - val_acc: 0.8680 - val_binary_crossentropy: 0.6643
Epoch 6/20
 - 20s - loss: 2.8284e-04 - acc: 1.0000 - binary_crossentropy: 2.8284e-04 - val_loss: 0.7065 - val_acc: 0.8680 - val_binary_crossentropy: 0.7065
Epoch 7/20
 - 20s - loss: 1.6760e-04 - acc: 1.0000 - binary_crossentropy: 1.6760e-04 - val_loss: 0.7332 - val_acc: 0.8684 - val_binary_crossentropy: 0.7332
Epoch 8/20
 - 20s - loss: 1.1922e-04 - acc: 1.0000 - binary_crossentropy: 1.1922e-04 - val_loss: 0.7526 - val_acc: 0.8683 - val_binary_crossentropy: 0.7526
Epoch 9/20
 - 20s - loss: 9.0721e-05 - acc: 1.0000 - binary_crossentropy: 9.0721e-05 - val_loss: 0.7692 - val_acc: 0.8683 - val_binary_crossentropy: 0.7692
Epoch 10/20
 - 19s - loss: 7.1760e-05 - acc: 1.0000 - binary_crossentropy: 7.1760e-05 - val_loss: 0.7820 - val_acc: 0.8682 - val_binary_crossentropy: 0.7820
Epoch 11/20
 - 23s - loss: 5.8391e-05 - acc: 1.0000 - binary_crossentropy: 5.8391e-05 - val_loss: 0.7941 - val_acc: 0.8682 - val_binary_crossentropy: 0.7941
Epoch 12/20
 - 22s - loss: 4.8347e-05 - acc: 1.0000 - binary_crossentropy: 4.8347e-05 - val_loss: 0.8046 - val_acc: 0.8684 - val_binary_crossentropy: 0.8046
Epoch 13/20
 - 21s - loss: 4.0705e-05 - acc: 1.0000 - binary_crossentropy: 4.0705e-05 - val_loss: 0.8139 - val_acc: 0.8682 - val_binary_crossentropy: 0.8139
Epoch 14/20
 - 19s - loss: 3.4762e-05 - acc: 1.0000 - binary_crossentropy: 3.4762e-05 - val_loss: 0.8229 - val_acc: 0.8681 - val_binary_crossentropy: 0.8229
Epoch 15/20
 - 18s - loss: 2.9985e-05 - acc: 1.0000 - binary_crossentropy: 2.9985e-05 - val_loss: 0.8312 - val_acc: 0.8682 - val_binary_crossentropy: 0.8312
Epoch 16/20
 - 19s - loss: 2.6114e-05 - acc: 1.0000 - binary_crossentropy: 2.6114e-05 - val_loss: 0.8379 - val_acc: 0.8681 - val_binary_crossentropy: 0.8379
Epoch 17/20
 - 19s - loss: 2.2936e-05 - acc: 1.0000 - binary_crossentropy: 2.2936e-05 - val_loss: 0.8461 - val_acc: 0.8687 - val_binary_crossentropy: 0.8461
Epoch 18/20
 - 19s - loss: 2.0306e-05 - acc: 1.0000 - binary_crossentropy: 2.0306e-05 - val_loss: 0.8517 - val_acc: 0.8683 - val_binary_crossentropy: 0.8517
Epoch 19/20
 - 19s - loss: 1.8037e-05 - acc: 1.0000 - binary_crossentropy: 1.8037e-05 - val_loss: 0.8579 - val_acc: 0.8683 - val_binary_crossentropy: 0.8579
Epoch 20/20
 - 19s - loss: 1.6142e-05 - acc: 1.0000 - binary_crossentropy: 1.6142e-05 - val_loss: 0.8643 - val_acc: 0.8686 - val_binary_crossentropy: 0.8643

Up to this point we have built three models (baseline, smaller, and bigger) and trained and validated each on the same dataset.

Let's compare their losses on a graph.

In [56]:
def plot_history(histories, key='binary_crossentropy'):
  plt.figure(figsize=(16,10))
    
  for name, history in histories:
    val = plt.plot(history.epoch, history.history['val_'+key],
                   '--', label=name.title()+' Val')
    plt.plot(history.epoch, history.history[key], color=val[0].get_color(),
             label=name.title()+' Train')

  plt.xlabel('Epochs')
  plt.ylabel(key.replace('_',' ').title())
  plt.legend()

  plt.xlim([0,max(history.epoch)])


plot_history([('baseline', baseline_history),
              ('smaller', smaller_history),
              ('bigger', bigger_history)])

Looking at the graph above, for every model the validation loss is larger than the training loss.

In particular, for the bigger and baseline models the validation loss increases sharply as the epochs go on.

This phenomenon, where the model fits the training dataset too closely and fails to generalize, is called overfitting.

How can we deal with this overfitting?

Overfitting strategy - Regularization

According to the tutorial, this approach pushes the network to learn smaller weight values.

This process is called regularization, and it comes in two common forms: L1 regularization and L2 regularization.

For each of them, the tutorial's own description is quoted below.

L1 regularization, where the cost added is proportional to the absolute value of the weights coefficients (i.e. to what is called the "L1 norm" of the weights).

L2 regularization, where the cost added is proportional to the square of the value of the weights coefficients (i.e. to what is called the "L2 norm" of the weights). L2 regularization is also called weight decay in the context of neural networks. Don't let the different name confuse you: weight decay is mathematically the exact same as L2 regularization.
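As a rough numpy sketch (not part of the tutorial), the extra cost each regularizer adds can be written out directly; the 0.001 factor mirrors the regularization strength used in the Keras model below.

import numpy as np

weights = np.array([0.5, -1.2, 0.8])             # hypothetical weight coefficients of one layer
l1_penalty = 0.001 * np.sum(np.abs(weights))     # L1: proportional to the absolute values
l2_penalty = 0.001 * np.sum(np.square(weights))  # L2: proportional to the squared values
# Either penalty is added to the original loss (here binary crossentropy),
# so large weights raise the total loss and are discouraged during training.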

Let's build a model in Keras that uses this kind of regularization.

In [57]:
l2_model = keras.models.Sequential([
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

l2_model.compile(optimizer='adam',
                 loss='binary_crossentropy',
                 metrics=['accuracy', 'binary_crossentropy'])

l2_model_history = l2_model.fit(train_data, train_labels,
                                epochs=20,
                                batch_size=512,
                                validation_data=(test_data, test_labels),
                                verbose=2)
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 10s - loss: 0.5400 - acc: 0.8033 - binary_crossentropy: 0.5021 - val_loss: 0.3943 - val_acc: 0.8726 - val_binary_crossentropy: 0.3546
Epoch 2/20
 - 7s - loss: 0.3134 - acc: 0.9049 - binary_crossentropy: 0.2687 - val_loss: 0.3354 - val_acc: 0.8869 - val_binary_crossentropy: 0.2869
Epoch 3/20
 - 5s - loss: 0.2578 - acc: 0.9270 - binary_crossentropy: 0.2066 - val_loss: 0.3366 - val_acc: 0.8860 - val_binary_crossentropy: 0.2833
Epoch 4/20
 - 4s - loss: 0.2316 - acc: 0.9386 - binary_crossentropy: 0.1765 - val_loss: 0.3484 - val_acc: 0.8836 - val_binary_crossentropy: 0.2921
Epoch 5/20
 - 6s - loss: 0.2178 - acc: 0.9463 - binary_crossentropy: 0.1598 - val_loss: 0.3613 - val_acc: 0.8794 - val_binary_crossentropy: 0.3022
Epoch 6/20
 - 7s - loss: 0.2043 - acc: 0.9511 - binary_crossentropy: 0.1445 - val_loss: 0.3762 - val_acc: 0.8767 - val_binary_crossentropy: 0.3158
Epoch 7/20
 - 5s - loss: 0.1969 - acc: 0.9543 - binary_crossentropy: 0.1354 - val_loss: 0.3911 - val_acc: 0.8725 - val_binary_crossentropy: 0.3287
Epoch 8/20
 - 4s - loss: 0.1882 - acc: 0.9582 - binary_crossentropy: 0.1251 - val_loss: 0.4013 - val_acc: 0.8716 - val_binary_crossentropy: 0.3379
Epoch 9/20
 - 5s - loss: 0.1819 - acc: 0.9599 - binary_crossentropy: 0.1178 - val_loss: 0.4190 - val_acc: 0.8702 - val_binary_crossentropy: 0.3543
Epoch 10/20
 - 7s - loss: 0.1795 - acc: 0.9620 - binary_crossentropy: 0.1141 - val_loss: 0.4350 - val_acc: 0.8670 - val_binary_crossentropy: 0.3691
Epoch 11/20
 - 7s - loss: 0.1736 - acc: 0.9633 - binary_crossentropy: 0.1071 - val_loss: 0.4417 - val_acc: 0.8660 - val_binary_crossentropy: 0.3746
Epoch 12/20
 - 6s - loss: 0.1699 - acc: 0.9648 - binary_crossentropy: 0.1028 - val_loss: 0.4632 - val_acc: 0.8618 - val_binary_crossentropy: 0.3955
Epoch 13/20
 - 4s - loss: 0.1688 - acc: 0.9660 - binary_crossentropy: 0.1002 - val_loss: 0.4665 - val_acc: 0.8632 - val_binary_crossentropy: 0.3975
Epoch 14/20
 - 4s - loss: 0.1588 - acc: 0.9710 - binary_crossentropy: 0.0899 - val_loss: 0.4748 - val_acc: 0.8610 - val_binary_crossentropy: 0.4062
Epoch 15/20
 - 3s - loss: 0.1523 - acc: 0.9744 - binary_crossentropy: 0.0838 - val_loss: 0.4883 - val_acc: 0.8620 - val_binary_crossentropy: 0.4196
Epoch 16/20
 - 4s - loss: 0.1498 - acc: 0.9744 - binary_crossentropy: 0.0809 - val_loss: 0.5009 - val_acc: 0.8597 - val_binary_crossentropy: 0.4318
Epoch 17/20
 - 5s - loss: 0.1474 - acc: 0.9760 - binary_crossentropy: 0.0782 - val_loss: 0.5079 - val_acc: 0.8590 - val_binary_crossentropy: 0.4383
Epoch 18/20
 - 4s - loss: 0.1455 - acc: 0.9756 - binary_crossentropy: 0.0756 - val_loss: 0.5240 - val_acc: 0.8574 - val_binary_crossentropy: 0.4537
Epoch 19/20
 - 4s - loss: 0.1423 - acc: 0.9772 - binary_crossentropy: 0.0719 - val_loss: 0.5285 - val_acc: 0.8601 - val_binary_crossentropy: 0.4580
Epoch 20/20
 - 3s - loss: 0.1401 - acc: 0.9790 - binary_crossentropy: 0.0693 - val_loss: 0.5415 - val_acc: 0.8562 - val_binary_crossentropy: 0.4702

We have built l2_model, which uses L2 regularization.

Regularization is applied to the first two layers by passing a regularizer from the Keras library to the kernel_regularizer argument; the value given in the parentheses is the strength of the regularization.

Let's plot l2_model and compare it with the baseline model.

In [58]:
plot_history([('baseline', baseline_history),
              ('l2', l2_model_history)])

Overfitting strategy - Dropout

The next overfitting strategy we look at is dropout.

Put simply, instead of every node participating in training, randomly chosen nodes are switched off so that they do not learn at that step.
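As a rough illustration (assuming the rate of 0.5 used below, not the actual Keras implementation), dropout can be thought of as multiplying a layer's output by a random 0/1 mask during training:

import numpy as np

rate = 0.5
layer_output = np.array([0.2, 1.3, 0.7, 0.5])     # hypothetical activations of one layer
mask = np.random.rand(layer_output.size) >= rate  # roughly half the units are kept
dropped = layer_output * mask / (1.0 - rate)      # scaling preserves the expected value
print(dropped)  # e.g. [0.4 0.  1.4 0. ] -- the dropped units contribute nothing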

Now let's build a model with dropout applied.

In [59]:
dpt_model = keras.models.Sequential([
    keras.layers.Dense(16, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

dpt_model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy','binary_crossentropy'])

dpt_model_history = dpt_model.fit(train_data, train_labels,
                                  epochs=20,
                                  batch_size=512,
                                  validation_data=(test_data, test_labels),
                                  verbose=2)
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 9s - loss: 0.6336 - acc: 0.6349 - binary_crossentropy: 0.6336 - val_loss: 0.5118 - val_acc: 0.8349 - val_binary_crossentropy: 0.5118
Epoch 2/20
 - 7s - loss: 0.4848 - acc: 0.7913 - binary_crossentropy: 0.4848 - val_loss: 0.3582 - val_acc: 0.8785 - val_binary_crossentropy: 0.3582
Epoch 3/20
 - 5s - loss: 0.3747 - acc: 0.8632 - binary_crossentropy: 0.3747 - val_loss: 0.3021 - val_acc: 0.8882 - val_binary_crossentropy: 0.3021
Epoch 4/20
 - 4s - loss: 0.2978 - acc: 0.8969 - binary_crossentropy: 0.2978 - val_loss: 0.2791 - val_acc: 0.8865 - val_binary_crossentropy: 0.2791
Epoch 5/20
 - 5s - loss: 0.2509 - acc: 0.9136 - binary_crossentropy: 0.2509 - val_loss: 0.2811 - val_acc: 0.8858 - val_binary_crossentropy: 0.2811
Epoch 6/20
 - 7s - loss: 0.2168 - acc: 0.9277 - binary_crossentropy: 0.2168 - val_loss: 0.2903 - val_acc: 0.8854 - val_binary_crossentropy: 0.2903
Epoch 7/20
 - 8s - loss: 0.1900 - acc: 0.9368 - binary_crossentropy: 0.1900 - val_loss: 0.3101 - val_acc: 0.8832 - val_binary_crossentropy: 0.3101
Epoch 8/20
 - 8s - loss: 0.1656 - acc: 0.9456 - binary_crossentropy: 0.1656 - val_loss: 0.3192 - val_acc: 0.8840 - val_binary_crossentropy: 0.3192
Epoch 9/20
 - 8s - loss: 0.1520 - acc: 0.9488 - binary_crossentropy: 0.1520 - val_loss: 0.3468 - val_acc: 0.8814 - val_binary_crossentropy: 0.3468
Epoch 10/20
 - 7s - loss: 0.1376 - acc: 0.9524 - binary_crossentropy: 0.1376 - val_loss: 0.3632 - val_acc: 0.8808 - val_binary_crossentropy: 0.3632
Epoch 11/20
 - 4s - loss: 0.1230 - acc: 0.9580 - binary_crossentropy: 0.1230 - val_loss: 0.3925 - val_acc: 0.8796 - val_binary_crossentropy: 0.3925
Epoch 12/20
 - 5s - loss: 0.1120 - acc: 0.9611 - binary_crossentropy: 0.1120 - val_loss: 0.4139 - val_acc: 0.8791 - val_binary_crossentropy: 0.4139
Epoch 13/20
 - 6s - loss: 0.1025 - acc: 0.9632 - binary_crossentropy: 0.1025 - val_loss: 0.4263 - val_acc: 0.8769 - val_binary_crossentropy: 0.4263
Epoch 14/20
 - 4s - loss: 0.0960 - acc: 0.9658 - binary_crossentropy: 0.0960 - val_loss: 0.4587 - val_acc: 0.8750 - val_binary_crossentropy: 0.4587
Epoch 15/20
 - 4s - loss: 0.0876 - acc: 0.9680 - binary_crossentropy: 0.0876 - val_loss: 0.4755 - val_acc: 0.8755 - val_binary_crossentropy: 0.4755
Epoch 16/20
 - 5s - loss: 0.0842 - acc: 0.9687 - binary_crossentropy: 0.0842 - val_loss: 0.4955 - val_acc: 0.8747 - val_binary_crossentropy: 0.4955
Epoch 17/20
 - 4s - loss: 0.0808 - acc: 0.9702 - binary_crossentropy: 0.0808 - val_loss: 0.5094 - val_acc: 0.8769 - val_binary_crossentropy: 0.5094
Epoch 18/20
 - 5s - loss: 0.0787 - acc: 0.9700 - binary_crossentropy: 0.0787 - val_loss: 0.5444 - val_acc: 0.8757 - val_binary_crossentropy: 0.5444
Epoch 19/20
 - 5s - loss: 0.0744 - acc: 0.9712 - binary_crossentropy: 0.0744 - val_loss: 0.5404 - val_acc: 0.8730 - val_binary_crossentropy: 0.5404
Epoch 20/20
 - 7s - loss: 0.0739 - acc: 0.9715 - binary_crossentropy: 0.0739 - val_loss: 0.5570 - val_acc: 0.8724 - val_binary_crossentropy: 0.5570

We have built dpt_model, which applies dropout.

Dropout with a rate of 0.5 is applied after each of the hidden Dense layers.

We compare dpt_model with the baseline model as well.

In [60]:
plot_history([('baseline', baseline_history),
              ('dropout', dpt_model_history)])

In this way, we have looked at two methods for dealing with overfitting: regularization and dropout.

Besides these, overfitting can also be mitigated with the following approaches.

  • Get more training data.
  • Reduce the capacity of the network.
  • Add weight regularization.
  • Add dropout.

The TensorFlow overfitting and underfitting tutorial ends at this point.

Additionally, I built and tested a model that applies L2 regularization and dropout together.

In [61]:
l2_dpt_model = keras.models.Sequential([
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),
                       activation=tf.nn.relu),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

l2_dpt_model.compile(optimizer='adam',
                 loss='binary_crossentropy',
                 metrics=['accuracy', 'binary_crossentropy'])

l2_dpt_model_history = l2_dpt_model.fit(train_data, train_labels,
                                epochs=20,
                                batch_size=512,
                                validation_data=(test_data, test_labels),
                                verbose=2)
Train on 25000 samples, validate on 25000 samples
Epoch 1/20
 - 8s - loss: 0.6572 - acc: 0.6607 - binary_crossentropy: 0.6204 - val_loss: 0.5217 - val_acc: 0.8572 - val_binary_crossentropy: 0.4884
Epoch 2/20
 - 7s - loss: 0.4884 - acc: 0.8236 - binary_crossentropy: 0.4540 - val_loss: 0.3850 - val_acc: 0.8816 - val_binary_crossentropy: 0.3488
Epoch 3/20
 - 8s - loss: 0.3984 - acc: 0.8780 - binary_crossentropy: 0.3599 - val_loss: 0.3424 - val_acc: 0.8878 - val_binary_crossentropy: 0.3015
Epoch 4/20
 - 7s - loss: 0.3490 - acc: 0.8994 - binary_crossentropy: 0.3059 - val_loss: 0.3301 - val_acc: 0.8874 - val_binary_crossentropy: 0.2848
Epoch 5/20
 - 7s - loss: 0.3162 - acc: 0.9143 - binary_crossentropy: 0.2684 - val_loss: 0.3309 - val_acc: 0.8865 - val_binary_crossentropy: 0.2806
Epoch 6/20
 - 7s - loss: 0.2951 - acc: 0.9214 - binary_crossentropy: 0.2429 - val_loss: 0.3382 - val_acc: 0.8872 - val_binary_crossentropy: 0.2840
Epoch 7/20
 - 7s - loss: 0.2763 - acc: 0.9281 - binary_crossentropy: 0.2201 - val_loss: 0.3501 - val_acc: 0.8837 - val_binary_crossentropy: 0.2922
Epoch 8/20
 - 7s - loss: 0.2663 - acc: 0.9337 - binary_crossentropy: 0.2065 - val_loss: 0.3682 - val_acc: 0.8826 - val_binary_crossentropy: 0.3066
Epoch 9/20
 - 7s - loss: 0.2606 - acc: 0.9355 - binary_crossentropy: 0.1975 - val_loss: 0.3688 - val_acc: 0.8818 - val_binary_crossentropy: 0.3043
Epoch 10/20
 - 7s - loss: 0.2468 - acc: 0.9412 - binary_crossentropy: 0.1811 - val_loss: 0.3903 - val_acc: 0.8787 - val_binary_crossentropy: 0.3231
Epoch 11/20
 - 8s - loss: 0.2433 - acc: 0.9438 - binary_crossentropy: 0.1746 - val_loss: 0.4041 - val_acc: 0.8788 - val_binary_crossentropy: 0.3340
Epoch 12/20
 - 8s - loss: 0.2386 - acc: 0.9456 - binary_crossentropy: 0.1674 - val_loss: 0.4017 - val_acc: 0.8767 - val_binary_crossentropy: 0.3291
Epoch 13/20
 - 8s - loss: 0.2349 - acc: 0.9481 - binary_crossentropy: 0.1615 - val_loss: 0.4302 - val_acc: 0.8774 - val_binary_crossentropy: 0.3557
Epoch 14/20
 - 7s - loss: 0.2293 - acc: 0.9500 - binary_crossentropy: 0.1538 - val_loss: 0.4414 - val_acc: 0.8772 - val_binary_crossentropy: 0.3648
Epoch 15/20
 - 8s - loss: 0.2261 - acc: 0.9516 - binary_crossentropy: 0.1487 - val_loss: 0.4367 - val_acc: 0.8774 - val_binary_crossentropy: 0.3582
Epoch 16/20
 - 7s - loss: 0.2263 - acc: 0.9516 - binary_crossentropy: 0.1471 - val_loss: 0.4329 - val_acc: 0.8755 - val_binary_crossentropy: 0.3529
Epoch 17/20
 - 7s - loss: 0.2254 - acc: 0.9534 - binary_crossentropy: 0.1448 - val_loss: 0.4579 - val_acc: 0.8750 - val_binary_crossentropy: 0.3768
Epoch 18/20
 - 8s - loss: 0.2202 - acc: 0.9548 - binary_crossentropy: 0.1386 - val_loss: 0.4616 - val_acc: 0.8748 - val_binary_crossentropy: 0.3797
Epoch 19/20
 - 7s - loss: 0.2215 - acc: 0.9546 - binary_crossentropy: 0.1393 - val_loss: 0.4714 - val_acc: 0.8759 - val_binary_crossentropy: 0.3889
Epoch 20/20
 - 8s - loss: 0.2199 - acc: 0.9564 - binary_crossentropy: 0.1370 - val_loss: 0.4605 - val_acc: 0.8749 - val_binary_crossentropy: 0.3772
In [62]:
plot_history([('baseline', baseline_history),
              ('L2 with dropout', l2_dpt_model_history)])

As the graph above shows, this combination gives an even better result.




Hello, this is 문범우.


In this post we work through the second TensorFlow tutorial, Text classification.


In [1]:
# TensorFlow and tf.keras
# Import TensorFlow and keras; TensorFlow is imported under the alias tf.
import tensorflow as tf
from tensorflow import keras

# Helper libraries
# numpy and matplotlib are used as helper libraries.
import numpy as np
import matplotlib.pyplot as plt
# Magic command for displaying matplotlib plots inside the Jupyter notebook
%matplotlib inline

print("사용되는 tensorflow의 버전:",tf.__version__)
사용되는 tensorflow의 버전: 1.9.0

A. Preparing the data

For this text classification exercise we use the imdb dataset included in the Keras datasets.

(This dataset consists of movie reviews.)

As we will see below, the train and test sets each contain 25,000 examples, and each example is a review itself.

The labels are 0 or 1, indicating whether the review is negative or positive.

So, using the test data, we will predict whether each review is positive or negative about the movie.

The details of the data are examined below.

In [28]:
imdb = keras.datasets.imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
In [29]:
# Check the training data
print("Training entries: {}, labels: {}".format(len(train_data), len(train_labels)))
Training entries: 25000, labels: 25000

The training data is as shown above.

However, if we print out one example, it contains a list of integers rather than a string.

In [30]:
print(train_data[0])
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]

The reason is that a separate dictionary maps each integer to a word.

As shown below, we can also see that the examples have different lengths.

In [31]:
len(train_data[0]), len(train_data[1])
Out[31]:
(218, 189)

However, when we feed the data into an ML model, all inputs must have the same length.

We will therefore handle the varying input lengths later.

First, to inspect the data more closely, let's convert the integers in each example back into words.

In [51]:
# Fetch the dictionary used to convert the integer values back to words
word_index = imdb.get_word_index()
#word_index  # commented out to keep the post short
In [52]:
#word_index.items()  # commented out to keep the post short

The word_index dictionary has words as keys and integers as values.

Additionally, to represent pad, start, unknown, and unused tokens, we add 3 to every value and assign those tokens to the now-free indices 0 through 3.

In [34]:
# The first indices are reserved
word_index = {k:(v+3) for k,v in word_index.items()} 
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3

Since the dictionary we actually need maps integers (keys) to words (values),

we build a dictionary called reverse_word_index and a decode_review function that converts an integer-encoded example back into a sentence of words.

In [35]:
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])
In [36]:
# View one input example as a sentence
decode_review(train_data[0])
Out[36]:
"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the <UNK> list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"

We can see that the <START>, <UNK>, and other tokens we added earlier appear in the decoded sentence.

B. Preprocessing the data

To actually feed our data into the ML model, we preprocess it.

First, we handle the issue mentioned above: the examples have different lengths.

Using the preprocessing function provided by Keras, we pad every example to a fixed maximum length of 256, filling the empty positions with the <PAD> value we added to the dictionary above.

In [37]:
train_data = keras.preprocessing.sequence.pad_sequences(train_data,
                                                        value=word_index["<PAD>"],
                                                        padding='post',
                                                        maxlen=256)

test_data = keras.preprocessing.sequence.pad_sequences(test_data,
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=256)

We processed the train and test data together; let's check the resulting input data.

In [38]:
print("길이가 동일한가? => 0번째 데이터 길이:",len(train_data[0]),"1번째 데이터 길이",len(train_data[1]))
tmp_len_check = 0
tmp_len = 256
for data in train_data:
    if(tmp_len == len(data)): tmp_len_check += 1
if(tmp_len_check == len(train_data)):
    print("모든 데이터의 길이가 256으로 동일합니다!")
else:
    print("데이터의 길이가 동일하지 않습니다!")
print("데이터 형태는?\n",train_data[0])
길이가 동일한가? => 0번째 데이터 길이: 256 1번째 데이터 길이 256
모든 데이터의 길이가 256으로 동일합니다!
데이터 형태는?
 [   1   14   22   16   43  530  973 1622 1385   65  458 4468   66 3941
    4  173   36  256    5   25  100   43  838  112   50  670    2    9
   35  480  284    5  150    4  172  112  167    2  336  385   39    4
  172 4536 1111   17  546   38   13  447    4  192   50   16    6  147
 2025   19   14   22    4 1920 4613  469    4   22   71   87   12   16
   43  530   38   76   15   13 1247    4   22   17  515   17   12   16
  626   18    2    5   62  386   12    8  316    8  106    5    4 2223
 5244   16  480   66 3785   33    4  130   12   16   38  619    5   25
  124   51   36  135   48   25 1415   33    6   22   12  215   28   77
   52    5   14  407   16   82    2    8    4  107  117 5952   15  256
    4    2    7 3766    5  723   36   71   43  530  476   26  400  317
   46    7    4    2 1029   13  104   88    4  381   15  297   98   32
 2071   56   26  141    6  194 7486   18    4  226   22   21  134  476
   26  480    5  144   30 5535   18   51   36   28  224   92   25  104
    4  226   65   16   38 1334   88   12   16  283    5   16 4472  113
  103   32   15   16 5345   19  178   32    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0]

As shown above, all the examples now have the same length, and looking at the 0th example we can see that 0s, i.e. the <PAD> value, have been appended at the end.

C. Building the model

Now let's build the ML model that will perform the text classification.

First, vocab_size is the number of distinct words used for the movie reviews.

The full word-to-integer dictionary above is actually larger, but the data was loaded so that the reviews only use the 10,000 most frequent words.

Each layer is described below.

  1. Embedding: this layer takes the integer-encoded words and looks up an embedding vector for each word index.

These vectors are learned as the model trains.

  2. GlobalAveragePooling1D: this layer averages over the sequence dimension for each example and outputs a fixed-length vector (see the sketch after this list).

This makes it simple to handle inputs of variable length.

  3. Dense_1, Dense_2: the first Dense layer passes the fixed-length vector through a fully-connected layer with 16 hidden units.

The second Dense layer has a single output node with a sigmoid activation, so the result is a value between 0 and 1.
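As a rough numpy sketch (not from the tutorial) of what GlobalAveragePooling1D computes: for an embedded example of shape (batch, sequence_length, embedding_dim), it simply averages over the sequence axis.

import numpy as np

# Hypothetical embedded example: batch=1, sequence_length=3, embedding_dim=2
embedded = np.array([[[1.0, 2.0],
                      [3.0, 4.0],
                      [5.0, 6.0]]])
pooled = embedded.mean(axis=1)  # average over the sequence dimension
print(pooled)  # [[3. 4.]] -- a fixed-length vector, whatever the sequence length was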

In [40]:
# input shape is the vocabulary count used for the movie reviews (10,000 words)
vocab_size = 10000

model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation=tf.nn.relu))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, None, 16)          160000    
_________________________________________________________________
global_average_pooling1d_1 ( (None, 16)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 17        
=================================================================
Total params: 160,289
Trainable params: 160,289
Non-trainable params: 0
_________________________________________________________________

Finally, to finish configuring the model, we set the loss function and the optimizer.

In [41]:
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='binary_crossentropy',
              metrics=['accuracy'])

D. Training the model

Before training the model, we set aside 10,000 examples to form a validation set.

We do this so that we can monitor metrics such as accuracy and loss on data the model has not seen before.

In [43]:
x_val = train_data[:10000]
partial_x_train = train_data[10000:]

y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=40,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1)
Train on 15000 samples, validate on 10000 samples
Epoch 1/40
15000/15000 [==============================] - 1s 55us/step - loss: 0.7099 - acc: 0.5045 - val_loss: 0.6942 - val_acc: 0.5106
Epoch 2/40
15000/15000 [==============================] - 0s 32us/step - loss: 0.6920 - acc: 0.5147 - val_loss: 0.6911 - val_acc: 0.5132
Epoch 3/40
15000/15000 [==============================] - 0s 32us/step - loss: 0.6901 - acc: 0.5292 - val_loss: 0.6897 - val_acc: 0.5391
Epoch 4/40
15000/15000 [==============================] - 0s 31us/step - loss: 0.6882 - acc: 0.5435 - val_loss: 0.6878 - val_acc: 0.5510
Epoch 5/40
15000/15000 [==============================] - 0s 32us/step - loss: 0.6859 - acc: 0.5797 - val_loss: 0.6856 - val_acc: 0.5742
Epoch 6/40
15000/15000 [==============================] - 0s 32us/step - loss: 0.6832 - acc: 0.6010 - val_loss: 0.6827 - val_acc: 0.6093
Epoch 7/40
15000/15000 [==============================] - 0s 29us/step - loss: 0.6797 - acc: 0.6424 - val_loss: 0.6793 - val_acc: 0.6413
Epoch 8/40
15000/15000 [==============================] - 0s 30us/step - loss: 0.6750 - acc: 0.6792 - val_loss: 0.6742 - val_acc: 0.6842
Epoch 9/40
15000/15000 [==============================] - 0s 30us/step - loss: 0.6651 - acc: 0.6916 - val_loss: 0.6613 - val_acc: 0.7029
Epoch 10/40
15000/15000 [==============================] - 0s 31us/step - loss: 0.6505 - acc: 0.7511 - val_loss: 0.6468 - val_acc: 0.7495
Epoch 11/40
15000/15000 [==============================] - 0s 30us/step - loss: 0.6331 - acc: 0.7627 - val_loss: 0.6294 - val_acc: 0.7663
Epoch 12/40
15000/15000 [==============================] - 1s 34us/step - loss: 0.6123 - acc: 0.7869 - val_loss: 0.6095 - val_acc: 0.7746
Epoch 13/40
15000/15000 [==============================] - 0s 33us/step - loss: 0.5888 - acc: 0.7942 - val_loss: 0.5889 - val_acc: 0.7767
Epoch 14/40
15000/15000 [==============================] - 0s 33us/step - loss: 0.5639 - acc: 0.8051 - val_loss: 0.5646 - val_acc: 0.7939
Epoch 15/40
15000/15000 [==============================] - 0s 32us/step - loss: 0.5371 - acc: 0.8123 - val_loss: 0.5396 - val_acc: 0.8027
Epoch 16/40
15000/15000 [==============================] - 0s 32us/step - loss: 0.5103 - acc: 0.8221 - val_loss: 0.5156 - val_acc: 0.8051
Epoch 17/40
15000/15000 [==============================] - 0s 32us/step - loss: 0.4830 - acc: 0.8359 - val_loss: 0.4920 - val_acc: 0.8205
Epoch 18/40
15000/15000 [==============================] - 0s 30us/step - loss: 0.4569 - acc: 0.8460 - val_loss: 0.4691 - val_acc: 0.8285
Epoch 19/40
15000/15000 [==============================] - 0s 33us/step - loss: 0.4321 - acc: 0.8541 - val_loss: 0.4478 - val_acc: 0.8363
Epoch 20/40
15000/15000 [==============================] - 1s 36us/step - loss: 0.4092 - acc: 0.8621 - val_loss: 0.4285 - val_acc: 0.8411
Epoch 21/40
15000/15000 [==============================] - 1s 34us/step - loss: 0.3876 - acc: 0.8697 - val_loss: 0.4108 - val_acc: 0.8463
Epoch 22/40
15000/15000 [==============================] - 1s 40us/step - loss: 0.3683 - acc: 0.8753 - val_loss: 0.3951 - val_acc: 0.8515
Epoch 23/40
15000/15000 [==============================] - 0s 30us/step - loss: 0.3511 - acc: 0.8818 - val_loss: 0.3817 - val_acc: 0.8559
Epoch 24/40
15000/15000 [==============================] - 0s 28us/step - loss: 0.3351 - acc: 0.8862 - val_loss: 0.3695 - val_acc: 0.8596
Epoch 25/40
15000/15000 [==============================] - 0s 29us/step - loss: 0.3211 - acc: 0.8901 - val_loss: 0.3590 - val_acc: 0.8641
Epoch 26/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.3084 - acc: 0.8931 - val_loss: 0.3500 - val_acc: 0.8669
Epoch 27/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2972 - acc: 0.8952 - val_loss: 0.3420 - val_acc: 0.8691
Epoch 28/40
15000/15000 [==============================] - 0s 29us/step - loss: 0.2864 - acc: 0.9001 - val_loss: 0.3349 - val_acc: 0.8715
Epoch 29/40
15000/15000 [==============================] - 0s 28us/step - loss: 0.2768 - acc: 0.9029 - val_loss: 0.3291 - val_acc: 0.8725
Epoch 30/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2685 - acc: 0.9053 - val_loss: 0.3237 - val_acc: 0.8754
Epoch 31/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2598 - acc: 0.9087 - val_loss: 0.3191 - val_acc: 0.8757
Epoch 32/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2526 - acc: 0.9093 - val_loss: 0.3150 - val_acc: 0.8767
Epoch 33/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2451 - acc: 0.9123 - val_loss: 0.3114 - val_acc: 0.8763
Epoch 34/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2382 - acc: 0.9151 - val_loss: 0.3080 - val_acc: 0.8771
Epoch 35/40
15000/15000 [==============================] - 0s 30us/step - loss: 0.2322 - acc: 0.9164 - val_loss: 0.3051 - val_acc: 0.8787
Epoch 36/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2257 - acc: 0.9189 - val_loss: 0.3026 - val_acc: 0.8794
Epoch 37/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2200 - acc: 0.9213 - val_loss: 0.3007 - val_acc: 0.8799
Epoch 38/40
15000/15000 [==============================] - 0s 26us/step - loss: 0.2145 - acc: 0.9227 - val_loss: 0.2980 - val_acc: 0.8813
Epoch 39/40
15000/15000 [==============================] - 0s 29us/step - loss: 0.2089 - acc: 0.9254 - val_loss: 0.2962 - val_acc: 0.8815
Epoch 40/40
15000/15000 [==============================] - 0s 27us/step - loss: 0.2038 - acc: 0.9268 - val_loss: 0.2944 - val_acc: 0.8822
In [44]:
results = model.evaluate(test_data, test_labels)

print(results)
25000/25000 [==============================] - 1s 21us/step
[0.3078416971683502, 0.87396]

From the above, our model achieves about 87% accuracy on the test data.
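To make the two numbers returned by evaluate easier to read, they could, for instance, be paired with the model's metric names (a small optional snippet, not in the tutorial):

print(dict(zip(model.metrics_names, results)))  # e.g. {'loss': 0.3078..., 'acc': 0.87396}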

For the model to be considered truly state of the art, an accuracy of roughly 95% or higher would be needed.

For now we leave it at that, plot the accuracy and loss from the results above, and wrap up.

In [48]:
history_dict = history.history
history_dict.keys()
Out[48]:
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
In [49]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

# "bo" is for "blue dot"
plt.plot(epochs, loss, 'bo', label='Training loss')
# b is for "solid blue line"
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()
In [50]:
plt.clf()   # clear figure
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

