[Basic Theory] Implementing Attention (2/3)

syveany 2024. 9. 27. 16:57

Reference: the book 『밑바닥부터 시작하는 딥러닝2』 (Deep Learning from Scratch 2), Chapter 8. Attention

~ Contents ~
2. Implementing seq2seq with Attention
2.1 Implementing AttentionEncoder
2.2 Implementing AttentionDecoder
2.3 Implementing AttentionSeq2seq
3. Evaluating Attention
3.1 Training seq2seq with Attention
3.2 Visualizing Attention

Continuing from the previous post, this one implements seq2seq with Attention and its training loop, then visualizes the trained result to see what Attention is actually doing.

2. Implementing seq2seq with Attention

2.1 Implementing AttentionEncoder

The Encoder from the earlier post returned only the last hidden state of the LSTM layer, whereas AttentionEncoder returns all of the hidden states.

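import numpy as np
# Time layers from the book's repository (common/time_layers.py)
from common.time_layers import TimeEmbedding, TimeLSTM

class Encoder: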
  def __init__(self, vocab_size, wordvec_size, hidden_size):
    V, D, H = vocab_size, wordvec_size, hidden_size
    rn = np.random.randn

    embed_W = (rn(V, D) / 100).astype('f')
    lstm_Wx = (rn(D, 4*H) / np.sqrt(D)).astype('f')
    lstm_Wh = (rn(H, 4*H) / np.sqrt(H)).astype('f')
    lstm_b = np.zeros(4*H).astype('f')

    self.embed = TimeEmbedding(embed_W)
    self.lstm = TimeLSTM(lstm_Wx, lstm_Wh, lstm_b, stateful=False)

    self.params = self.embed.params + self.lstm.params
    self.grads = self.embed.grads + self.lstm.grads
    self.hs = None

  def forward(self, xs):
    xs = self.embed.forward(xs)
    hs = self.lstm.forward(xs)
    self.hs = hs
    # Return only the last hidden state
    return hs[:,-1,:]

  def backward(self, dh):
    dhs = np.zeros_like(self.hs)
    dhs[:,-1,:] = dh

    dout = self.lstm.backward(dhs)
    dout = self.embed.backward(dout)
    return dout

AttentionEncoder implementation

class AttentionEncoder(Encoder):
  def forward(self, xs):
    xs = self.embed.forward(xs)
    hs = self.lstm.forward(xs)
    # AttentionEncoder returns all of the hidden states
    return hs

  def backward(self, dhs):
    dout = self.lstm.backward(dhs)
    dout = self.embed.backward(dout)
    return dout
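
The difference between the two encoders is easiest to see in the array shapes. The following is a minimal NumPy sketch, not the actual layers: a random array stands in for the TimeLSTM output, and the sizes N, T, H are made up for illustration.

```python
import numpy as np

N, T, H = 2, 5, 4  # hypothetical batch size, sequence length, hidden size
rng = np.random.default_rng(0)
hs = rng.standard_normal((N, T, H)).astype('f')  # stand-in for the TimeLSTM output

last_h = hs[:, -1, :]  # what Encoder.forward returns: (N, H)
all_hs = hs            # what AttentionEncoder.forward returns: (N, T, H)
print(last_h.shape, all_hs.shape)

# Encoder.backward receives a gradient for the last step only,
# so it has to place dh into an otherwise zero (N, T, H) array.
dh = np.ones((N, H), dtype='f')
dhs = np.zeros_like(hs)
dhs[:, -1, :] = dh
print(dhs.shape)
```

AttentionEncoder.backward can skip that padding step because the decoder already hands back a gradient for every time step.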

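2.2 Implementing AttentionDecoder

The decoder looks like the figure below. It is similar to the plain Decoder, but adds a step in the middle that concatenates the output of TimeAttention with the output of the LSTM layer and feeds the result into the TimeAffine layer (red arrow).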
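
The concatenation step can be checked with shapes alone. This is a rough sketch under assumed sizes (N, T, H, V are hypothetical), with random arrays standing in for the TimeAttention and LSTM outputs and a plain matrix product standing in for TimeAffine.

```python
import numpy as np

N, T, H, V = 2, 3, 4, 7  # hypothetical batch, time steps, hidden size, vocab size
rng = np.random.default_rng(0)

c = rng.standard_normal((N, T, H))       # stand-in for the TimeAttention output
dec_hs = rng.standard_normal((N, T, H))  # stand-in for the decoder LSTM output

# Joining along the last axis doubles the feature size: (N, T, 2H)
out = np.concatenate((c, dec_hs), axis=2)

# The affine weight therefore needs 2H input rows, not H
affine_W = rng.standard_normal((2 * H, V)) / np.sqrt(2 * H)
affine_b = np.zeros(V)
score = out @ affine_W + affine_b  # (N, T, V)

print(out.shape, score.shape)
```

This is why the affine weight in the decoder is created with shape (2*H, V) rather than (H, V).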

AttentionDecoder implementation

class AttentionDecoder:
  def __init__(self, vocab_size, wordvec_size, hidden_size):
    V, D, H = vocab_size, wordvec_size, hidden_size
    rn = np.random.randn
    embed_W = (rn(V,D) / 100).astype('f')
    lstm_Wx = (rn(D, 4*H) / np.sqrt(D)).astype('f')
    lstm_Wh = (rn(H, 4*H) / np.sqrt(H)).astype('f')
    lstm_b = np.zeros(4*H).astype('f')
    affine_W = (rn(2*H, V) / np.sqrt(2*H)).astype('f')
    affine_b = np.zeros(V).astype('f')

    self.embed = TimeEmbedding(embed_W)
    self.lstm = TimeLSTM(lstm_Wx, lstm_Wh, lstm_b, stateful=True)
    # Add the TimeAttention layer
    self.attention = TimeAttention()
    self.affine = TimeAffine(affine_W, affine_b)
    layers = [self.embed, self.lstm, self.attention, self.affine]

    self.params, self.grads = [], []
    for layer in layers:
      self.params += layer.params
      self.grads += layer.grads

  def forward(self, xs, enc_hs):
    h = enc_hs[:,-1]
    self.lstm.set_state(h)

    out = self.embed.forward(xs)
    dec_hs = self.lstm.forward(out)
    # Pass through TimeAttention to get the context vector c
    c = self.attention.forward(enc_hs, dec_hs)
    # Concatenate the TimeAttention output with the LSTM layer output
    out = np.concatenate((c, dec_hs), axis=2)
    score = self.affine.forward(out)
    return score

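  def backward(self, dscore):
    # Nothing special here; the body is omitted in this post
    raise NotImplementedError

  def generate(self, enc_hs, start_id, sample_size):
    # Nothing special here; the body is omitted in this post
    raise NotImplementedError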
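
The post leaves backward out, but one step of it follows directly from forward: the gradient of a concatenation is split back into its two halves, and because dec_hs feeds both the attention layer and the concatenation, its two gradients are summed. The sketch below shows only that step. All arrays are random stand-ins, and ddec_hs1 is a hypothetical placeholder for whatever the attention layer's backward would return.

```python
import numpy as np

N, T, H = 2, 3, 4  # hypothetical sizes
rng = np.random.default_rng(0)

# Stand-in for the gradient coming back from the affine layer: (N, T, 2H)
dout = rng.standard_normal((N, T, 2 * H))

# Undo np.concatenate((c, dec_hs), axis=2): first half goes to c, second to dec_hs
dc, ddec_hs0 = dout[:, :, :H], dout[:, :, H:]

# Hypothetical placeholder for the dec_hs gradient returned by the attention layer
ddec_hs1 = rng.standard_normal((N, T, H))

# dec_hs was used in two places, so its gradients add up
ddec_hs = ddec_hs0 + ddec_hs1
print(dc.shape, ddec_hs.shape)
```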

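2.3 Implementing AttentionSeq2seq

Use AttentionEncoder instead of Encoder, and AttentionDecoder instead of Decoder.
It is enough to inherit from the Seq2seq class implemented in the previous chapter and change only the initializer.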

class AttentionSeq2seq(Seq2seq):
  def __init__(self, vocab_size, wordvec_size, hidden_size):
    args = vocab_size, wordvec_size, hidden_size
    # Use AttentionEncoder
    self.encoder = AttentionEncoder(*args)
    # Use AttentionDecoder
    self.decoder = AttentionDecoder(*args)
    self.softmax = TimeSoftmaxWithLoss()

    self.params = self.encoder.params + self.decoder.params
    self.grads = self.encoder.grads + self.decoder.grads

3. Evaluating Attention

3.1 Training seq2seq with Attention

The training code is below.
It imports several libraries, but since the goal here is to see how the code is put together, the detailed path setup is omitted.

import numpy as np
from dataset import sequence
from common.optimizer import Adam
from common.trainer import Trainer
from common.util import eval_seq2seq
from attention_seq2seq import AttentionSeq2seq
from ch07.seq2seq import Seq2seq
from ch07.peeky_seq2seq import PeekySeq2seq

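# (Data loading is simply taken as given here)
(x_train, t_train), (x_test, t_test) = sequence.load_data('date.txt')
# Get the tables that map each character appearing in date.txt to a unique integer ID and back
char_to_id, id_to_char = sequence.get_vocab()

# Reverse the input sentences (the reversal trick from the earlier seq2seq models, applied here as well)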
x_train, x_test = x_train[:,::-1], x_test[:,::-1]

# 고유한 문자의 개수
vocab_size = len(char_to_id)

wordvec_size = 16
hidden_size = 256
batch_size = 128
max_epoch = 10
max_grad = 5.0

model = AttentionSeq2seq(vocab_size, wordvec_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)

acc_list = []
for epoch in range(max_epoch):
  trainer.fit(x_train, t_train, max_epoch=1, batch_size=batch_size, max_grad=max_grad)
  correct_num = 0
  for i in range(len(x_test)):
    question, correct = x_test[[i]], t_test[[i]]
    # Print the evaluation details only for the first 10 samples
    verbose = i < 10
    correct_num += eval_seq2seq(model, question, correct, id_to_char, verbose, is_reverse=True)
  # Compute the accuracy once per epoch, after every test sample has been evaluated
  acc = float(correct_num) / len(x_test)
  acc_list.append(acc)
  print('val acc %.3f%%' % (acc * 100))

# Save the model parameters so they can be loaded later for prediction or further training
model.save_params()
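
The training code passes max_grad = 5.0 to trainer.fit without showing what the Trainer does with it. A common use for such a threshold is gradient clipping by global norm, and the sketch below assumes that is the intent. The function name clip_by_global_norm is hypothetical and is not taken from the post.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients in place so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    rate = max_norm / (total_norm + 1e-6)  # small constant avoids division by zero
    if rate < 1:
        for g in grads:
            g *= rate
    return total_norm

# Toy gradients with a combined norm of 13 (3-4-12 triple)
grads = [np.array([3.0, 4.0]), np.array([12.0])]
norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in grads))
print(norm_before, norm_after)
```

Scaling every gradient by the same rate keeps the update direction unchanged and only shortens the step, which helps against exploding gradients in recurrent layers.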

With Attention added, seq2seq reaches high accuracy even faster than the peeky version from the earlier post.
Plotted as a graph, it looks like the figure below.

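3.2 Visualizing Attention

We can see which elements Attention focuses on while it performs the sequence conversion.
Just draw the Attention weights at each time step as a 2D map.
Horizontal axis: input sentence, vertical axis: output sentence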
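
The 2D map is one row of weights per output step. The sketch below builds such a map with NumPy, assuming dot-product scores followed by a softmax over the input positions. The hidden states are random stand-ins, so the values carry no meaning. Only the layout matches the description above.

```python
import numpy as np

def softmax(x):
    # Subtract the row maximum for numerical stability
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

T_in, T_out, H = 6, 4, 8  # hypothetical input length, output length, hidden size
rng = np.random.default_rng(0)
enc_hs = rng.standard_normal((T_in, H))   # stand-in encoder hidden states (one sentence)
dec_hs = rng.standard_normal((T_out, H))  # stand-in decoder hidden states

# Rows: output steps (vertical axis), columns: input positions (horizontal axis)
attention_map = softmax(dec_hs @ enc_hs.T)  # (T_out, T_in)

print(attention_map.shape)
print(np.round(attention_map, 2))
```

Each row sums to 1, so a bright cell means that output character drew most of its context from that input character. Rendering the array as an image (for example with Matplotlib's imshow) is left out so the sketch needs only NumPy.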

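Surprisingly, the map shows that the Attention model learned, from the data alone, that AUGUST corresponds to month 8!!
(Seeing it with my own eyes is really cool.)
Deep learning models usually come with the frustration of not knowing what is going on inside the network,
but Attention lets us see what is happening inside the model!
→ We can also check whether the model's processing logic follows human logic!!!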
