Machine Learning with Python: Handwritten Digit Recognition, Training Part (13) | 初心者向けWEB技術と機械学習のBlog

Contents

Introduction
Reference Book
Functions of Two Variables
Implementing Partial Derivatives
Gradient
Summary

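Introduction

Hello, this is swim-lover. I am implementing handwritten digit recognition, one application of machine learning, in Python. I have only just started with Python, but I am studying with the motto "learn by using". In Part 12, I implemented the cross-entropy error (mini-batch version) and numerical differentiation using PyTorch. This time I continue to follow the reference book.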

Reference Book

As the machine learning reference, I used Koki Saitoh, "Deep Learning from Scratch" (ゼロから作るDeep Learning), O'Reilly Japan, September 2016.

Functions of Two Variables

We implement the following function of two variables.

def f_x1_x2(x,y):
  return x**2+y**2

Let's draw a 3D plot of this function. For 3D plots, Matplotlib provides the mplot3d toolkit.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection (only needed on older Matplotlib)

x0 = np.arange(-3, 3, 0.125) # make x0 data
x1 = np.arange(-3, 3, 0.125) # make x1 data
X, Y = np.meshgrid(x0, x1) #make lattice point

out=f_x1_x2(X,Y) #call function

#------------------3d plot ------------------- 
fig=plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X, Y, out, cmap='summer', linewidth=0)


ax.set_xlim([-3, 3])
ax.set_ylim([-3, 3])
ax.set_zlim([0, 20])
ax.set_xlabel('x0')
ax.set_ylabel('x1')
ax.set_zlabel('f(x0,x1)')
ax.set_title("3D plot")
fig.colorbar(surf)
plt.show()  # needed when running as a script; notebooks display the figure automatically

The function is now drawn as a surface plot.

The lowest point of the surface is at the coordinates (0,0).
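As a quick numerical check of this observation, the sketch below evaluates the same function on the same grid and looks up the lowest grid point. The names `xs`, `min_index`, `min_point`, and `min_value` are introduced here only for illustration and do not appear in the book.

```python
import numpy as np

def f_x1_x2(x, y):
    # Same two-variable function as in the article
    return x**2 + y**2

xs = np.arange(-3, 3, 0.125)   # same range and spacing as the plot
X, Y = np.meshgrid(xs, xs)     # grid points
out = f_x1_x2(X, Y)

# Position of the smallest value in the 2-D result array
min_index = np.unravel_index(np.argmin(out), out.shape)
min_point = (float(X[min_index]), float(Y[min_index]))
min_value = float(out[min_index])
print(min_point, min_value)
```

Because the grid contains the point (0,0) exactly, the search should report the origin with a function value of 0. On a grid that does not contain the origin, it would only return the nearest grid point.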

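Implementing Partial Derivatives

A partial derivative is the derivative with respect to one particular variable while the other variables are held fixed, so the partial derivatives of f(x0,x1)=x0^2+x1^2 are as follows.

∂f/∂x0 = 2*x0
∂f/∂x1 = 2*x1

At x0=3, x1=4, this gives ∂f/∂x0=6 and ∂f/∂x1=8. Let's implement this in Python.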

import numpy as np

def numerial_diff_centor(f,x):
  #h = 10e-50  # bad example, too small value
  h = 1e-4    # good example
  return (f(x+h) -f(x-h))/(2*h)

# f(x0, x1) = x0^2 + x1^2 with x1 fixed at 4
def func1(x0):
  return x0*x0+4.0**2

# f(x0, x1) = x0^2 + x1^2 with x0 fixed at 3
def func2(x1):
  return 3.0**2+x1*x1


ans=numerial_diff_centor(func1,3)
print(ans)
ans=numerial_diff_centor(func2,4)
print(ans)

This confirms that the results almost match the values obtained analytically (by expanding the formula).

6.00000000000378
7.999999999999119
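The commented-out "bad example" step size can be tried directly to see why it fails. The sketch below is a variant of the central-difference function that takes `h` as a parameter; the name `numerical_diff_central` and the variables `good` and `bad` are hypothetical and used only for this illustration.

```python
def numerical_diff_central(f, x, h):
    # Central difference with an explicit step size h
    return (f(x + h) - f(x - h)) / (2 * h)

def func1(x0):
    # f(x0, x1) = x0^2 + x1^2 with x1 fixed at 4
    return x0 * x0 + 4.0 ** 2

good = numerical_diff_central(func1, 3.0, 1e-4)    # reasonable step size
bad = numerical_diff_central(func1, 3.0, 10e-50)   # step size far below float precision
print(good)
print(bad)
```

With the tiny step, `3.0 + h` and `3.0 - h` both round to `3.0` in double precision, so the numerator becomes zero and the estimate collapses to `0.0` instead of a value near 6.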

Gradient

The vector that collects the partial derivatives with respect to all variables is called the gradient.
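For the function used in this article, the definition can be written out explicitly. The symbol ∇f is the standard notation for the gradient and is introduced here as an addition; the original text does not use it.

```latex
\nabla f(x_0, x_1)
  = \left( \frac{\partial f}{\partial x_0},\ \frac{\partial f}{\partial x_1} \right)
  = (2x_0,\ 2x_1),
\qquad
\nabla f(3, 4) = (6,\ 8)
```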

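We implement this in Python in the same way.

The code is getting a little cluttered. I think the key point is that, for one variable at a time, we evaluate the function at tmp+h and tmp-h to obtain the numerical derivative, and after the calculation we write tmp back to its original position x[idx].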

import numpy as np

def numerial_grad(f,x):
  #h = 10e-50  # bad example: too small, the difference rounds to zero
  h = 1e-4    # good example
  grad=np.zeros_like(x)  # zero array with the same shape as x
  for idx in range(x.size):
    tmp = x[idx]
    # compute f(x+h)
    x[idx]=tmp + h # add h to x[idx] only
    fxh1 = f(x)

    # compute f(x-h)
    x[idx]=tmp - h # subtract h from x[idx] only
    fxh2 = f(x)

    # central difference with respect to x[idx]
    grad[idx]=(fxh1-fxh2)/(2*h)
    x[idx]=tmp  # restore the original value

  return grad

def func_x0_x1(x):
  return x[0]**2+x[1]**2


ans = numerial_grad(func_x0_x1,np.array([3.0,4.0])) # grad at point(3,4)
print(ans)

ans = numerial_grad(func_x0_x1,np.array([1.0,1.0])) # grad at point(1,1)
print(ans)

ans = numerial_grad(func_x0_x1,np.array([8.0,8.0])) # grad at point(8,8)
print(ans)

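In the 3D plot above, the lowest point is at the coordinates (0,0).

As the book states, the farther a point is from (0,0), the larger the gradient becomes. Analytically the gradient is (2*x0, 2*x1), that is, (6, 8) at (3,4), (2, 2) at (1,1), and (16, 16) at (8,8), and the numerical results are close to these values.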
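The claim can be checked by comparing the numerical gradient with the analytic gradient and by looking at its length at each point. The sketch below is a variant of the gradient function, not the book's code: the name `numerical_grad_copy` is hypothetical, and it works on a float copy of the input (an assumption made here so that the caller's array is never modified and integer arrays do not truncate the result).

```python
import numpy as np

def numerical_grad_copy(f, x, h=1e-4):
    # Central-difference gradient computed on a float copy of x
    x = np.array(x, dtype=float)
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp = x[idx]
        x[idx] = tmp + h          # shift only x[idx] by +h
        fxh1 = f(x)
        x[idx] = tmp - h          # shift only x[idx] by -h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp              # restore the original value
    return grad

def func_x0_x1(x):
    return x[0]**2 + x[1]**2

points = [np.array([1.0, 1.0]), np.array([3.0, 4.0]), np.array([8.0, 8.0])]
grads = [numerical_grad_copy(func_x0_x1, p) for p in points]
norms = [float(np.linalg.norm(g)) for g in grads]

for p, g, n in zip(points, grads, norms):
    # numerical gradient, analytic gradient 2*x, and gradient length
    print(p, g, 2 * p, n)
```

The points are listed in order of increasing distance from the origin, and the gradient lengths should increase in the same order, roughly 2.83, 10, and 22.6.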