{"id":1075,"date":"2024-08-09T16:05:10","date_gmt":"2024-08-09T16:05:10","guid":{"rendered":"https:\/\/summergeometry.org\/sgi2024\/?p=1075"},"modified":"2024-08-09T16:55:42","modified_gmt":"2024-08-09T16:55:42","slug":"whats-a-neural-function","status":"publish","type":"post","link":"https:\/\/summergeometry.org\/sgi2024\/whats-a-neural-function\/","title":{"rendered":"What&#8217;s a Neural Function?"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Butlerian concerns aside, <em>neural networks<\/em> have proven to be extremely useful in doing everything we thought couldn&#8217;t be done in this century: extremely advanced language processing, physically motivated predictions, and making strange, artful images using the power of bankrupt corporate morality.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Now, I&#8217;ve read and seen a lot of this &#8220;stuff&#8221; in the past, but I never really studied it in depth. Luckily, I got put with four exceedingly capable people in the area, and can now manage to write a <em>tabloid<\/em> on the subject. I&#8217;ll write down the very basics of what I learned this week.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Taylor&#8217;s theorem<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Suppose we had a function \\( f : F \\rightarrow L \\) between a <em>feature space<\/em> \\(F\\) and a <em>label space<\/em> \\(L\\). Both of these spaces are composed of a finite set of data points, \\( x_i \\) and \\( y_i \\), which we&#8217;ll put into a <em>dataset<\/em> \\(\\mathfrak{D} = \\{(x_i, y_i) \\}^N_{i=1} \\). This function can represent just about anything, as long as we&#8217;re capable of identifying the appropriate labels: images, videos, and weather patterns.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The issue is, we <em>don&#8217;t know<\/em> anything about \\(f\\), but we <em>do<\/em> have a lot of data, so can we construct an arbitrarily good approximation \\( f_\\theta \\) that works the majority of the time? 
The whole field of machine learning asks not only whether this is possible but, if it is, <em>how <\/em>does one produce such a function, and <em>with how much data?<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"399\" src=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural1-1-1024x399.png\" alt=\"\" class=\"wp-image-1085\" style=\"width:557px;height:auto\" srcset=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural1-1-1024x399.png 1024w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural1-1-300x117.png 300w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural1-1-768x299.png 768w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural1-1-1536x599.png 1536w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural1-1-2048x798.png 2048w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural1-1-1200x468.png 1200w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural1-1-1980x772.png 1980w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">Indeed, such a mapping may be <em>extremely <\/em>crooked, or of a high-dimensional character, but as long as we&#8217;re able to build <em>universal function approximators<\/em> of arbitrary precision, we should, in principle, be able to construct any such function.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"760\" src=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/taylor1_1-1024x760.png\" alt=\"\" class=\"wp-image-1092\" style=\"width:422px;height:auto\" 
srcset=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/taylor1_1-1024x760.png 1024w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/taylor1_1-300x223.png 300w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/taylor1_1-768x570.png 768w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/taylor1_1-1536x1140.png 1536w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/taylor1_1-2048x1520.png 2048w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/taylor1_1-1200x891.png 1200w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/taylor1_1-1980x1469.png 1980w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Third-order Taylor approximation for the function \\( x^3\/7 + \\cos(x) \\) around the point \\( x = 3 \\).<\/figcaption><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">Naively, the first type of function approximator is a machine that produces a <em>Taylor expansion<\/em>; we call this machine \\( f_\\delta(\\mathbf{x;w}) \\), and it approximates the real \\( f(\\mathbf{x})\\). It takes the function input \\( \\mathbf{x}\\) and a <em>weight vector<\/em> \\(\\mathbf{w}\\) containing all of the coefficients of the Taylor expansion. We&#8217;ll call this parameter list the <em>weights<\/em> of the expansion.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Indeed, this same process can be taken up by any series expansion that converges to the result. So far, we already had the function and wanted to find its approximation. Can we do the <em>reverse<\/em> procedure: start from a generic third-degree polynomial, \\[ f_\\delta(x) = c_0 + c_1x + c_2 x^2 + c_3 x^3 \\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">and <em>then<\/em> find the weights such that, around the chosen point \\( \\mathbf{x} \\), it fits with minimal loss? 
This question is, of course, an extensively studied area of mathematics (approximating\/interpolating\/extrapolating functions), and also the motivating factor for a NN: they&#8217;re effectively more complicated versions of this idea using a different method of fitting these weights, but it&#8217;s the same principle of applying an arbitrarily large number of computations to get to some range of values.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first issue is that the label and feature spaces are <em>enormously complicated<\/em>; their dimensionality alone poses a formidable challenge in making a process to adjust said weights. Further, the <em>structure<\/em> in many of these spaces is not captured by the usual procedures of approximation. Taylor&#8217;s theorem, as our hanged man, is not capable of approximating very crooked functions, so that alone discards it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Thought(Thought(Thought(&#8230;)))<\/h2>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"652\" height=\"1024\" src=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neuron1_3-652x1024.png\" alt=\"\" class=\"wp-image-1106\" style=\"width:383px;height:auto\" srcset=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neuron1_3-652x1024.png 652w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neuron1_3-191x300.png 191w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neuron1_3-768x1206.png 768w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neuron1_3-978x1536.png 978w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neuron1_3-1304x2048.png 1304w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neuron1_3-1200x1884.png 1200w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neuron1_3-1980x3109.png 
1980w\" sizes=\"auto, (max-width: 652px) 100vw, 652px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">A <em>neural network<\/em> is the graphical representation of our neural function \\(f_\\theta\\). We will define two main elements: a simple affine transform \\(L = \\mathbf{A}_i \\mathbf{x} + \\mathbf{b}_i \\), and an <em>activation function<\/em> \\( \\sigma(L) \\), which can be any function, really, including a polynomial, but we often use a particular set of functions that are useful for NNs, such as a <em>ReLU<\/em> or a sigmoidal activation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We can then produce a directed graph that shows the flow of computations we perform on our input \\( \\mathbf{x}\\) across the many <em>neurons<\/em> of this graph. In this basic case, the input \\(x\\) is fed into two distinct neurons. The first transformation is \\( \\sigma_1(A_1x+b_1) \\), whose result, \\(x_1\\), is fed into the next neuron; the total result is a composition of the two transforms, \\( \\sigma_2(A_2 x_1+b_2) = \\sigma_2(A_2\\,\\sigma_1(A_1x+b_1)+b_2) \\).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The total result of this basic net on the right is then \\(\\sigma_3(z) + \\sigma_6(w) \\), where \\(w\\) is the result of the transformations on the left, and \\(z\\) the ones on the right. 
We could do a labeling procedure and then see that the end result takes the form of a direct composition, along the right-hand side, of the affine transforms \\((A, B, C)\\) and activation functions \\( (\\sigma_1, \\sigma_2, \\sigma_3) \\), and, along the left-hand side, of the affine transforms \\( (D, E, F) \\) and functions \\( (\\sigma_4, \\sigma_5, \\sigma_6 ) \\), which provides a <em>12-dimensional<\/em> weight vector \\( \\mathbf{w} \\):<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"> \\[ f_\\theta(\\mathbf{x;w}) = (\\sigma_3\\circ C \\circ \\sigma_2 \\circ B \\circ \\sigma_1 \\circ A) + (\\sigma_6 \\circ F \\circ \\sigma_5 \\circ E \\circ \\sigma_4 \\circ D) \\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Once again, the idea is that we can <em>retrofit<\/em> the coefficients of the affine transforms and activation functions to express different function approximations; different weight vectors yield different approximations. Finding the weights is called the <em>training<\/em> of the network and is done by an automatic process.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A <em>feed forward NN<\/em> is a deep (meaning it has more than one intermediate layer) neural function \\( f_\\theta(\\mathbf{x;w}) \\) that, given some affine transformations \\(f_i\\) and activation functions \\(\\sigma_i\\), is defined by:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\\[ f_\\theta(\\mathbf{x;w}) = f_{n+1} \\circ \\sigma _n \\circ f_n \\circ \\cdots \\circ \\sigma_1\\circ f_1 :  \\mathbb{R}^{n} \\rightarrow \\mathbb{R}^{n+1}  \\]<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here we see a basic FF neural function \\( f_\\theta (\\mathbf{x;w}) \\) and its corresponding neural network representation; it has a <em>depth<\/em> of 4 and a <em>width<\/em> of 3. The input \\( x\\) is fed into a single neuron, which doesn&#8217;t change it, and it&#8217;s then fed to three distinct neurons, each with its own weights for the affine transformation, and this all repeats until the last neuron. 
All of them have the same underlying activation function \\( \\phi\\):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"698\" src=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural3_1-1024x698.png\" alt=\"\" class=\"wp-image-1110\" style=\"width:698px;height:auto\" srcset=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural3_1-1024x698.png 1024w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural3_1-300x204.png 300w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural3_1-768x523.png 768w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural3_1-1536x1047.png 1536w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural3_1-2048x1396.png 2048w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural3_1-1200x818.png 1200w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural3_1-1980x1349.png 1980w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">If we actually go out and compute this particular neural function using \\( \\phi \\) as the sigmoid function, we get the approximation of a sine wave. Therefore, we have successfully approximated a low-dimensional function using a NN. How did we, however, get these specific weights? By means of <em>gradient descent<\/em>. 
Maybe I&#8217;ll write something about the training of NNs as I learn more about it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Equivariant NNs<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Now that we&#8217;ve approximated a simple sine wave, the obvious next step is the three-dimensional reconstruction of a mesh into a signed distance function.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"937\" height=\"1024\" src=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural4-937x1024.png\" alt=\"\" class=\"wp-image-1151\" style=\"width:342px;height:auto\" srcset=\"https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural4-937x1024.png 937w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural4-275x300.png 275w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural4-768x839.png 768w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural4-1406x1536.png 1406w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural4-1874x2048.png 1874w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural4-1200x1311.png 1200w, https:\/\/summergeometry.org\/sgi2024\/wp-content\/uploads\/2024\/08\/neural4-1980x2164.png 1980w\" sizes=\"auto, (max-width: 937px) 100vw, 937px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">But since I don&#8217;t actually know the latter, I&#8217;ll go to the non-obvious next step of looking at the <em>image of an apple<\/em>. If we were to perform a <em>transformation <\/em>on said apple, such as a translation, rotation, or scaling, ideally our neural network should still be able to identify the data as an apple. This means we somehow need to encode the symmetry information into the weights of the network. 
This breeds the principle of an <em>equivariant NN<\/em>, studied in the field of <em>geometric deep learning<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I&#8217;ll try to study these later to make more sense of them as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Butlerian concerns aside, neural networks have proven to be extremely useful in doing everything we couldn&#8217;t think was to be done in this century; extremely advanced language processing, physically motivated predictions, and making strange, artful images using the power of bankrupt corporate morality. Now, I&#8217;ve read and seen a lot of this &#8220;stuff&#8221; in the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"ppma_author":[24],"class_list":["post-1075","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"authors":[{"term_id":24,"user_id":0,"is_guest":1,"slug":"cap-ar-bb","display_name":"ar.bb","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","author_category":"","first_name":"","last_name":"","user_url":"","job_title":"","description":""}],"_links":{"self":[{"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/posts\/1075","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/comments?post=1075"}],"version-history":[{"count":9,"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/posts\/1075\/revisions"}],"predecessor-version":[{"id":1158,"href":"https:\/\/summergeometry.org\/s
gi2024\/wp-json\/wp\/v2\/posts\/1075\/revisions\/1158"}],"wp:attachment":[{"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/media?parent=1075"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/categories?post=1075"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/tags?post=1075"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2024\/wp-json\/wp\/v2\/ppma_author?post=1075"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}