2.4 Curve Fitting

The goal of this section is to derive a method which we can use to generate a best fit function g(x) for given points (x1,y1), (x2,y2), (x3,y3), ..., (xm,ym). One constraint of the method presented in this section is that our best fit function g(x) must be of the form c1f1(x) + c2f2(x) + c3f3(x) + ... + cnfn(x). Our first step in deriving the method is to understand a vector process called the Gram-Schmidt Orthogonalization procedure.

Introduction to the Gram-Schmidt Orthogonalization Procedure

The goal of the Gram-Schmidt Orthogonalization procedure is to find a set of orthogonal vectors (section 2.3.2) that span the same vector space as a given set of vectors. In order to derive it, we will use Figure 2.6. a1 and a2 are arbitrary vectors which exist in a common vector space. w1 is the component of a2 that lies in the vector space whose basis is just a1 (that is, the projection of a2 onto a1). w2 is orthogonal to w1. Our goal is to find mathematical expressions for w1 and w2.

Figure 2.6: Vectors used for the derivation of the Gram-Schmidt Orthogonalization procedure

Let us start with w1. Using geometry, we can state equation 2.202.

\[
\mathbf{w}_1 = \|\mathbf{a}_2\|\cos\theta\,\frac{\mathbf{a}_1}{\|\mathbf{a}_1\|} \tag{2.202}
\]

Plugging in equation 2.197 for cos θ, we get equation 2.203.

\[
\mathbf{w}_1 = \|\mathbf{a}_2\|\,\frac{\mathbf{a}_1 \cdot \mathbf{a}_2}{\|\mathbf{a}_1\|\,\|\mathbf{a}_2\|}\,\frac{\mathbf{a}_1}{\|\mathbf{a}_1\|} \tag{2.203}
\]

Simplifying equation 2.203 and plugging in the relation ||a1|| ||a1|| = a1 · a1, we are left with equation 2.204.

\[
\mathbf{w}_1 = \frac{\mathbf{a}_2 \cdot \mathbf{a}_1}{\mathbf{a}_1 \cdot \mathbf{a}_1}\,\mathbf{a}_1 \tag{2.204}
\]

Now, we know that w2 is simply equal to a2 − w1. w2 is orthogonal to both a1 and w1. Therefore, the vectors a1 and w2 constitute a set of orthogonal vectors that span the same vector space as the original a1 and a2. We have achieved our goal. We can extend this methodology to find a set of orthogonal vectors u1, u2, ..., un spanning the same vector space as input vectors a1, a2, ..., an. This is shown in the following equations.

\[
\begin{aligned}
\mathbf{u}_1 &= \mathbf{a}_1 \\
\mathbf{u}_2 &= \mathbf{a}_2 - \frac{\mathbf{a}_2 \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\,\mathbf{u}_1 \\
\mathbf{u}_3 &= \mathbf{a}_3 - \frac{\mathbf{a}_3 \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\,\mathbf{u}_1 - \frac{\mathbf{a}_3 \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\,\mathbf{u}_2 \\
&\;\;\vdots \\
\mathbf{u}_n &= \mathbf{a}_n - \sum_{k=1}^{n-1}\frac{\mathbf{a}_n \cdot \mathbf{u}_k}{\mathbf{u}_k \cdot \mathbf{u}_k}\,\mathbf{u}_k
\end{aligned}
\]

In the general equations provided, u2 is equal to the w2 shown in Figure 2.6. The big-picture concept of the presented procedure is to force each new vector uk to be orthogonal to all of the previously generated vectors u1, ..., uk−1 by subtracting out the components of ak that lie along them. This process can be repeated as many times as necessary to generate a full orthogonal set u1, u2, ..., un. A sketch of the procedure in code is provided below.
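To make the procedure concrete, the following is a minimal Python sketch of classical Gram-Schmidt; the function name gram_schmidt and the use of NumPy are illustrative choices, not part of the text.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize `vectors` via classical Gram-Schmidt.

    Each u_k is a_k minus its components along the previously
    generated u_1, ..., u_{k-1}, per the equations above.
    """
    orthogonal = []
    for a in vectors:
        a = np.asarray(a, dtype=float)
        u = a.copy()
        for prev in orthogonal:
            # Subtract the component of a_k that lies along u_j.
            u -= (a @ prev) / (prev @ prev) * prev
        orthogonal.append(u)
    return orthogonal

# Example with the a1, a2 vectors from the worked example later in this section:
a1 = [0.25, 1.0, 25.0, 49.0]
a2 = [0.5, 1.0, 5.0, 7.0]
u1, u2 = gram_schmidt([a1, a2])
print(np.round(u2, 3))           # [ 0.461  0.845  1.126 -0.594]
print(np.isclose(u1 @ u2, 0.0))  # True: u1 and u2 are orthogonal
```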

Using the Gram-Schmidt Orthogonalization Procedure for Curve Fitting

As mentioned previously, for the method presented in this section, our best fit function must be of the form shown in equation 2.205.

\[
c_1f_1(x) + c_2f_2(x) + c_3f_3(x) + \dots + c_nf_n(x) = y \tag{2.205}
\]

Plugging a given dataset (x1,y1), (x2,y2), (x3,y3), ..., (xm,ym) into equation 2.205, we get equation 2.206. (Here m denotes the number of data points, which need not equal the number of fitting functions n.)

\[
c_1\begin{bmatrix} f_1(x_1) \\ f_1(x_2) \\ f_1(x_3) \\ \vdots \\ f_1(x_m) \end{bmatrix}
+ c_2\begin{bmatrix} f_2(x_1) \\ f_2(x_2) \\ f_2(x_3) \\ \vdots \\ f_2(x_m) \end{bmatrix}
+ c_3\begin{bmatrix} f_3(x_1) \\ f_3(x_2) \\ f_3(x_3) \\ \vdots \\ f_3(x_m) \end{bmatrix}
+ \dots
+ c_n\begin{bmatrix} f_n(x_1) \\ f_n(x_2) \\ f_n(x_3) \\ \vdots \\ f_n(x_m) \end{bmatrix}
\approx
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_m \end{bmatrix} \tag{2.206}
\]

Essentially, the trick is to choose the correct c1, c2, c3, ..., cn so that the left-hand side of equation 2.206 is as close as possible to the given y1, y2, y3, ..., ym. Please note that I have transitioned away from the compact vector notation used previously and instead use large brackets in order to present the above equation in a more intuitive form. Moving forward with this vector mindset, we will refer to the vector with coefficient c1 as a1, and so on up to an, while the vector with components y1, y2, y3, ..., ym will be referred to as y. This is shown mathematically in equation 2.207.

\[
c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + c_3\mathbf{a}_3 + \dots + c_n\mathbf{a}_n \approx \mathbf{y} \tag{2.207}
\]

You may notice that a1, a2, a3, ..., an can be thought of as spanning a vector space.

In order to find the desired c1, c2, c3, ..., cn, we first start out by building an orthogonal basis for the vector space defined by equation 2.207. In order to do this, we can use the Gram-Schmidt orthogonalization procedure explained previously to transform a1, a2, a3, ..., an into their orthogonal counterparts u1, u2, ..., un. We then want to find the vector within this vector space that is “closest” to y. This vector will be defined as yf. Mathematically, we can state that we have found the correct yf when the dot product of yf with the difference between y and yf is 0. If this is confusing, reference Figure 2.6, and imagine that a1 is the basis of the vector space containing yf, while a2 is our true y. The closest any vector within the vector space of a1 can get to a2 is a yf such that yf · (y − yf) = 0; the difference (y − yf) plays the role of w2 in Figure 2.6. This gives equation 2.208.

\[
\mathbf{y}_f \cdot (\mathbf{y} - \mathbf{y}_f) = 0 \tag{2.208}
\]

We can use equation 2.204 to find yf by projecting y onto each of u1, u2, ..., un, as shown in equation 2.209.

\[
\frac{\mathbf{y} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\,\mathbf{u}_1
+ \frac{\mathbf{y} \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\,\mathbf{u}_2
+ \dots
+ \frac{\mathbf{y} \cdot \mathbf{u}_n}{\mathbf{u}_n \cdot \mathbf{u}_n}\,\mathbf{u}_n
= \mathbf{y}_f \tag{2.209}
\]

Since yf is built up from vectors that exist within our original vector space (spanned by a1, a2, a3, ..., an), we know that there exists a combination of c1, c2, ..., cn that outputs this vector. These values are the solution to our problem. Therefore, we can use system-of-equations techniques to solve equation 2.210 and obtain our coefficients of interest, as sketched in code after equation 2.210.

\[
c_1\begin{bmatrix} f_1(x_1) \\ f_1(x_2) \\ f_1(x_3) \\ \vdots \\ f_1(x_m) \end{bmatrix}
+ c_2\begin{bmatrix} f_2(x_1) \\ f_2(x_2) \\ f_2(x_3) \\ \vdots \\ f_2(x_m) \end{bmatrix}
+ c_3\begin{bmatrix} f_3(x_1) \\ f_3(x_2) \\ f_3(x_3) \\ \vdots \\ f_3(x_m) \end{bmatrix}
+ \dots
+ c_n\begin{bmatrix} f_n(x_1) \\ f_n(x_2) \\ f_n(x_3) \\ \vdots \\ f_n(x_m) \end{bmatrix}
= \mathbf{y}_f \tag{2.210}
\]
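The two-step process (orthogonalize, project to get yf, then solve for the coefficients) can be sketched in a few lines of NumPy. The function names here are illustrative, and np.linalg.lstsq is used only as a convenient solver for the consistent system in equation 2.210.

```python
import numpy as np

def project(y, u):
    # Projection of y onto u, as in equation 2.204: (y . u / u . u) u.
    return (y @ u) / (u @ u) * u

def fit_via_projection(A, y):
    """Columns of A are a_1, ..., a_n; returns c_1, ..., c_n."""
    # Gram-Schmidt on the columns of A. Subtracting from the running u
    # (the "modified" variant) yields the same u_k as the classical
    # equations above, but is numerically more stable.
    us = []
    for a in A.T:
        u = a.astype(float)
        for prev in us:
            u = u - project(u, prev)
        us.append(u)
    # Equation 2.209: y_f is the sum of the projections of y onto each u_k.
    y_f = sum(project(y, u) for u in us)
    # Equation 2.210: solve c_1 a_1 + ... + c_n a_n = y_f for the c_k.
    c, *_ = np.linalg.lstsq(A, y_f, rcond=None)
    return c
```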

Using Matrix Techniques

The process presented in the previous section requires a considerable amount of work. First, yf must be found. Then, a system of equations needs to be solved for c1, c2, c3, ..., cn. In this section, we will derive a matrix (section 2.6) process for finding c1, c2, c3, ..., cn directly. The first step in doing this is to recognize that the dot product uk · y can be rewritten as the matrix product [uk]T[y] (treating [uk] and [y] as m×1 matrices, so that [uk]T is 1×m and the product [uk]T[y] is a 1×1 matrix, i.e., a scalar). Given this, we can rewrite equation 2.209 as shown in equation 2.211.

\[
\frac{1}{\mathbf{u}_1 \cdot \mathbf{u}_1}[u_1][u_1]^T[y]
+ \frac{1}{\mathbf{u}_2 \cdot \mathbf{u}_2}[u_2][u_2]^T[y]
+ \dots
+ \frac{1}{\mathbf{u}_n \cdot \mathbf{u}_n}[u_n][u_n]^T[y]
= [y_f] \tag{2.211}
\]

We can factor out [y] from equation 2.211 as shown in equation 2.212.

\[
\left[\frac{1}{\mathbf{u}_1 \cdot \mathbf{u}_1}[u_1][u_1]^T
+ \frac{1}{\mathbf{u}_2 \cdot \mathbf{u}_2}[u_2][u_2]^T
+ \dots
+ \frac{1}{\mathbf{u}_n \cdot \mathbf{u}_n}[u_n][u_n]^T\right][y]
= [y_f] \tag{2.212}
\]

For simplicity, the matrix term within the large brackets will be referred to as the matrix [P] from now on, as shown in equation 2.213.

\[
\frac{1}{\mathbf{u}_1 \cdot \mathbf{u}_1}[u_1][u_1]^T
+ \frac{1}{\mathbf{u}_2 \cdot \mathbf{u}_2}[u_2][u_2]^T
+ \dots
+ \frac{1}{\mathbf{u}_n \cdot \mathbf{u}_n}[u_n][u_n]^T
= [P] \tag{2.213}
\]

From equation 2.212, we can see that multiplication by the [P] matrix transforms a given vector into the “closest” vector to it that exists within the vector space spanned by u1, u2, ..., un. Since yf already lies in that space, plugging yf in we get equation 2.214.

\[
[P][y_f] = [y_f] \tag{2.214}
\]

We can expand upon this and say that the matrix [P] multiplied by any vector which exists within the vector space spanned by the basis u1, u2, ..., un will simply equal that original vector. A quick numerical check of this property is sketched below.
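The following is a small sketch assuming NumPy; np.outer builds the [u][u]T terms, and the vectors are taken from the worked example later in this section.

```python
import numpy as np

def projection_matrix(us):
    # [P] = sum over k of [u_k][u_k]^T / (u_k . u_k): equation 2.213.
    return sum(np.outer(u, u) / (u @ u) for u in us)

u1 = np.array([0.25, 1.0, 25.0, 49.0])
a2 = np.array([0.5, 1.0, 5.0, 7.0])
u2 = a2 - (a2 @ u1) / (u1 @ u1) * u1    # one Gram-Schmidt step
P = projection_matrix([u1, u2])

v = 2.0 * u1 - 3.0 * u2                 # any vector in span{u1, u2}
print(np.allclose(P @ v, v))            # True: P leaves the subspace fixed
print(np.allclose(P, P.T))              # True: P is symmetric (equation 2.219 below)
```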

From equation 2.213 we know that the matrix [P] is a summation of scalar multiples of matrices of the form shown in equation 2.215.

\[
[u_n][u_n]^T \tag{2.215}
\]

Taking the transpose of equation 2.215, we get equation 2.216.

\[
\left([u_n][u_n]^T\right)^T \tag{2.216}
\]

Applying equation 2.268 (the transpose-of-a-product rule, ([A][B])T = [B]T[A]T) to equation 2.216, we get equation 2.217.

\[
\left([u_n]^T\right)^T[u_n]^T \tag{2.217}
\]

Now, we know from equation 2.267 that a matrix transposed twice simply equals the original matrix: ([B]T)T = [B]. Plugging this into equation 2.217, we get equation 2.218.

\[
[u_n][u_n]^T \tag{2.218}
\]

You may notice that equation 2.218 is equivalent to equation 2.215; this is the term we started with before we took the transpose. Each term in the summation that forms [P] is therefore symmetric, and since a sum of symmetric matrices is itself symmetric, we can state equation 2.219.

\[
[P]^T = [P] \tag{2.219}
\]

We can multiply y from equation 2.207 by the matrix [P] to get equation 2.220.

\[
c_1\mathbf{a}_1 + c_2\mathbf{a}_2 + c_3\mathbf{a}_3 + \dots + c_n\mathbf{a}_n = [P]\mathbf{y} \tag{2.220}
\]

As can be deduced, equation 2.220 simply consolidates the orthogonalization and projection into the single matrix multiplication [P]y. Note that the approximation of equation 2.207 has become an equality, since [P]y = yf lies exactly within the vector space spanned by a1, a2, ..., an. We can consolidate the a vectors to create the matrix [A] and the c constants to create the matrix [C], as shown in equations 2.221 and 2.222.

\[
[A] = \begin{bmatrix}
f_1(x_1) & f_2(x_1) & f_3(x_1) & \cdots & f_n(x_1) \\
f_1(x_2) & f_2(x_2) & f_3(x_2) & \cdots & f_n(x_2) \\
f_1(x_3) & f_2(x_3) & f_3(x_3) & \cdots & f_n(x_3) \\
\vdots & \vdots & \vdots & & \vdots \\
f_1(x_m) & f_2(x_m) & f_3(x_m) & \cdots & f_n(x_m)
\end{bmatrix} \tag{2.221}
\]

\[
[C] = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ \vdots \\ c_n \end{bmatrix} \tag{2.222}
\]

Rewriting equation 2.220 using the matrices in equations 2.221 and 2.222, we get equation 2.223.

\[
[A][C] = [P]\mathbf{y} \tag{2.223}
\]

Now, our goal is to solve for [C]. Intuitively, it seems like we should multiply both sides of the equation by [A]−1. However, going back to the definition of this matrix (equation 2.221), we realize that this matrix will not always be square (it has m rows and n columns), and only square matrices can be inverted. Therefore, our work-around is to multiply both sides of the above equation by [A]T, as shown in equation 2.224.

\[
[A]^T[A][C] = [A]^T[P]\mathbf{y} \tag{2.224}
\]

Combining equation 2.267 with equation 2.268, we can simplify equation 2.224 as shown in equation 2.225.

\[
[A]^T[A][C] = \left([P]^T[A]\right)^T\mathbf{y} \tag{2.225}
\]

Substituting equation 2.219, we can simplify further as shown in equation 2.226.

\[
[A]^T[A][C] = \left([P][A]\right)^T\mathbf{y} \tag{2.226}
\]

We know that the column vectors that constitute [A] (reference equation 2.221) must exist in the vector space whose basis is u1, u2, ..., un (since these basis vectors are simply an orthogonalization of the vectors that constitute [A]). Therefore, we can apply the mental exercise conducted in equation 2.214 to state that [P][A] = [A]. Given this substitution, we can simplify further as shown in equation 2.227.

\[
[A]^T[A][C] = [A]^T\mathbf{y} \tag{2.227}
\]

Now, since [A]T[A] is square (and invertible whenever the columns of [A] are linearly independent), we can take the inverse of this term to solve for [C]. Doing this, we end up with equation 2.228.

\[
[C] = \left([A]^T[A]\right)^{-1}[A]^T\mathbf{y} \tag{2.228}
\]
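Equation 2.228 is the well-known normal-equations solution for linear least squares. A minimal sketch (the function name is an illustrative choice):

```python
import numpy as np

def fit_normal_equations(A, y):
    # [C] = ([A]^T [A])^(-1) [A]^T y: equation 2.228.
    return np.linalg.solve(A.T @ A, A.T @ y)
```

Note that numerical libraries typically solve this linear system (or use a QR/SVD-based routine such as np.linalg.lstsq) rather than explicitly forming the inverse, which is both cheaper and more numerically stable.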
Short Example

In this example, we will start out with the given points (.5,3), (1,1), (5,1), (7,2). We will fit an equation of the form shown in equation 2.229 to the provided points.

\[
c_1x^2 + c_2x + c_3\ln(2x) = y \tag{2.229}
\]

As can be deduced, our task is to find the coefficients c1, c2, and c3. Per the process laid out above, our problem can be rewritten using equation 2.206 as shown in equation 2.230.

\[
c_1\begin{bmatrix} .25 \\ 1 \\ 25 \\ 49 \end{bmatrix}
+ c_2\begin{bmatrix} .5 \\ 1 \\ 5 \\ 7 \end{bmatrix}
+ c_3\begin{bmatrix} 0 \\ .693 \\ 2.303 \\ 2.639 \end{bmatrix}
\approx
\begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \end{bmatrix} \tag{2.230}
\]

Performing the Gram-Schmidt orthogonalization procedure on the vectors provided in equation 2.230 (a1, a2, a3 in our previous terminology), we get the following.

\[
\mathbf{u}_1 = \begin{bmatrix} .25 \\ 1 \\ 25 \\ 49 \end{bmatrix}
\qquad
\mathbf{u}_2 = \begin{bmatrix} .461 \\ .845 \\ 1.126 \\ -.594 \end{bmatrix}
\qquad
\mathbf{u}_3 = \begin{bmatrix} -.307 \\ .097 \\ .042 \\ -.022 \end{bmatrix}
\]

We can find yf using equation 2.209, as shown in equation 2.231.

With y = [3, 1, 1, 2]T and u1, u2, u3 as found above,

\[
\frac{\mathbf{y} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\,\mathbf{u}_1
+ \frac{\mathbf{y} \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\,\mathbf{u}_2
+ \frac{\mathbf{y} \cdot \mathbf{u}_3}{\mathbf{u}_3 \cdot \mathbf{u}_3}\,\mathbf{u}_3
= \begin{bmatrix} 2.799 \\ .005 \\ 1.663 \\ 1.683 \end{bmatrix} \tag{2.231}
\]

Now that we have found yf, we can assemble the equation laid out in 2.210 as shown in equation 2.232.

\[
c_1\begin{bmatrix} .25 \\ 1 \\ 25 \\ 49 \end{bmatrix}
+ c_2\begin{bmatrix} .5 \\ 1 \\ 5 \\ 7 \end{bmatrix}
+ c_3\begin{bmatrix} 0 \\ .693 \\ 2.303 \\ 2.639 \end{bmatrix}
= \begin{bmatrix} 2.799 \\ .005 \\ 1.663 \\ 1.683 \end{bmatrix} \tag{2.232}
\]

Solving the system of equations provided within equation 2.232 for our unknowns, we get c1 = −.372, c2 = 5.783, and c3 = −7.800 (to three decimal places). Plugging these into our original fitting equation (equation 2.229), we get equation 2.233.

\[
-.372x^2 + 5.783x - 7.800\ln(2x) = y \tag{2.233}
\]

As laid out previously, we can derive the same solution using matrix techniques. Applying equation 2.221 to this situation, we get the [A] shown in equation 2.234.

\[
[A] = \begin{bmatrix}
.25 & .5 & 0 \\
1 & 1 & .693 \\
25 & 5 & 2.303 \\
49 & 7 & 2.639
\end{bmatrix} \tag{2.234}
\]

Plugging in equation 2.234 along with the given y into equation 2.228 we get equation 2.235.

\[
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}
= \left(
\begin{bmatrix}
.25 & .5 & 0 \\
1 & 1 & .693 \\
25 & 5 & 2.303 \\
49 & 7 & 2.639
\end{bmatrix}^T
\begin{bmatrix}
.25 & .5 & 0 \\
1 & 1 & .693 \\
25 & 5 & 2.303 \\
49 & 7 & 2.639
\end{bmatrix}
\right)^{-1}
\begin{bmatrix}
.25 & .5 & 0 \\
1 & 1 & .693 \\
25 & 5 & 2.303 \\
49 & 7 & 2.639
\end{bmatrix}^T
\begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \end{bmatrix} \tag{2.235}
\]

Solving, we end up with equation 2.236.

\[
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}
= \begin{bmatrix} -.372 \\ 5.783 \\ -7.800 \end{bmatrix} \tag{2.236}
\]

This result matches the result we obtained using the non-matrix method.
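This example can be reproduced in a few lines of NumPy using the normal-equations form of equation 2.228:

```python
import numpy as np

x = np.array([0.5, 1.0, 5.0, 7.0])
y = np.array([3.0, 1.0, 1.0, 2.0])

# Columns are f1(x) = x^2, f2(x) = x, f3(x) = ln(2x): equation 2.234.
A = np.column_stack([x**2, x, np.log(2 * x)])
c = np.linalg.solve(A.T @ A, A.T @ y)   # equation 2.228
print(np.round(c, 3))                   # approx. [-0.372  5.783 -7.8]
```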

In Figure 2.7, a graph of equation 2.233 is provided along with the points given in the problem statement.

Figure 2.7: Graph of best fit line (equation 2.233) as compared to the points provided in the problem statement

2.4.1 R Squared

The R² statistic is a tool to determine how closely our best fit function fits the given data-set. This statistic is calculated using equation 2.237.

\[
R^2 = 1 - \frac{\sum_i \left(y_i - f(x_i)\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2} \tag{2.237}
\]

In equation 2.237, ȳ is the mean of the given y values, and f(x) corresponds to the best fit function whose generation has been detailed in section 2.4.

As can be deduced from equation 2.237, an R² of 1 means that the best fit function perfectly models the data (it passes through all the points). The closer the R² statistic is to 0, the worse the fit. R² can even be negative, which signifies that the data-set varies more from the best fit function than from its own mean; this can occur when the chosen fitting functions are a poor match for the data.
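Equation 2.237 transcribes directly into code (a minimal sketch; the helper name r_squared is an illustrative choice):

```python
import numpy as np

def r_squared(y, f_of_x):
    # R^2 = 1 - SS_res / SS_tot: equation 2.237.
    ss_res = np.sum((y - f_of_x) ** 2)      # variation from the fit
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # variation from the mean
    return 1.0 - ss_res / ss_tot
```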

Short Example

In this example, we will generate an R² statistic for the example provided in section 2.4. As a reminder, we were provided the following data-set: (.5,3), (1,1), (5,1), (7,2). We used this data-set to generate the following best fit equation.

\[
-.372x^2 + 5.783x - 7.800\ln(2x) = f(x)
\]

We can use this data to generate Table 2.3.

Table 2.3: Table showing x, y (provided value), ȳ (mean of the y values), and f(x) (estimate from the best fit equation).

x      y      ȳ       f(x)
.5     3      1.75    2.80
1      1      1.75    .00
5      1      1.75    1.66
7      2      1.75    1.68

Plugging this data into equation 2.237, we get an R² value of .43. This relatively low value signifies that, while the best fit function tracks the data better than the mean alone does, a substantial portion of the variation in the data-set is left unexplained by the fit.
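This value can be checked directly with the r_squared sketch from above (NumPy imported as before):

```python
x = np.array([0.5, 1.0, 5.0, 7.0])
y = np.array([3.0, 1.0, 1.0, 2.0])
f = -0.372 * x**2 + 5.783 * x - 7.800 * np.log(2 * x)
print(round(float(r_squared(y, f)), 2))   # approx. 0.43
```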