Here you can curve and surface fit your 2D and 3D data online with a
rich set of error
histograms, error plots, curve plots, surface plots, contour plots,
VRML, and source code.
The author, James Phillips, began curve and surface fitting when he
was in R&D about 15 or 16 years ago. These were mostly X-ray
transmission and backscatter curve and surface data sets from the
measurement of steel and aluminum thickness. He also did work with
X-ray fluorescence of zinc coatings on steel. The software tools he
had on hand at the time were too expensive for general use. Since he
was learning to program computers, he started writing curve fitting
software, first in C and then in C++. He left Tokyo, Japan in early
2000, returning to Birmingham, Alabama, and started programming in
Python.
"Linear regression" is used for equations that are linear *in the
coefficients*. Here are some specific examples:
y = mx + b
y = aX^0 + bX^1 + cX^2 + dX^3
z = a + bX + cY + dXY
These equations are all of the general form:
result = coefficient_1 * function_1 + coefficient_2 * function_2 + coefficient_3 * function_3 + ...
where function_1 can be x^2, function_2 can be sin(x), etc.
Here is an example equation that is NOT linear in the coefficients:
y = exp(a*x)
where we want to find the coefficient "a". This equation cannot be
fitted by linear regression, because it is not in the general
form just discussed. This is where "non-linear regression" comes into play.
The basic idea is:
1) Make an initial guess for the value of "a"
2) Using our data set, calculate the sum of squared errors (SSQ)
3) Try to change "a" so as to reduce the SSQ calculated in step 2
(sometimes using derivatives)
4) Repeat steps 2 and 3 until the SSQ is as low as we can make it
Note that step 4 means repeated iteration, so non-linear regression
is often referred to as "using iterative methods." It is
computationally more expensive than linear regression, and with
large data sets it can take a long time.
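
The four steps above can be sketched in Python for the example equation y = exp(a*x). This is only a minimal illustration, not the method used by any particular fitting package: it uses a simple derivative-free search that halves its step size whenever no improvement is found. The function names, the starting guess, the step size, and the synthetic data are all assumptions made for the example.

```python
import math

def ssq(a, xs, ys):
    """Sum of squared errors of the model y = exp(a*x) over the data set."""
    return sum((math.exp(a * x) - y) ** 2 for x, y in zip(xs, ys))

def fit_exponent(xs, ys, a=0.0, step=0.5, tol=1e-10, max_iter=10000):
    """Search for the value of 'a' that makes the SSQ as small as possible."""
    best = ssq(a, xs, ys)                    # step 2: SSQ at the initial guess
    for _ in range(max_iter):                # step 4: iterate
        if step < tol:
            break
        improved = False
        for trial in (a + step, a - step):   # step 3: try changing a
            trial_ssq = ssq(trial, xs, ys)
            if trial_ssq < best:
                a, best, improved = trial, trial_ssq, True
                break
        if not improved:
            step /= 2.0                      # nothing better: search more finely
    return a, best

# Synthetic, noise-free data generated with a = 0.7 (illustrative only)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]

a_fit, final_ssq = fit_exponent(xs, ys)
print("fitted a:", a_fit, "final SSQ:", final_ssq)
```

Production fitters usually replace the fixed trial steps with derivative-based updates, which need far fewer SSQ evaluations, but the loop structure is the same.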
Multiple Linear Regression
Suppose you have 2-dimensional XY data, and want to fit a straight
line to this data. The equation is commonly written as:
y = mx + b
This can be rewritten in polynomial form as
y = ax^0 + bx^1
A quadratic is then
y = ax^0 + bx^1 + cx^2
and a cubic is then
y = ax^0 + bx^1 + cx^2 + dx^3
and so on.
Straight lines are done, let's move on.
For multiple linear regression, the key idea is how the data is
formatted for the regression.
Let's say you want to fit your X and Y 2D data to this equation:
y = a*x^3 + b*sin(x)
Format your data as columns of
x^3    sin(x)
and regress against Y.
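
As a rough sketch of this data formatting in Python, the two columns x^3 and sin(x) can be built first and then regressed against Y. With only two columns, the least-squares coefficients follow from the 2x2 normal equations, solved here with Cramer's rule. The function name `fit_two_columns` and the synthetic data are assumptions made for this example; in practice a library least-squares routine would normally be used instead.

```python
import math

def fit_two_columns(col1, col2, ys):
    """Least-squares fit of y = a*col1 + b*col2 via the 2x2 normal equations."""
    s11 = sum(u * u for u in col1)
    s12 = sum(u * v for u, v in zip(col1, col2))
    s22 = sum(v * v for v in col2)
    s1y = sum(u * y for u, y in zip(col1, ys))
    s2y = sum(v * y for v, y in zip(col2, ys))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("columns are linearly dependent")
    a = (s1y * s22 - s12 * s2y) / det
    b = (s11 * s2y - s12 * s1y) / det
    return a, b

# Synthetic, noise-free data generated with a = 2 and b = -3 (illustrative only)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [2.0 * x ** 3 - 3.0 * math.sin(x) for x in xs]

# Format the data as columns of x^3 and sin(x), then regress against Y
x_cubed = [x ** 3 for x in xs]
sin_x = [math.sin(x) for x in xs]

a_fit, b_fit = fit_two_columns(x_cubed, sin_x, ys)
print("a:", a_fit, "b:", b_fit)
```

Note that the regression itself never sees x directly, only the precomputed columns, which is why the equation stays linear in the coefficients.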
Now for n-dimensional regression. Let's do a simple XYZ 3D fit
first. Start with the equation
z = ax + by
Format your data as columns of
X Y
and regress against Z. All done.
For a more complex 3D problem, say
Z = a + b*cos(X*Y) + c*exp(X/Y)
format your data as columns of
1 cos(X*Y) exp(X/Y)
and regress against Z.
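
The same column formatting works for any number of columns. The sketch below builds the three columns 1, cos(X*Y) and exp(X/Y), forms the normal equations, and solves them by Gaussian elimination in plain Python. The helper names `solve_linear_system` and `linear_fit` and the synthetic data are assumptions for illustration, and the data must have non-zero Y values because of the X/Y term.

```python
import math

def solve_linear_system(A, rhs):
    """Solve A * coeffs = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("columns are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        tail = sum(M[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (M[r][n] - tail) / M[r][r]
    return coeffs

def linear_fit(columns, targets):
    """Least-squares coefficients for targets = sum(coeff_i * column_i)."""
    k = len(columns)
    A = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
         for i in range(k)]
    rhs = [sum(u * t for u, t in zip(columns[i], targets)) for i in range(k)]
    return solve_linear_system(A, rhs)

# Synthetic, noise-free XYZ data with a = 1.5, b = 2.0, c = -0.5 (illustrative only)
data = [(0.5, 1.0), (1.0, 2.0), (1.5, 1.0), (2.0, 4.0), (2.5, 2.0), (3.0, 1.5)]
zs = [1.5 + 2.0 * math.cos(x * y) - 0.5 * math.exp(x / y) for x, y in data]

# Format the data as columns of 1, cos(X*Y) and exp(X/Y), then regress against Z
columns = [
    [1.0 for _ in data],
    [math.cos(x * y) for x, y in data],
    [math.exp(x / y) for x, y in data],
]
a_fit, b_fit, c_fit = linear_fit(columns, zs)
print("a:", a_fit, "b:", b_fit, "c:", c_fit)
```

The column of ones is what produces the constant offset "a" in the fitted equation.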
Absolute vs. Relative Error
Start with absolute error, since it is simple.
absolute error = predicted value - actual value
So if your curve fit predicts an output of 11.0 when the actual
value is 10.0, then
absolute error = predicted - actual = 11.0 - 10.0 = 1.0
So in this case absolute error is 1.0.
Relative error is absolute error / actual
So using the above example,
relative error = absolute error / actual = 1.0 / 10.0 = 0.1
Multiply by 100 to get percent error.
percent error = relative error * 100 = 0.1 * 100 = 10%
In this example, the curve fit is off by 10 percent.
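
The three error definitions translate directly into code. The short Python sketch below uses hypothetical function names and reproduces the worked example of a predicted value of 11.0 against an actual value of 10.0, up to floating-point rounding. It assumes the actual value is non-zero, since relative error is undefined otherwise.

```python
def absolute_error(predicted, actual):
    """Signed absolute error: predicted value minus actual value."""
    return predicted - actual

def relative_error(predicted, actual):
    """Absolute error divided by the actual value (actual must be non-zero)."""
    return absolute_error(predicted, actual) / actual

def percent_error(predicted, actual):
    """Relative error expressed as a percentage."""
    return relative_error(predicted, actual) * 100.0

predicted, actual = 11.0, 10.0
abs_err = absolute_error(predicted, actual)
rel_err = relative_error(predicted, actual)
pct_err = percent_error(predicted, actual)
print("absolute:", abs_err, "relative:", rel_err, "percent:", pct_err)
```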