LOCFIT


Getting started with LOCFIT

Let's generate and plot 100 noisy points scattered about a sine curve:
> x <- 10*runif(100)
> y <- 5*sin(x)+rnorm(100)
> plot(x,y)
The goal of LOCFIT is to fit a smooth trend curve to a scatterplot like this. In this simulated example, we know the true trend is a sine wave, and so we might use least-squares regression onto a sine curve to estimate the trend. However, for a real dataset, the functional form is usually unknown, and so one wants the data points themselves to decide the form of the fitted curve.
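
For comparison, here is a minimal sketch of the parametric alternative mentioned above: least-squares regression onto a sine curve, using only base R's lm(). The seed, the evaluation grid xev and the names par.fit, amp and trend are assumptions added for illustration. The approach only works because we know the sine form in advance.

```r
# Simulate data as in the example above (the seed is an added assumption,
# used only to make the run reproducible).
set.seed(1)
x <- 10*runif(100)
y <- 5*sin(x) + rnorm(100)

# Least-squares regression of y on sin(x): y = a + b*sin(x) + error.
par.fit <- lm(y ~ sin(x))
amp <- unname(coef(par.fit)["sin(x)"])   # estimated amplitude; the true value is 5

# Parametric trend evaluated on a fine grid of x values.
xev <- seq(0, 10, length.out = 200)
trend <- predict(par.fit, newdata = data.frame(x = xev))
```

After plot(x,y), calling lines(xev, trend) would overlay this parametric estimate on the scatterplot.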

The simplest interface to LOCFIT is through the smooth.lf() function:

> library(locfit)
> fit <- smooth.lf(x,y)
> points(fit,col=2,pch="+")
This computes a smooth local regression estimate of the trend curve, evaluated at each data point x_i. The returned object fit is a list with components x (the original x values) and y (the corresponding estimated trend values).

Of course, we'd probably prefer to show the trend curve as a continuous curve, rather than as a series of points. Here's how to do this:

> xev <- seq(0,10,length=200)
> fit <- smooth.lf(x,y,xev)
> plot(x,y)
> lines(fit,col=2)
The xev argument says where to evaluate the trend curve.

The locfit() Function

The smooth.lf() function provides a simple interface to LOCFIT. However, it lacks much of the functionality that you'll probably want. For example, it doesn't give standard errors for the fit.

The locfit() function provides a much more powerful interface. It is built around the S modeling language, with syntax similar to functions such as lm(), nls() and loess(). Roughly, it's a two-stage process. locfit() computes the trend curve at a selected set of points, and returns an object with the "locfit" class. Methods such as plot.locfit() and predict.locfit() are used to evaluate and plot the trend curve at arbitrary points.

Let's redo the above example:

> fit <- locfit(y~lp(x))
> plot(fit,get.data=TRUE)
The model formula can be read as "y is modeled by a local polynomial in x".

Looking at the plots, you'll probably decide that the fit isn't very good. Too much smoothing is being done; the peaks in the sine curve aren't adequately modeled by the smooth curve. To fix this, one must reduce the bandwidth, or width of the windows used to fit the local polynomials:

> fit <- locfit(y~lp(x,nn=0.5))
> plot(fit,get.data=TRUE)
This specifies a 50% nearest neighbor bandwidth, meaning that at each fitting point, the bandwidth is chosen to cover 50% of the data (the default is nn=0.7). Alternatively, a "fixed" or constant bandwidth can be specified:
> fit <- locfit(y~lp(x,h=2))
> plot(fit,get.data=TRUE)
This specifies a window half-width of h=2 everywhere.
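
To make the two bandwidth types concrete, here is a rough base-R sketch of a local polynomial estimate at a single point. It is not LOCFIT's actual implementation: the helper local.poly() is hypothetical, and the tricube weight function, the quadratic degree and the rule k = ceiling(nn*n) are assumptions chosen for illustration.

```r
# Hypothetical helper (not part of LOCFIT): local polynomial estimate at x0.
# Supply either nn (fraction of the data inside the window) or h (fixed half-width).
local.poly <- function(x, y, x0, nn = 0.7, h = NULL, deg = 2) {
  d <- abs(x - x0)
  if (is.null(h)) {
    # Nearest neighbor bandwidth: distance from x0 to its k-th closest data point.
    k <- min(length(x), ceiling(nn * length(x)))
    h <- sort(d)[k]
  }
  w <- pmax(1 - (d/h)^3, 0)^3            # tricube weights, zero outside the window
  keep <- w > 0
  X <- outer(x[keep] - x0, 0:deg, "^")   # columns 1, (x - x0), (x - x0)^2, ...
  beta <- lm.wfit(X, y[keep], w[keep])$coefficients
  unname(beta[1])                        # the intercept is the estimate at x0
}

set.seed(1)                              # seed added only for reproducibility
x <- 10*runif(100)
y <- 5*sin(x) + rnorm(100)
est.nn <- local.poly(x, y, x0 = 5, nn = 0.5)   # window covers 50% of the data
est.h  <- local.poly(x, y, x0 = 5, h = 2)      # window is [3, 7] everywhere
```

With nn, the window widens where the data are sparse and narrows where they are dense; with h, the window width is the same at every fitting point.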

Try fitting with different values of nn and/or h. For small values of the smoothing parameters, the curve is undersmoothed, or too variable. For large smoothing parameters, the fit is oversmoothed, or biased.
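
Because this is a simulation, the effect of oversmoothing can be checked against the known truth. The sketch below uses base R's loess() as a stand-in for locfit(); its span argument plays a role broadly similar to nn, though the two functions are not identical. The seed, the grid xev and the chosen span values are assumptions for illustration.

```r
set.seed(1)                              # seed added only for reproducibility
x <- 10*runif(100)
y <- 5*sin(x) + rnorm(100)

# Evaluate within the range of the data; loess() does not extrapolate by default.
xev <- seq(min(x), max(x), length.out = 200)
truth <- 5*sin(xev)

# Mean squared error of the fitted curve against the true sine wave,
# for a small, a moderate and a large smoothing parameter.
spans <- c(0.3, 0.5, 0.9)
mse <- sapply(spans, function(s) {
  fit <- loess(y ~ x, span = s)
  mean((predict(fit, newdata = data.frame(x = xev)) - truth)^2)
})
names(mse) <- spans
print(mse)
```

The largest span should give a clearly larger error than the smallest one here, since a window covering most of the data cannot follow the peaks of the sine curve.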