For the non-data geeks out there, a regression is a way of looking at past data and creating a simple formula that lets you predict what future sales MAY be like.
I put MAY in capitals because the past is never a great predictor of the future, and the stock market is a great example of that. How many people thought that their 401(k)s, which had been growing steadily, would drop 50% in one year? Not me!
After looking at the total sales, I eliminated the following types of transactions from the sales data:
- Related party transactions (family member selling to family member)
- Commercial Property
- Properties valued over $1 million
- Dumb Buyers (People paying over $200/sq ft for new construction this year)
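For anyone who wants to try this kind of clean-up on their own data, here is a minimal sketch of the four exclusion rules above. The records, the field names, and the use of sale price as a stand-in for "valued over $1 million" are all my assumptions for illustration; they are not the actual data set.

```python
# Hypothetical sale records; field names and values are made up for illustration.
sales = [
    {"price": 250_000, "sqft": 1_800, "related_party": False, "commercial": False, "new_construction": False},
    {"price": 90_000, "sqft": 1_500, "related_party": True, "commercial": False, "new_construction": False},
    {"price": 600_000, "sqft": 4_000, "related_party": False, "commercial": True, "new_construction": False},
    {"price": 1_250_000, "sqft": 5_000, "related_party": False, "commercial": False, "new_construction": False},
    {"price": 450_000, "sqft": 2_000, "related_party": False, "commercial": False, "new_construction": True},
]

def keep_sale(sale):
    """Return True if the sale survives all four exclusion rules."""
    if sale["related_party"] or sale["commercial"]:
        return False
    if sale["price"] > 1_000_000:  # assumption: sale price stands in for "valued over $1 million"
        return False
    price_per_sqft = sale["price"] / sale["sqft"]
    if sale["new_construction"] and price_per_sqft > 200:
        return False
    return True

filtered = [s for s in sales if keep_sale(s)]
print(f"kept {len(filtered)} of {len(sales)} sales")
```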
The statistics of the properties are as follows:
- Median sales price: $248,900
- Standard deviation: $170,413
Creating a histogram of the data lets you see the distribution:
A histogram is just a fancy way of saying we want to graph the number of times (the frequency) the assessment error fell within a certain value range. You can see that about 30 properties have assessment errors within $5K of the selling price, whether above or below it.
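If you want to see what "counting the frequency" means in practice, here is a rough sketch. The error values are made up, and the $10K bar width centered on multiples of $10K is my assumption about how a chart like this might be binned.

```python
from collections import Counter

# Made-up assessment errors in dollars (sale price minus assessment); not the real data.
errors = [-42_000, -12_500, -4_000, -3_200, 1_000, 2_500, 4_900, 8_000, 16_000, 37_000]

BIN_WIDTH = 10_000  # assumed: each bar spans $10K, centered on a multiple of $10K

def bin_center(error):
    """Return the center of the $10K-wide bin that contains this error."""
    return round(error / BIN_WIDTH) * BIN_WIDTH

# Count how many errors land in each bin.
frequency = Counter(bin_center(e) for e in errors)

# Print a sideways text histogram, one row per bin.
for center in sorted(frequency):
    print(f"{center:>8,}: {'#' * frequency[center]}")
```

The bin centered on 0 holds every error between -$5K and +$5K, which is the bar described above.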
On the side of the graph you will see the following variables:
- Mean: -$4,578. That tells us that the average assessment is $4,578 higher than what the house actually sold for.
- Standard deviation: $30,861. For a bell-shaped distribution, about 68% of the data falls within 1 standard deviation ($30,861) of the mean, and about 95% falls within 2 standard deviations ($61,722) of the mean. That tells us that most properties (95%) will sell with an assessment error between -$66,300 and $57,144. Remember, a negative assessment error means that the property is assessed at more than it will sell for, and a positive assessment error means that the property is assessed at less than it will sell for.
- Number of properties: 155. That tells us that this data is built from 155 different properties.
From the bell shape of the data we can assume that the assessment error approximates a normal distribution.
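The 68% and 95% ranges can be checked with a few lines of arithmetic. This sketch uses the mean and standard deviation quoted above, with the mean written as negative because the average assessment is higher than the sale price. The function name is just mine.

```python
mean_error = -4_578   # average assessment error, in dollars
std_dev = 30_861      # standard deviation of the assessment error, in dollars

def error_range(k):
    """Return (low, high) for k standard deviations either side of the mean."""
    return mean_error - k * std_dev, mean_error + k * std_dev

low_68, high_68 = error_range(1)
low_95, high_95 = error_range(2)
print(f"about 68% of sales: {low_68:,} to {high_68:,}")
print(f"about 95% of sales: {low_95:,} to {high_95:,}")
```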
So we want to look at all the previous assessment prices and figure out a formula that looks like this:
Sales Price = Fixed Constant + Assessment_Coefficient * 2009 Assessment
With this formula, if we have a property, we can plug in its 2009 Assessment value along with the Fixed Constant and the Assessment_Coefficient to get the likely sales price. Thankfully, we have computers that can take the raw data and figure out the constant and the coefficient for us.
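For the curious, here is a sketch of what the computer is doing, assuming it uses ordinary least squares (the usual method for a straight-line fit; I am not claiming it is exactly what my software runs). The five data points are made up, so the numbers it prints will not match the real formula.

```python
# Made-up (2009 assessment, sale price) pairs in dollars; not the real 155 sales.
sales = [
    (150_000, 146_000),
    (220_000, 212_000),
    (300_000, 298_000),
    (410_000, 405_000),
    (520_000, 519_000),
]

def fit_line(pairs):
    """Ordinary least squares for y = constant + coefficient * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    coefficient = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    constant = mean_y - coefficient * mean_x
    return constant, coefficient

fixed_constant, assessment_coefficient = fit_line(sales)
print(f"Sales Price = {fixed_constant:,.0f} + {assessment_coefficient:.3f} * Assessment")
```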
The linear regression looks like this:
The blue line is what the computer has generated based upon the current data points. It tries to draw a straight line so that if you were to enter a 2009 assessment value you could calculate what the likely sales price would be. The red dots indicate where the sale actually occurred and the distance between that red dot and the blue line is the error of the prediction.
The R-sq value on the graph shows how well this line fits the data. Here it is 96.7%, meaning the line accounts for about 96.7% of the variation in sales price. Not a perfect fit, but as the phrase goes, good enough for government work!
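Here is a sketch of how an R-sq value is calculated, using the standard definition (one minus the leftover error divided by the total variation). The four data points are made up, so the figure it prints is not the real one from the graph.

```python
def r_squared(pairs, constant, coefficient):
    """Share of the variation in sale price that the line accounts for."""
    actual = [y for _, y in pairs]
    predicted = [constant + coefficient * x for x, _ in pairs]
    mean_actual = sum(actual) / len(actual)
    # Leftover error: squared distances between each dot and the line.
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation: squared distances between each dot and the average price.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

# Made-up (2009 assessment, sale price) pairs in dollars, scored against a
# line with constant -6,650 and coefficient 1.007.
sales = [(150_000, 146_000), (220_000, 212_000), (300_000, 298_000), (410_000, 405_000)]
print(f"R-sq = {r_squared(sales, -6_650, 1.007):.1%}")
```

A value of 100% would mean every dot sits exactly on the line, and 0% would mean the line does no better than guessing the average price every time.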
So the formula looks like this:
Sales Price = -$6,650 + 1.007 * Assessment
What that tells us is that if you take a property's 2009 assessment, increase it by 0.7%, and then subtract $6,650, you should have the likely selling price.
Let's check it against some of the data we have:
503 Jester Lane, Charlottesville, VA
It was assessed at $142,000 in 2009. Using the formula we can expect the sales price to be:
Sales Price = -$6,650 + 1.007 * $142,000 = $136,344
It actually sold for $136,700, so the model was only off by $356.
Don't be too surprised that the model can be this accurate, because we built the model from this very data. If you look at the data, most of the sample sales fall within $200K - $400K.
Let's look at two more, which demonstrate the extremes.
145 Ivy Ridge Rd, Charlottesville, VA
It was assessed at $721,700 in 2009. Using the formula we can expect the sales price to be:
Sales Price = -$6,650 + 1.007 * $721,700 = $720,102
It actually sold for $780,000, so the model was lower by $59,898.
This should not come as a surprise, since there are fewer sample points to build the regression model from at valuations of $700K or more.
1225 Old Garth Rd, Charlottesville, VA
It was assessed at $469,600 in 2009. Using the formula we can expect the sales price to be:
Sales Price = -$6,650 + 1.007 * $469,600 = $466,237
It actually sold for $405,000 so the model was higher by $61,237.
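If you want to run these checks yourself, here is a small sketch that applies the formula to the three properties above. The constant, the coefficient, the assessments, and the sale prices all come from this post; the function and variable names are just mine.

```python
FIXED_CONSTANT = -6_650
ASSESSMENT_COEFFICIENT = 1.007

def predict_sale_price(assessment):
    """Apply the formula to a 2009 assessment, in dollars."""
    return FIXED_CONSTANT + ASSESSMENT_COEFFICIENT * assessment

# (address, 2009 assessment, actual sale price), taken from the examples above
examples = [
    ("503 Jester Lane", 142_000, 136_700),
    ("145 Ivy Ridge Rd", 721_700, 780_000),
    ("1225 Old Garth Rd", 469_600, 405_000),
]

for address, assessment, sold in examples:
    predicted = predict_sale_price(assessment)
    # A positive difference means the model guessed higher than the actual sale.
    print(f"{address}: predicted ${predicted:,.0f}, sold ${sold:,}, difference {predicted - sold:+,.0f}")
```

You can swap in any other 2009 assessment to get a rough idea of what the formula would predict for that property.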
So what does all of this tell us?
The regression model adds only 0.7% to the 2009 assessed value and then subtracts $6,650 to get to the sales price, which tells me that most buyers are paying very close attention to the 2009 assessed values.
Bear in mind IT IS ONLY A FORMULA. And since it is based upon the past, this formula needs to be updated to stay accurate. I plan on tracking this on a monthly basis and seeing how the market is changing.
I would be interested in people's feedback on what they think of the current pricing formula. Feel free to leave a comment.