Location, Location, Location
FEBRUARY 2010 / WWW.GEOPLACE.COM

about how much impact each contributing factor has on residential housing prices. This is why a model to estimate housing prices and reveal impact factors should be spatially enabled.

A Spatially Enabled Model

In spring 2009, a prototype of a spatially enabled model for Orange County in Southern California was completed at California State Polytechnic University, Pomona. The model estimates residential housing prices and quantifies how much spatial and nonspatial variables contribute to the price of a house. Sales data from 2008 and other variables that affect housing prices were used to calibrate the model, based on a methodology that combines GIS and linear-regression techniques.

To spatially enable the model, ESRI's ArcGIS Desktop software was used to generate a project database in which residential housing sales records were linked to location-related information. Each record contains nonspatial variables such as the number of bedrooms and building size, but it's also enhanced with location- and proximity-based information such as nearby school quality or driving time to the closest beach. By including both spatial and nonspatial variables in a regression model, the price impact factors hidden behind the phrase "location, location, location" are considered appropriately.

The spatially enabled regression model identifies spatial and nonspatial impacts, and it also quantifies how much they contribute to a house price. This provides valuable information to homebuyers when they make purchasing decisions, as well as to local governments for city-planning purposes.

Building a Database

When building a spatially enabled database for a housing-price regression model, it's crucial to decide which nonspatial and spatial variables should be considered to have a relevant impact on housing price. This often is a trial-and-error process.
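To make the trial-and-error idea concrete, the following is a minimal sketch, not the calibration used for the Orange County prototype. It fits an ordinary least-squares regression twice, once with nonspatial variables only and once with a spatial variable added, and compares how well each fits. All records are invented for illustration, and the names (`fit_ols`, `drive_min`, and so on) are hypothetical.

```python
# Minimal ordinary least squares (OLS) in pure Python.
# All data below are invented for illustration; they are not the article's data.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return (coefficients, r_squared); coefficients[0] is the intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

# Hypothetical records: (bedrooms, building size in 1,000 sq ft, drive_min to beach)
records = [
    (2, 1.0, 10), (3, 1.5, 25), (3, 1.8, 5), (4, 2.2, 30),
    (4, 2.5, 12), (5, 3.0, 20), (2, 1.2, 35), (3, 2.0, 15),
]
# Invented prices (in $1,000s) generated from a known linear rule, so the
# full model should recover the rule's coefficients.
prices = [50 + 20 * beds + 150 * ksqft - 3 * drive_min
          for beds, ksqft, drive_min in records]

nonspatial_only = [(beds, ksqft) for beds, ksqft, _ in records]
beta_ns, r2_ns = fit_ols(nonspatial_only, prices)
beta_full, r2_full = fit_ols(records, prices)

print("nonspatial only R^2:", round(r2_ns, 4))
print("with drive time R^2:", round(r2_full, 4))
print("full-model coefficients:", [round(b, 3) for b in beta_full])
```

Each coefficient of the full model reads as a price contribution per unit of its variable, which is the sense in which a regression "quantifies" impact. A candidate variable that barely improves the fit is a reasonable one to drop in the next trial.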
The variables considered for the spatially enabled regression model are listed in Figure 1 and can be summarized in three groups:

1. Transaction data based on 2008 sales data, including the property's price, address, number of bedrooms, building square footage and a flag that indicates if the property is a foreclosure. More than 11,000 sales records were used as model input based on Orange County tax record data provided by First American CoreLogic. The address of each record was used to geocode the sales records to later join with spatial variables.
2. Location-based data that indicate for each sales record the school quality, average household income (based on its Census Block Group income) and the days per year when the temperature exceeds 90°F.
3. Proximity-based data that indicate how close or far each sales record location is from the next major park or beach. It's expected that the closer a property is to a beach or major park, the higher the price.

Furthermore, location-based data were assigned to the geocoded transaction records by overlaying the sales records with the following polygon layers:

Figure 1. Data that influence housing prices as well as the housing price itself are organized in nonspatial (transaction data) and spatial databases (location- and proximity-based data) that are linked in the project database.

Figure 2. Sales records are overlaid with census polygons containing information about average household income.
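The overlay and proximity steps described above can be sketched in a few lines. The prototype used ArcGIS Desktop for this work; the example below is only a stand-in under stated assumptions. It uses a plain ray-casting point-in-polygon test to attach a block group's income to each geocoded sale, and straight-line (haversine) distance in place of the driving time the article mentions. All coordinates, incomes, and names are hypothetical.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) is inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km (assumes a spherical Earth, radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Hypothetical census block groups as (lon, lat) rectangles with invented incomes.
block_groups = [
    {"id": "BG-A", "income": 95000,
     "polygon": [(-117.95, 33.60), (-117.85, 33.60), (-117.85, 33.70), (-117.95, 33.70)]},
    {"id": "BG-B", "income": 62000,
     "polygon": [(-117.85, 33.60), (-117.75, 33.60), (-117.75, 33.70), (-117.85, 33.70)]},
]
# Hypothetical geocoded sales records and beach access points.
sales = [
    {"sale_id": 1, "lon": -117.90, "lat": 33.65},
    {"sale_id": 2, "lon": -117.80, "lat": 33.62},
    {"sale_id": 3, "lon": -117.60, "lat": 33.65},  # falls outside both polygons
]
beaches = [(-117.93, 33.61), (-117.78, 33.54)]

def enrich(sale):
    """Attach location-based (income) and proximity-based (beach_km) values."""
    income = None
    for bg in block_groups:
        if point_in_polygon(sale["lon"], sale["lat"], bg["polygon"]):
            income = bg["income"]
            break
    beach_km = min(haversine_km(sale["lon"], sale["lat"], bx, by)
                   for bx, by in beaches)
    return {**sale, "income": income, "beach_km": beach_km}

enriched = [enrich(s) for s in sales]
for row in enriched:
    print(row)
```

A sale that falls outside every polygon keeps `income` as `None`, which makes unmatched records easy to spot before they reach the regression.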