Finding the Next Austin
The next high-growth market
A common practice in the investment world is to build mental models in order to facilitate pattern recognition. Within the equities space, we see this when investors draw parallels between stocks, such as Stock [X] being the next Apple or Stock [Y] being the next Facebook. Within the venture capital space, we see this when startups are deemed the Warby Parker of [X] or the Uber of [Y]. Similarly, within the real estate space, the goal may be to identify the next Austin, or the next up-and-coming market.
In the absence of other data points, the traditional approach to market selection typically consists of analyzing fundamental factors such as demographics, employment, and supply-demand trends. While useful to some degree, this produces a restricted view of the world and relies on parameterizing an entire city with just a few data points. Importantly, the traditional approach tends to ignore what’s happening at the micro level, including important patterns such as new business formation and other trends that can easily be overlooked by the human eye. The question then is: How can we leverage machine learning and alternative data to produce a more holistic view of a market?
Borrowing from Natural Language Processing
In order to leverage machine learning, we need to properly model the features associated with geographic representations. We need the ability to quantify what it means to have three new coffee shops open on the same block, the addition of a Michelin-starred restaurant, or the opening of a new school.
The idea of taking something that’s not easy to model in a standard way, and applying structure to it, is something that’s common within Natural Language Processing (NLP). Take, for instance, text, which is inherently messy to process, as the intent is more than just the words on the paper — there is a structure within the sentence, a structure within a page, and structure across paragraphs.
Within NLP, a simple yet powerful technique for building a numeric representation for a body of text is a bag-of-words model. Within this model, the numeric representation is calculated by counting the number of occurrences of a given word and producing a vector representation of text.
While this model has its shortcomings in terms of capturing more nuanced relationships of writing, it does capture larger relationships that allow us to make broader comparisons. For example, are two books written by the same author? Or are two books of the same genre?
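To make the idea concrete, here is a minimal bag-of-words sketch in plain Python. The two sample sentences and the helper name `bag_of_words` are illustrative assumptions, not part of any particular NLP library.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Two toy "documents" (made up for illustration).
doc_a = "the cat sat on the mat"
doc_b = "the dog sat on the log"

# A shared, sorted vocabulary keeps the columns aligned across documents.
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))

vec_a = bag_of_words(doc_a, vocabulary)
vec_b = bag_of_words(doc_b, vocabulary)

print(vocabulary)
print(vec_a)
print(vec_b)
```

Because both vectors share the same column order, they can be compared position by position, which is what makes broader comparisons between documents possible.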
Generating neighborhood vectors
We can take a similar approach to modeling a zip code, where the vector representation of the zip code is just the collection of the points of interest that reside within that zip code. If we think of a zip code as the total number of schools, coffee shops, bars, daycares, and libraries, we can then use the vector representation to effectively compare one geographic region to another.
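As a rough sketch of this analogy, a zip code can be treated as a "document" and each venue category as a "word". The venue records below are invented for illustration, and the function name `zip_vector` is hypothetical.

```python
from collections import Counter

# Hypothetical venue records: (zip code, venue category).
venues = [
    ("78701", "Coffee Shop"),
    ("78701", "Coffee Shop"),
    ("78701", "Bar"),
    ("78702", "School"),
    ("78702", "Coffee Shop"),
    ("78702", "Library"),
]

# Fixed category order so every zip code vector has the same columns.
categories = sorted({category for _, category in venues})

def zip_vector(zip_code):
    """Count the venues of each category located in the given zip code."""
    counts = Counter(cat for z, cat in venues if z == zip_code)
    return [counts[cat] for cat in categories]

vectors = {z: zip_vector(z) for z in sorted({z for z, _ in venues})}

print(categories)
for zip_code, vector in vectors.items():
    print(zip_code, vector)
```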
Leveraging our partnership with Foursquare, we start with a curated dataset of real-time business formation, which gives us unique insight into the vitality of a neighborhood. Furthermore, this dataset provides rich location attributes, such as foot traffic at the location, venue hours, and detailed venue categorizations.
The first step in producing a zip code vector is generating a monthly roll-up of the data over 12 years, aggregated by more than 200 categories. Encoding a time dimension into the zip code vector gives us two advantages over standard feature encoding. First, it allows us to measure the rate of change within a neighborhood, which provides insight into the dynamism of the region. Second, and more importantly, it allows us to compare geographic regions at two different points of time.
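The roll-up could look something like the sketch below. It assumes each record carries the month a venue opened and that the monthly vector is a running total per category. The article does not specify either detail, so treat the record layout, the sample data, and the `rollup` helper as hypothetical.

```python
from collections import defaultdict

# Hypothetical venue openings: (month opened, zip code, category).
openings = [
    ("2014-01", "78701", "Bar"),
    ("2014-01", "78701", "Pizzeria"),
    ("2014-02", "78701", "Bar"),
    ("2014-02", "78701", "Bar"),
    ("2014-03", "78701", "Pizzeria"),
]
categories = ["Bar", "Pizzeria"]

# Step 1: count openings per (month, zip code) and category.
monthly = defaultdict(lambda: defaultdict(int))
for month, zip_code, category in openings:
    monthly[(month, zip_code)][category] += 1

def rollup(zip_code):
    """Return {month: running-total vector} for one zip code."""
    totals = dict.fromkeys(categories, 0)
    rows = {}
    # ISO "YYYY-MM" strings sort chronologically.
    for month in sorted(m for m, z in monthly if z == zip_code):
        for category, n in monthly[(month, zip_code)].items():
            totals[category] += n
        rows[month] = [totals[c] for c in categories]
    return rows

rows = rollup("78701")

# Step 2: month-over-month change, a rough proxy for rate of change.
months = sorted(rows)
deltas = {
    later: [b - a for a, b in zip(rows[earlier], rows[later])]
    for earlier, later in zip(months, months[1:])
}

print(rows)
print(deltas)
```

Keying each vector by month is what lets one region at one date be compared with another region at a different date.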
The second step is to scale the data across neighborhoods to ensure we weight our features correctly. Within NLP, a commonly used technique is Term Frequency-Inverse Document Frequency (TF-IDF). The rationale behind this technique is that we want to highlight meaningful words rather than assume that common words are meaningful. Similarly, when describing a zip code, the introduction of something like a Michelin-starred restaurant or Whole Foods should be weighted more heavily than something more common across zip codes, like a convenience store or gas station.
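A minimal sketch of this weighting, applied to zip codes instead of documents, is shown below. It uses the textbook form, term frequency times log(N / document frequency). The exact variant used in production is not stated in the article, and the counts and the `tf_idf` helper are illustrative assumptions.

```python
import math

categories = ["Convenience Store", "Gas Station", "Michelin-Starred Restaurant"]

# Hypothetical venue counts per zip code, in the column order above.
counts = {
    "zip_a": [4, 2, 1],
    "zip_b": [5, 3, 0],
    "zip_c": [3, 1, 0],
}

def tf_idf(counts):
    """Weight each count by how rare its category is across zip codes."""
    n_zips = len(counts)
    n_cats = len(next(iter(counts.values())))
    # Document frequency: number of zip codes containing each category.
    df = [sum(1 for v in counts.values() if v[i] > 0) for i in range(n_cats)]
    # Categories present everywhere get an IDF of log(1) = 0.
    idf = [math.log(n_zips / d) if d > 0 else 0.0 for d in df]
    weighted = {}
    for zip_code, vector in counts.items():
        total = sum(vector)
        tf = [c / total if total else 0.0 for c in vector]
        weighted[zip_code] = [t * w for t, w in zip(tf, idf)]
    return weighted

weighted = tf_idf(counts)
for zip_code, vector in weighted.items():
    print(zip_code, [round(x, 3) for x in vector])
```

In this toy data, convenience stores and gas stations appear in every zip code and end up with zero weight, while the single rare venue dominates the vector for `zip_a`.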
This ability to numerically describe a zip code at a given point of time is what allows us to determine, for instance, which markets today look like Austin did in 2014.
|Venue Name||Address||Postal Code||Foot Traffic Score||Category||Category 2|
|Olga's Kitce||xx||43831||.89991||Greek Restaurant||Lounge|
|Date||Postal Code||# Bar||# Pizzeria||# Barbecue||# Greek Restaurant||# Speak Easy|
Now that we have a vector representation of zip codes, we can leverage this data set to identify markets that exhibit similar characteristics in terms of business formation. To do this, we need to measure the distance between our target vector and input vector, where the target vector corresponds to the neighborhood with our desired characteristics—for instance, Austin in 2014.
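One way to measure that distance is cosine similarity, the metric used in the examples that follow. The sketch below ranks a few candidate vectors against a target. The vectors and zip labels are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical target vector (e.g., the neighborhood we want to match).
target = [3.0, 1.0, 0.0]

# Hypothetical candidate vectors.
candidates = {
    "zip_a": [6.0, 2.0, 0.0],  # same mix as the target, twice the volume
    "zip_b": [0.0, 1.0, 4.0],
    "zip_c": [2.0, 2.0, 1.0],
}

scores = {z: cosine_similarity(target, v) for z, v in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for zip_code in ranked:
    print(zip_code, round(scores[zip_code], 3))
```

Cosine similarity depends on the mix of venues rather than the raw volume, so a small neighborhood can still match a larger one with the same composition.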
To show that this is a viable approach to describing and comparing zip codes, let’s look at two examples: Hoboken, New Jersey and Birmingham, Michigan. These two cities represent very different geographic regions, with Birmingham having more suburban characteristics and Hoboken being more urban.
Using cosine similarity as the measure of distance, we see that the zip codes our model found to be most similar to Birmingham, Michigan, are more affluent neighborhoods where the median home value, household income, and educational attainment are well above the national average.
|City/MSA||Zip Code||Similarity Rank||Median Home Value||Median Household Income||% of Population with a Bachelor's Degree or Higher|
|Oklahoma City, OK||73116||1||$221,500.00||$92,449.00||57.04%|
If we run the same analysis on Hoboken, New Jersey, the model finds zip codes that are most similar to be within more urban areas such as San Francisco, Chicago and Brooklyn.
|City/MSA||Zip Code||Similarity Score||Median Home Value||Median Household Income||% of Population with a Bachelor's Degree or Higher|
|Hoboken, New Jersey||07030||-||$703,000||$197,100||74.30%|
San Francisco, CA
|Iowa City, Iowa||52240||4||$215,700||$79,597||34.57%|
Now that we’ve established that our model can describe a zip code numerically, and that this numerical definition can be used to compare the zip code’s likeness to other regions within the US, we can leverage our model to identify markets poised for growth.
Finding the next Austin
It’s no secret that commercial real estate in Austin has performed extremely well over the past cycle. Using our model, how can we identify markets that exhibit similar characteristics to Austin in 2014?
Leveraging our zip code vectors, we can simply aggregate them to the county level. In this case, we’ll look for counties in 2019 with similarities to Travis County, TX, which houses our target Austin.
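A simple way to picture the aggregation is an element-wise sum of zip code vectors within each county. Summation is an assumption here, since the article does not state how vectors are combined, and the counts and the `aggregate_by_county` helper are hypothetical.

```python
# Hypothetical zip code vectors (same category order for every zip code).
zip_vectors = {
    "78701": [3, 1, 0],
    "78702": [1, 2, 1],
    "37203": [2, 0, 4],
}

# Zip-to-county lookup; a real pipeline would load a full crosswalk table.
zip_to_county = {
    "78701": "Travis County, TX",
    "78702": "Travis County, TX",
    "37203": "Davidson County, TN",
}

def aggregate_by_county(zip_vectors, zip_to_county):
    """Sum zip code vectors element-wise within each county."""
    county_vectors = {}
    for zip_code, vector in zip_vectors.items():
        county = zip_to_county[zip_code]
        current = county_vectors.get(county, [0] * len(vector))
        county_vectors[county] = [a + b for a, b in zip(current, vector)]
    return county_vectors

county_vectors = aggregate_by_county(zip_vectors, zip_to_county)
for county, vector in county_vectors.items():
    print(county, vector)
```

The resulting county vectors can then be compared to a target county's vector for an earlier year in the same way individual zip codes are compared.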
|Raleigh, North Carolina||3|
|Charlotte, North Carolina||4|
Here, we see that Nashville comes closest to resembling what Austin looked like in 2014. In other words, as ranked by our model, Nashville looks best-positioned to be the next Austin, a view we then reinforce with a more traditional analysis of the Nashville market to more comprehensively understand the opportunity.
(Read: Market Spotlight: Nashville)
What does this mean?
At Cadre, we’re focused on understanding the complete picture. The importance of incorporating non-traditional data into one’s investment thesis cannot be overstated. While we’re currently focused on market-level analysis, continued investment will allow us to conduct this analysis at even finer granularities. All said, this approach to understanding the trajectory of a market only provides part of the picture, and should be paired with adequate fundamental analysis for a holistic view of the market.
Cadre provides accredited investors with direct access to institutionally-underwritten and data-driven commercial real estate investment opportunities. To get started, please request access to the platform.