• Are New Yorkers Willing to Pay More for Subway Proximity?
    • Method
    • Results
      • Plotting the differences in TOD vs Non-TOD Areas
      • Mapping population and rents close to transit
      • Comparing Subway Distance and Average Rent
    • Limitations
    • Conclusion
  • Code Appendix
    • Data Wrangling
      • Environment Set Up
      • New York City data sets
      • Identify areas with high subway proximity
    • Results
      • Indicator Maps
      • Bar charts
      • TOD Indicator Table
      • Graduated symbols maps
      • Plotting Subway Distance vs. Average Rent

Are New Yorkers Willing to Pay More for Subway Proximity?

New York is famous for its iconic subway and sky-high rents. But even in the face of such high rents, are New Yorkers still willing to pay more to live close to the subway? If they are, the City could consider zoning changes to allow more development close to transit. Increased density could mean increased tax revenue, potentially allowing the City to reinvest that income into expanding the subway and other city services.

In this brief, I investigate whether rents are higher in areas close to the subway than in areas without subway access, and how that relationship changed between 2011 and 2020.

Method

To answer this question, I relied on the US Census Bureau’s American Community Survey (ACS) five-year estimates for 2011 and 2020 to capture median rents, travel and work behavior, and population figures for census tracts across all five city boroughs. I also sourced subway information from the City of New York to map out subway stations across the city. For the purposes of this analysis, non-subway transit lines, such as the Long Island Rail Road and the Staten Island Railway, were excluded.

Census tracts whose centroids fall within a half-mile of a subway station were designated as Transit Oriented Development (TOD) areas, while tracts further away were designated as non-Transit Oriented Development (non-TOD) areas. A more detailed discussion of this methodology is included in the appendix. The map below shows which areas were designated as TOD or non-TOD in 2011 and 2020. Note that census tracts may not follow geographic boundaries exactly. For instance, some census tracts may include bodies of water.

Results

Where do most New Yorkers live? The map above shows the total population by census tract across the city. The TOD-designated area is outlined in blue. In both 2011 and 2020, TOD areas appear more populous than non-TOD areas. Some non-TOD areas also appear to have lost population, particularly in Staten Island and eastern Queens.

Our next maps show the median cost of rent across the city in 2011 and 2020. Median rent is inflation-adjusted and shown in 2020 dollars. It appears that rents rose over the period, especially in TOD areas, with the highest rents concentrated in lower Manhattan and northwest Brooklyn.

Household incomes appear to have risen between 2011 and 2020, with the highest incomes in the center of the city within TOD areas. However, a census tract in Staten Island (a non-TOD area) in 2011 had an especially high median household income despite representing a park, suggesting an outlier in the data that should be investigated in subsequent studies.

Mapping the average amount of time spent commuting reveals commute lengths did not significantly change between 2011 and 2020. As would be expected, commuting time increases with distance to the city center. That this appears to still hold true for non-TOD areas suggests that people living in non-TOD areas are still travelling to the city center, and not to another economic center that may be closer to them but falls outside of the map.

Between 2011 and 2020, the proportion of workers working from home rose across the city. In 2020, it appears that this was especially so within TOD areas in southern Manhattan and northwestern Brooklyn. While rates of working from home may have been especially high due to the pandemic, it also appears to be the continuation of a pattern that was already underway in 2011.

How do rents and incomes intersect? Between 2011 and 2020, it appears that more households in New York came to spend a larger proportion of their incomes on rent. This appears to be especially true in TOD areas.

Plotting the differences in TOD vs Non-TOD Areas

The bar charts below compare the above indicators across TOD and non-TOD groups.

Working from home: The percentage of workers working from home is much higher in TOD than non-TOD areas in both 2011 and 2020, suggesting these workers value living in TOD areas even if they don’t need transit to commute to work.

Commute time: Average commutes were slightly longer in non-TOD areas. The small size of the gap suggests that commuting may not be a significant reason for residents to choose a TOD or non-TOD area to live in.

Incomes: Median household incomes were higher in non-TOD areas in both 2011 and 2020, although 2020 saw the gap narrow between the groups. This suggests wealthier households may be drawn to areas away from transit, perhaps because they may have access to private transportation and do not need to rely on public transit.

Rents: Median rents rose in both TOD and Non-TOD areas between 2011 and 2020. While median rents were higher in Non-TOD areas in 2011, by 2020 median rents in TOD areas had slightly exceeded those of non-TOD areas. The change between 2011 and 2020 suggests renters in 2020 do value transit access enough to pay at least slightly more in rent.

Income spent on rent: Meanwhile, the confluence of these two indicators, i.e. the average proportion of household income spent on rent, is significantly higher in TOD areas in both 2011 and 2020. This suggests that proximity to transit is highly valued by those who live close to transit. This is likely because the subway provides a crucial connection to the economic opportunities of the city that poorer households can’t otherwise access.

Population: The average population per census tract fell between 2011 and 2020, with more of the decline occurring in non-TOD areas. In both years, TOD tracts were on average more populous than non-TOD tracts.

Taken together, these graphs suggest that poorer New Yorkers are willing to pay higher rents to live closer to transit, while wealthier households are not.

The findings from the maps and graphs above are summarized in the table below.

Variable                 2011: Non-TOD   2011: TOD   2020: Non-TOD   2020: TOD
% Work From Home                     3           4               5           7
%Income Spent on Rent               19          27              22          28
Avg Commute (mins)                  43          40              45          42
Median HH Income                88,806      63,831          87,515      72,675
Median Rent                      1,376       1,288           1,477       1,503
Total Population                 3,536       3,785           3,218       3,663

Values rounded to nearest whole number

Mapping population and rents close to transit

This graduated symbols map shows how New York’s population is distributed in areas within half a mile of the subway. Manhattan is the most populous borough.

When it comes to rent, rents within half a mile of the subway are highest in lower Manhattan and northwest Brooklyn. There are also a few areas with higher rents in central and eastern Queens.

Comparing Subway Distance and Average Rent

The relationship between rents and distance to the subway is significantly different between 2011 and 2020.

Throughout 2011, rents averaged around $1,300 but did vary slightly with distance from the subway. Rents rose by about $150 to their highest rate 1.5 miles from the subway, and slowly dropped back to the average rate as the distance grew. Rents fell another $200 to their lowest rate at four miles from the subway ($1,100), before rising again back to the average.

2020 saw more variability in average rents. Rents within 1.5 miles of the subway were high ($1,550), and then dropped sharply at the two mile mark by about $200. They then rose with distance by $400, to their peak at the 3.5 mile mark, before dropping sharply by $500 to their lowest rate, at mile 4.5 (about $1,150).

Such different patterns in 2011 and 2020 suggest a complex relationship exists between rent and distance to the subway that is beyond the scope of this limited study. This could be in part the effect of other transit services that were excluded from this study, or the presence of services in some areas and not others. Future studies should investigate the phenomena behind this relationship.

Limitations

While every effort was made to produce a cohesive analysis, this study had some limitations.

In the Census’ effort to aggregate city data, some census tracts may have been skewed due to a low number of respondents. For example, the map of median incomes shows a very high income tract in Staten Island, which in reality is largely a park with few inhabitants. Future studies should consider how best to handle such outliers in the data.

Narrowing the transit to just the subway and excluding other transit, such as the Long Island Rail Road, the Staten Island Railway, Metro-North, and other rail links, may have misclassified certain areas as TOD or non-TOD, thereby skewing our results. Future research should investigate the effect of New York’s various rail services across the city.

Conclusion

It appears that some New Yorkers do value transit proximity to the point that they are willing to pay more to live closer to the subway. It is telling that New Yorkers still choose to live in TOD areas despite paying, on average, a higher proportion of their household incomes in rent than those who live in non-TOD areas. This is likely because the subway provides a critical link to the city’s economic center. However, even when New Yorkers work from home and don’t need transit proximity for commuting purposes, they still choose to live in TOD areas - this suggests that the development around transit is in itself an attraction to some New Yorkers.

All of this suggests that the New York City Council could successfully invest in expanding TOD areas in the city, and in doing so, generate additional tax revenue.

Code Appendix

The code used to prepare this brief is presented below.

Data Wrangling

Environment Set Up

To start with, I loaded relevant packages and functions for my analysis.

# Load Libraries

library(tidyverse)
library(tidycensus)
library(sf)
library(kableExtra)
library("viridis")

options(scipen=999)
options(tigris_class = "sf")

source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

pallette1 = c("#0d0887", "#7e03a8", "#cc4778", "#f89540", "#f0f921")

pallette2 = c("#dd3497", "#3690c0")

I connected to the Census API to bring in census data.

census_api_key("YOUR_CENSUS_API_KEY", overwrite = TRUE) #Replace the placeholder with your own key; avoid publishing a real key

I created a list of census data variables available for 2011 and 2020.

#Call the census to view a list of available variables:

acs_variable_list.2020 <- load_variables(2020, #year
                                         "acs5", #five year ACS estimates
                                         cache = TRUE)

acs_variable_list.2011 <- load_variables(2011, #year
                                         "acs5", #five year ACS estimates
                                         cache = TRUE)

#Now, identify variables we want:

acs_vars <- c("B01001_001E", #ACS total Pop estimate
                   "B02001_002E", #Estimated total White only population
                   "B05001_006E", #Estimated total Non-U.S. citizen population
                   "B08013_001E", #Estimated Aggregate travel time to work (in minutes)
                   "B08006_003E", #Estimated Total population who commuted with a Car, truck, or van and drove alone
                   "B08012_001E", #Estimated Total working population who commute
                   "B08006_017E", #Estimated Total population Worked at home
                   "B23025_004E", #Estimated Total population who are employed
                   "B25058_001E", #Estimated Median Rent
                   "B19013_001E") #Estimated Median Household Income

Using those selected Census variables, I called on the Census API to create a dataset of information for 2011.

#Now get those variables from the 2011 and 2020 censuses, respectively
#We'll start with 2011

acsTractsNYC.2011 <- get_acs(geography = "tract",
                             year = 2011, 
                             variables = acs_vars, 
                             geometry = TRUE, 
                             state = "NY", 
                             county = c("New York", "Kings", "Queens", "Bronx", "Richmond"),
                             output = "wide") %>%
  dplyr::select (GEOID, NAME, geometry, all_of(acs_vars))

#Give them logical names
acsTractsNYC.2011 <- acsTractsNYC.2011 %>%
  rename (total_pop = B01001_001E, #ACS total Pop estimate
          total_white = B02001_002E, #Estimated total White only population
          total_non_US = B05001_006E, #Estimated total Non-U.S. citizen population
          total_commute_time = B08013_001E, #Estimated Aggregate travel time to work (in minutes)
          total_soloVehicle = B08006_003E, #Estimated Total population who commuted with a Car, truck, or van and drove alone
          total_commuters =  B08012_001E, #Estimated Total working population who commute
          total_wfh = B08006_017E, #Estimated Total population Worked at home
          total_employed = B23025_004E, #Total employed
          medRent = B25058_001E, #Estimated Median Rent
          medIncome = B19013_001E) #Median HH income

#Adjust 2011 dollar values for inflation, calculate derived indicators, and drop the columns no longer needed
acsTractsNYC.2011 <- acsTractsNYC.2011 %>%
  mutate(medRent = (medRent * 1.19), #Factor in inflation rate of 19% between 2011 and 2020
         medIncome = (medIncome * 1.19), #Factor in inflation rate of 19% between 2011 and 2020
         pct_white = (total_white / total_pop),
         year = 2011,
         pct_wfh = (total_wfh / total_employed),
         pct_nonUS = (total_non_US / total_pop),
         pct_employed = (total_employed / total_pop),
         commuting_time = (total_commute_time / total_commuters),
         pct_commuters = (total_commuters / total_pop),
         pct_rent_spent = ((medRent * 12) / medIncome) * 100) %>% #monthly rent * 12 months, divided by annual income
  dplyr::select(-total_wfh, -total_commute_time, -total_white, -total_non_US, -total_employed, -total_commuters) %>%
  st_transform("ESRI:102318") #Set 2011's crs
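
As a quick check on the rent-burden formula above, the following is a minimal sketch that works through a single made-up tract. The rent and income figures are hypothetical and are not drawn from the ACS; only the 1.19 inflation factor comes from the code above. It also illustrates that, because rent and income are scaled by the same factor, the adjustment cancels out of the percentage of income spent on rent.

```r
# Hypothetical single-tract values, for illustration only (not ACS data)
inflation_factor <- 1.19   # the 2011 -> 2020 adjustment used in this brief
med_rent_2011 <- 1000      # monthly median rent in 2011 dollars (made up)
med_income_2011 <- 50000   # annual median household income in 2011 dollars (made up)

# Convert both dollar values to 2020 dollars
med_rent_adj <- med_rent_2011 * inflation_factor
med_income_adj <- med_income_2011 * inflation_factor

# Monthly rent * 12 months, divided by annual income, expressed as a percentage
pct_rent_spent <- (med_rent_adj * 12) / med_income_adj * 100

# The same ratio computed on the unadjusted 2011 values
pct_rent_unadj <- (med_rent_2011 * 12) / med_income_2011 * 100

# Both come out at about 24 percent, since the inflation factor cancels
isTRUE(all.equal(pct_rent_spent, pct_rent_unadj))
```

In other words, the inflation adjustment matters for comparing rents and incomes across years, but it should not change the rent-burden indicator itself.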

Then I did the same for the Census’ 2020 data.

#Now to source those variables from 2020 and wrangle them:

acsTractsNYC.2020 <- get_acs(geography = "tract",
                             year = 2020, 
                             variables = acs_vars, 
                             geometry = TRUE, 
                             state = "NY", 
                             county = c("New York", "Kings", "Queens", "Bronx", "Richmond"),
                             output = "wide") %>%
  dplyr::select (GEOID, NAME, geometry, all_of(acs_vars)) %>%
  rename (total_pop = B01001_001E, #ACS total Pop estimate
          total_white = B02001_002E, #Estimated total White only population
          total_non_US = B05001_006E, #Estimated total Non-U.S. citizen population
          total_commute_time = B08013_001E, #Estimated Aggregate travel time to work (in minutes)
          total_soloVehicle = B08006_003E, #Estimated Total population who commuted with a Car, truck, or van and drove alone
          total_commuters =  B08012_001E, #Estimated Total working population who commute
          total_wfh = B08006_017E, #Estimated Total population Worked at home
          total_employed = B23025_004E, #Total employed
          medRent = B25058_001E, #Estimated Median Rent
          medIncome = B19013_001E) %>% #Median HH income
  mutate(pct_white = (total_white / total_pop),
         year = 2020,
         pct_wfh = (total_wfh / total_employed),
         pct_nonUS = (total_non_US / total_pop),
         pct_employed = (total_employed / total_pop),
         commuting_time = (total_commute_time / total_commuters),
         pct_commuters = (total_commuters / total_pop),
         pct_rent_spent = ((medRent * 12) / medIncome) * 100) %>% #monthly rent * 12 months, divided by annual income
  dplyr::select(-total_wfh, -total_commute_time, -total_white, -total_non_US, -total_employed, -total_commuters) %>%
  st_transform(st_crs(acsTractsNYC.2011)) #Set it to the same CRS as 2011's data

Next, I combined 2020 and 2011 data into one dataframe.

NYC_tracts <- rbind(acsTractsNYC.2011, acsTractsNYC.2020)

New York City data sets

For this investigation, I needed information from the City of New York. First, I called on the NYC API to create a dataframe of subway station points.

stations <- st_read("https://data.cityofnewyork.us/resource/kk4q-3rt2.geojson") %>%
  dplyr::select(name, line) %>%
  st_transform(st_crs("ESRI:102318"))

Next, I called on the NYC API to create a dataframe of borough boundaries. This was useful for highlighting NYC’s different boroughs within the maps.

boroughs <- st_read("https://data.cityofnewyork.us/resource/7t3b-ywvw.geojson") %>%
  st_transform(st_crs(acsTractsNYC.2011))

Identify areas with high subway proximity

To identify TOD vs Non-TOD areas, I created a half-mile buffer around all subway stations and unioned the resulting buffers in order to create a single TOD boundary.

Next, I selected the tracts that were in TOD vs. Non-TOD areas. To do so, I selected by centroids - tracts whose centroids fell within the half-mile TOD buffer were designated as TOD within a new column added to my main dataset for this purpose. Meanwhile, the tracts whose centroids were not within the half-mile buffer were designated as non-TOD.

If I had instead selected all census tracts that fell in the boundary in any way, the data would include a disproportionate amount of information from people living much further away. Similarly, if I had cut tracts into smaller pieces that fell within the half-mile radius, the census data for the tract would no longer be representative for the leftover tract. While the centroid method can’t perfectly capture everything we seek, this option is a good middle ground between the alternatives.

#Now to select TOD vs Non-TOD tracts, by centroid:
#First, create a half-mile buffer around subway stations, and set it to the CRS for the other dataframes:

station_buffers <-
  st_union(st_buffer(stations, 2640)) %>%
          st_sf() %>%
  st_transform(st_crs(NYC_tracts))

#Now create the TOD areas by joining tracts and the buffers with their centroids
NYC_all <-
  rbind(st_centroid(NYC_tracts)[station_buffers,] %>%
          st_drop_geometry() %>%
          left_join(NYC_tracts) %>%
          st_sf() %>%
          mutate(TOD = "TOD"),
        st_centroid(NYC_tracts)[station_buffers, op = st_disjoint] %>%
          st_drop_geometry() %>%
          left_join(NYC_tracts) %>%
          st_sf() %>%
          mutate(TOD = "Non-TOD"))
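
To illustrate why the centroid rule was preferred over selecting any tract that touches the buffer, here is a small sketch on synthetic geometry. The three square "tracts", the single "station" at the origin, and the helper `make_square()` are all hypothetical and exist only for this example; the units are arbitrary rather than feet.

```r
library(sf)

# Hypothetical helper: build a square polygon from its lower-left corner and side length
make_square <- function(xmin, ymin, side) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + side, ymin),
    c(xmin + side, ymin + side),
    c(xmin, ymin + side),
    c(xmin, ymin)
  )))
}

# Three toy "tracts" (not NYC data)
tracts <- st_sf(
  id = c("inside", "straddling", "outside"),
  geometry = st_sfc(
    make_square(-0.5, -0.5, 1), # centroid (0, 0): well inside the buffer
    make_square(1.5, -1, 2),    # centroid (2.5, 0): only overlaps the buffer edge
    make_square(5, 5, 1)        # centroid (5.5, 5.5): far from the buffer
  )
)

# A buffer of radius 2 around a single toy "station" at the origin
buffer <- st_sf(geometry = st_buffer(st_sfc(st_point(c(0, 0))), 2))

# Selecting by any overlap also picks up the straddling tract
by_overlap <- tracts[buffer, ]$id

# Selecting by centroid keeps only tracts whose centroid lies inside the buffer
by_centroid <- suppressWarnings(st_centroid(tracts))[buffer, ]$id

by_overlap
by_centroid
```

The straddling tract is selected under the overlap rule but not under the centroid rule, which is the behavior the centroid method relies on to avoid counting residents who mostly live outside the half-mile radius.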

The below map shows the areas designated as within the TOD area and non-TOD area in 2011 and 2020, according to the method described previously.

#Basic plot of TOD vs non-TOD for 2011 and 2020:
ggplot()+
  geom_sf(data = NYC_all, aes(fill = TOD))+
  geom_sf(data = boroughs, color = "white", fill = "transparent")+
  scale_fill_manual(values = pallette2)+
  labs(title = "Areas selected as TOD and Non-TOD",
       subtitle = "New York City")+
  facet_wrap(~year)+
  mapTheme()

Results

The relationship between transit proximity and other demographic indicators is explored below.

Indicator Maps

The below maps illustrate the status of census variables across New York in 2011 and 2020. The dividing line between TOD and Non-TOD areas is highlighted in blue.

The map below shows the changes in the percentage of workers working from home in 2011 and 2020.

#Percentage of employed workers working from home - distribution is heavily skewed
ggplot()+
  geom_sf(data = NYC_all, aes(fill = (pct_wfh * 100)), color = "transparent")+
  scale_fill_viridis(option = "plasma")+
  geom_sf(data = boroughs, fill = "transparent", color = "white")+
  geom_sf(data = station_buffers, fill = "transparent", color = "#54B0FF")+
  labs(title = "Percentage of Workers Working From Home",
       subtitle = "New York City",
       fill = "Percentage of Workers")+
  facet_wrap(~year)+
  mapTheme()+
  theme(plot.title = element_text(size=22))

Our next maps show the median cost of rent across the city in 2011 and 2020.

ggplot()+
  geom_sf(data = NYC_all, aes(fill = (medRent)), color = "transparent")+
  scale_fill_viridis(option = "plasma")+
  geom_sf(data = boroughs, fill = "transparent", color = "white")+
  geom_sf(data = station_buffers, fill = "transparent", color = "#54B0FF")+
  labs(title = "Median Cost of Rent",
       subtitle = "New York City",
       fill = "Median Cost of Rent\nin 2020 Dollars")+
  facet_wrap(~year)+
  mapTheme()+
  theme(plot.title = element_text(size=22))

This map illustrates the change in median household incomes between 2011 and 2020.

ggplot()+
  geom_sf(data = NYC_all, aes(fill = (medIncome)), color = "transparent")+
  scale_fill_viridis(option = "plasma")+
  geom_sf(data = boroughs, fill = "transparent", color = "white")+
  geom_sf(data = station_buffers, fill = "transparent", color = "#54B0FF")+
  labs(title = "Median Household Income",
       subtitle = "New York City",
       fill = "Median Household Income\nin 2020 Dollars")+
  facet_wrap(~year)+
  mapTheme()+
  theme(plot.title = element_text(size=22))

This map shows the average amount of time workers spent commuting in 2011 and 2020.

ggplot()+
  geom_sf(data = NYC_all, aes(fill = (commuting_time)), color = "transparent")+
  scale_fill_viridis(option = "plasma")+
  geom_sf(data = boroughs, fill = "transparent", color = "white")+
  geom_sf(data = station_buffers, fill = "transparent", color = "#54B0FF")+
  labs(title = "Average Commuting Time",
       subtitle = "New York City",
       fill = "Average number of minutes")+
  facet_wrap(~year)+
  mapTheme()+
  theme(plot.title = element_text(size=22))

This map shows the changes in total population in New York between 2011 and 2020.

ggplot()+
  geom_sf(data = NYC_all, aes(fill = (total_pop)), color = "transparent")+
  scale_fill_viridis(option = "plasma")+
  geom_sf(data = boroughs, fill = "transparent", color = "white")+
  geom_sf(data = station_buffers, fill = "transparent", color = "#54B0FF")+
  labs(title = "Total Population",
       subtitle = "New York City",
       fill = "Total Population")+
  facet_wrap(~year)+
  mapTheme()+
  theme(plot.title = element_text(size=22))

This next map shows what percentage of household incomes is spent on rent across the city. Because there was one extreme outlier that skewed the results significantly, the scale is shown in quintile breaks.

ggplot()+
  geom_sf(data = NYC_all, aes(fill = q5(pct_rent_spent)), color = "transparent")+
  scale_fill_manual(values = pallette1, labels = (qBr(NYC_all, "pct_rent_spent")))+
  geom_sf(data = boroughs, fill = "transparent", color = "black")+
  geom_sf(data = station_buffers, fill = "transparent", color = "#54B0FF")+
  labs(title = "Percent of Household Income spent on Rent",
       subtitle = "New York City",
       fill = "Percent of Household\nIncome spent on Rent\n(Quintile Breaks)")+
  facet_wrap(~year)+
  mapTheme()+
  theme(plot.title = element_text(size=22))

Bar charts

Presenting bar charts of information first required some data wrangling. To do so, I grouped our data by year and TOD status, and then calculated the means for our variables of interest.

NYC_all.Summary <-
  st_drop_geometry(NYC_all) %>%
  group_by(year, TOD) %>%
  summarize("Median Rent" = mean(medRent, na.rm = T),
            "Total Population" = mean(total_pop, na.rm = T),
            "% Work From Home" = mean(pct_wfh * 100, na.rm = T), #convert proportion to percent
            "Median HH Income" = mean(medIncome, na.rm = T),
            "Avg Commute (mins)" = mean(commuting_time, na.rm = T),
            "%Income Spent on Rent" = mean(pct_rent_spent, na.rm = T))

Next, I created the bar chart by first gathering data into long form.

NYC_all.Summary %>%
  gather(Variable, Value, -year, -TOD) %>%
  ggplot(aes(year, Value, fill = TOD))+
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Variable, scales = "free", ncol = 3)+
  scale_fill_manual(values = pallette2) +
  labs(title = "Indicator differences across time and space") +
  plotTheme()+
  theme(legend.position = "bottom")
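
The group-and-average step behind these charts can be sanity-checked on a tiny example. The sketch below uses a made-up data frame (`toy`, with hypothetical work-from-home shares stored as proportions) and assumes the share is converted to a percentage by multiplying by 100. Note that this is an unweighted mean across tracts, so small and large tracts count equally.

```r
library(dplyr)

# Hypothetical tract-level shares of workers working from home (proportions, not ACS data)
toy <- tibble(
  year = c(2011, 2011, 2011, 2020, 2020, 2020),
  TOD = c("TOD", "TOD", "Non-TOD", "TOD", "TOD", "Non-TOD"),
  pct_wfh = c(0.02, 0.04, 0.03, 0.06, NA, 0.05)
)

# Average by year and TOD status, converting proportions to percentages;
# na.rm = TRUE drops tracts with missing values instead of returning NA
toy_summary <- toy %>%
  group_by(year, TOD) %>%
  summarize(wfh_pct = mean(pct_wfh * 100, na.rm = TRUE), .groups = "drop")

toy_summary
```

A proportion of 0.03 should appear as roughly 3 in the summary; a value ten times larger would point to a scaling error in the conversion.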

TOD Indicator Table

I created a kable table to show the indicator information in table format.

NYC_all.Summary %>%
  unite(year.TOD, year, TOD, sep = ": ", remove = T) %>%
  gather(Variable, Value, -year.TOD) %>%
  mutate(Value = round(Value, 0)) %>%
  spread(year.TOD, Value) %>%
  kbl(caption = "Indicators per Area", format.args = list(big.mark = ",")) %>%
  kable_classic_2(full_width = F) %>%
  footnote(general_title = "\n",
           general = "Values rounded to nearest whole number")

Variable                 2011: Non-TOD   2011: TOD   2020: Non-TOD   2020: TOD
% Work From Home                     3           4               5           7
%Income Spent on Rent               19          27              22          28
Avg Commute (mins)                  43          40              45          42
Median HH Income                88,806      63,831          87,515      72,675
Median Rent                      1,376       1,288           1,477       1,503
Total Population                 3,536       3,785           3,218       3,663

Values rounded to nearest whole number

Graduated symbols maps

To start, I wrangled the data to create a layer of tract centroids, and then filtered this data set by year and TOD status, and arranged the variable of interest in ascending order for better visibility in the final map.

#we need centroids of our census tracts:
tract_centroids <- st_centroid(NYC_all)

#Filter tract_centroids into two different dataframes for our two variables of interest (population and rent)

TOD_centroids_pop_2020 <- tract_centroids %>%
  dplyr::filter(TOD == "TOD") %>%
  dplyr::filter(year == 2020) %>%
  arrange(total_pop)

TOD_centroids_rent_2020 <- tract_centroids %>%
  dplyr::filter(TOD == "TOD") %>%
  dplyr::filter(year == 2020) %>%
  arrange(medRent)

Population graduated symbols map

Next, I mapped the tract centroids, and created graduated symbols by setting the size and fill to the population variable.

ggplot()+
  geom_sf(data = boroughs, fill = "grey40")+
  geom_sf(data = TOD_centroids_pop_2020,
          pch = 21,
          aes(size = total_pop, fill = total_pop),
          color = "transparent")+
          #alpha = 0.3)+
          #fill = alpha("red", 0.7),
          #col = "grey20")+
  geom_sf(data = boroughs, fill = "transparent", color = "white")+
  scale_fill_viridis(alpha = 0.4, option = "magma")+
  labs(title = "Population by Tract",
       subtitle = "Tracts within half a mile of the subway",
       caption = "New York, 2020")+
  guides(fill = guide_legend(title = "Population size"), size = "none")+
  mapTheme()+
  theme(plot.title = element_text(size=22))

Rent graduated symbols map

I did the same for median rent. I mapped the tract centroids, and created graduated symbols by setting the size and fill to the median rent variable.

ggplot()+
  geom_sf(data = boroughs, fill = "grey40")+
  geom_sf(data = TOD_centroids_rent_2020,
          pch = 21,
          aes(size = medRent, fill = medRent),
          color = "transparent")+
  geom_sf(data = boroughs, fill = "transparent", color = "white")+
  scale_fill_viridis(alpha = 0.4, option = "magma")+
  labs(title = "Median Rent by Tract",
       subtitle = "Tracts within half a mile of the subway",
       caption = "New York, 2020")+
  guides(fill = guide_legend(title = "Median Rent"), size = "none")+
  mapTheme()+
  theme(plot.title = element_text(size=22))

Plotting Subway Distance vs. Average Rent

I started by wrangling the data: I used a multiple ring buffer to create a dataset indicating the distance of each tract’s centroid from the nearest subway stop, in half-mile increments. I then mapped the output.

NYC_tracts.rings <-
  st_join(st_centroid(dplyr::select(NYC_tracts, GEOID, year)), 
          multipleRingBuffer(st_union(stations), 47520, 2640)) %>%
  st_drop_geometry() %>%
  left_join(dplyr::select(NYC_tracts, GEOID, medRent, year), 
            by=c("GEOID"="GEOID", "year"="year")) %>%
  st_sf() %>%
  mutate(distance = distance / 5280) #convert to miles

#Commenting out the multipleRingBuffer plot since it takes so much processing time.
  # ggplot() +
  # geom_sf(data=multipleRingBuffer(st_union(stations), 47520, 2640)) +
  # geom_sf(data=st_union(NYC_all.2020), fill=NA, size=1.2) +
  # geom_sf(data=stations, size=1) +
  # labs(title="Half mile buffers") +
  # mapTheme()
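
The arithmetic behind the half-mile rings can be sketched without any spatial data. The example below uses hypothetical distances in feet and assumes each ring is labelled by the distance of its outer edge, which is how I understand the `distance` column produced by `multipleRingBuffer()`; treat that labelling as an assumption rather than a documented guarantee.

```r
# Hypothetical distances (in feet) from tract centroids to the nearest station
dist_ft <- c(500, 2640, 2641, 7000, 13200)

ring_width_ft <- 2640 # half a mile, matching the ring width used above
feet_per_mile <- 5280

# Outer edge of the half-mile ring that each distance falls in, converted to miles
ring_miles <- ceiling(dist_ft / ring_width_ft) * ring_width_ft / feet_per_mile

ring_miles
```

Under this assumption, a centroid 2,640 feet away falls in the 0.5 mile ring, while one a single foot further away falls in the 1 mile ring, so the plotted distances are upper bounds on the true distance.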

Next, I grouped the previous output by year and distance to the subway, and summarized the dataset by the mean median rent.

rent_distance.Summary <-
  st_drop_geometry(NYC_tracts.rings) %>%
  group_by(year, distance) %>%
  summarize(Rent = mean(medRent, na.rm = T))

Distance vs Rent Plot

Finally, I plotted the resulting summary in a line plot.

ggplot(data = rent_distance.Summary, aes(x = distance, y = Rent, group = year))+
  geom_line(aes(color = factor(year)), size = 1.5)+
  geom_point(aes(color = factor(year)), size = 3)+
  scale_color_manual(values = pallette2)+
  labs(title = "Average rent by distance to subway station", x = "Miles to subway station", y = "Average rent", color = "Year")+
  plotTheme()