Monday, May 15, 2017

Determining Sand Mine Suitability With Raster Analysis

Introduction

  The goal of this lab is to use raster geoprocessing tools to determine the best locations for a sand mine in southern Trempealeau County, Wisconsin, by evaluating both sand mining suitability and the areas a mine would impact. Model Builder will be used to document the process of the raster analysis. Three models will be used to determine the most suitable areas for sand mining in southern Trempealeau County. A fourth model will use a scenic horse trail in Trempealeau County, the Eagle View Horse Trail, to see whether any highly suitable areas are within the line of sight of the trail. This trail was chosen because it is one of the main attractions on the Trempealeau County tourism page. The suitability model will take into account the following five criteria:
    1. Geology Type
    2. Land Cover
    3. Distance From Railroads
    4. Slope
    5. Water Table Depth
  The impact model will take into account the following five criteria:
    1. Proximity to Streams
    2. Erodible Farmland
    3. Proximity to Populated Areas
    4. Proximity to Schools
    5. Proximity to Wildlife Areas

Methods

Model 1: Suitability
  First, the suitability model was created. It can be seen below in figure 6.0. All the models in this project flow from the top left to the bottom right, and each step is described following the model. The purpose of this model is to find areas suitable for sand mines without considering environmental and community impacts and risks. Most of the Reclassify operations break the raster into three suitability rankings (1 for low, 2 for medium, and 3 for high). A higher ranking means the land is a better location for a sand mine, and a lower ranking means it is a worse location. Also, whenever a tool offered the option, the output cell size was set to 30 x 30. This ensured that all the rasters shared the same cell size and could be combined with Raster Calculator.
Fig 6.0: Suitability Model
Step 1: Classify the Distance from Rail Terminals
  For this, the study area (southern Trempealeau County) was first created by intersecting the Boundary feature class and the Trempealeau County boundary. Next, two geoprocessing environments were set based on the study area: Processing Extent and Raster Analysis. Together, these environments control the extent and mask of an output raster whenever a tool is run.
  After that, the Euclidean Distance tool was run on the rail terminals, and the output was masked to the study area. Lastly, a Reclassify was done to classify the distance from the rail terminals. The values entered for the Reclassify are shown below in figure 6.1. These break values were chosen based on the Jenks Natural Breaks method. The closer the land (raster pixels) was to a rail terminal, the higher the suitability ranking it received, because it is advantageous for sand mines to be located near rail terminals for transporting sand.
Fig 6.1: Rail Terminal Reclassify Values
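The distance-then-reclassify pattern in step 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ArcGIS implementation: the raster is a small list of lists, `euclidean_distance` is a hypothetical brute-force stand-in for the Euclidean Distance tool, and the break values (40 and 70 map units) are made up, since the real breaks are in figure 6.1.

```python
import math

CELL_SIZE = 30  # output cell size used throughout the lab


def euclidean_distance(sources, n_rows, n_cols, cell_size=CELL_SIZE):
    """Distance from each cell center to the nearest source cell.

    `sources` is a list of (row, col) cells containing a feature, such as a
    rail terminal. Brute force: fine for a toy grid, far too slow for a county.
    """
    return [
        [min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


def reclassify_by_breaks(raster, breaks, values):
    """Give a cell values[i] if it is <= breaks[i] (first match wins);
    cells above the last break get values[-1]."""
    out = []
    for row in raster:
        new_row = []
        for v in row:
            for upper, value in zip(breaks, values):
                if v <= upper:
                    new_row.append(value)
                    break
            else:
                new_row.append(values[-1])
        out.append(new_row)
    return out


# Hypothetical 3 x 3 study area with one rail terminal in the top-left cell.
distance = euclidean_distance([(0, 0)], 3, 3)
# Closer to the terminal = more suitable: <= 40 -> 3, <= 70 -> 2, else 1.
suitability = reclassify_by_breaks(distance, breaks=[40, 70], values=[3, 2, 1])
print(suitability)  # [[3, 3, 2], [3, 2, 2], [2, 2, 1]]
```

Listing the values from high to low is what makes nearness "good" here; the stream, school, and wildlife steps in the impact model use the same pattern with high impact assigned to the nearest class.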

Step 2: Determine the Best Geology Type
  First, the geology feature class was converted to a raster using the Polygon to Raster tool. Then, a Reclassify was run so that the Wonewoc and Jordan formations were set to high suitability and the other geologic formations were set to low suitability. The reclassification rankings can be seen below in figure 6.2. The Wonewoc and Jordan formations were ranked highly suitable because the author was told that these formations are better suited to sand mining than the others.
Fig 6.2: Geology Reclassify Values
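Unlike the distance rasters, the geology reclassification is categorical: each formation maps straight to a rank. A minimal sketch, assuming the raster cells hold formation names for readability (after Polygon to Raster the real cells would hold integer codes); "Eau Claire" and "Mount Simon" are just example stand-ins for "other formations".

```python
# Formations ranked highly suitable in the lab; every other formation gets low.
HIGH_SUITABILITY_FORMATIONS = {"Wonewoc", "Jordan"}
HIGH, LOW = 3, 1


def reclassify_geology(raster):
    """Map each cell's formation name to a suitability rank."""
    return [[HIGH if formation in HIGH_SUITABILITY_FORMATIONS else LOW
             for formation in row]
            for row in raster]


# Hypothetical 2 x 2 geology raster.
geology = [["Wonewoc", "Eau Claire"],
           ["Jordan", "Mount Simon"]]
print(reclassify_geology(geology))  # [[3, 1], [3, 1]]
```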
Step 3: Determine The Slope Suitability
  Next, the Slope tool was run on the DEM from the Data Gathering lab to get the slope value of each pixel. Because the output was very noisy ("peppery"), the neighborhood tool Block Statistics was run to generalize it. With a 3 x 3 neighborhood, this calculates the mean of each non-overlapping block of 9 cells and assigns that mean to every cell in the block. After this generalization, the Reclassify tool was run to classify the slope values. The suitability values can be seen below in figure 6.3. Steeper slopes received lower suitability rankings and gentler slopes received higher ones, because it is easier to mine sand in flat terrain than in hilly terrain. The breaks in the slope values were based on the Jenks Natural Breaks method.
Fig 6.3: Slope Reclassify Values
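The block-mean generalization can be sketched as follows. This is an illustration of the idea, not Esri's code: the block size of 3 matches the 9-pixel neighborhood described above, and the handling of partial blocks at the raster edge (averaging only the cells present) is an assumption.

```python
def block_mean(raster, size=3):
    """Mean over non-overlapping size x size blocks, written back to every
    cell of the block. Edge blocks that are cut off use only the cells present."""
    n_rows, n_cols = len(raster), len(raster[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r0 in range(0, n_rows, size):
        for c0 in range(0, n_cols, size):
            cells = [(r, c)
                     for r in range(r0, min(r0 + size, n_rows))
                     for c in range(c0, min(c0 + size, n_cols))]
            mean = sum(raster[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


# One noisy 18-degree cell in an otherwise flat 3 x 3 block is smoothed to 2.
slope = [[0, 0, 0],
         [0, 18, 0],
         [0, 0, 0]]
print(block_mean(slope))  # [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
```

A moving-window mean (what a focal statistic computes) would instead give each cell its own neighborhood average, producing a smoother but less blocky surface.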
Step 4: Classify the Groundwater Depth
  To do this, a Reclassify was performed on the groundwater depth raster, whose values represent the distance from the ground surface to the water table. The author was told that mining companies prefer access to shallower water tables rather than deeper ones. Because of this, areas where the water table is nearer to the surface received a higher suitability value, and areas where it is farther from the surface received a lower suitability value. These values can be seen below in the reclassification chart in figure 6.4.
Fig 6.4: Groundwater Depth Reclassify Values

Step 5: Classify the Land Cover Types

  In this step, a Reclassify was run on the land cover raster of Trempealeau County. The reclassification values can be seen below in figure 6.5. The suitability rankings were based on the author's interpretation of the land cover class descriptions (see USGS, Land Cover Descriptions in the Sources).
Fig 6.5: Land Cover Reclassify Values
Step 6: Determine Which Land Covers Aren't Suitable for Mining at All
  This was done by running a Reclassify again on the land cover raster. This time, the land cover types judged unsuitable for mining were given a value of 0 and the types judged suitable were given a value of 1. These values can be seen below in figure 6.6. As in step 5, these values were based on the author's interpretation of the land cover descriptions (see USGS, Land Cover Descriptions in the Sources).
Fig 6.6: Suitable Land Cover or Not Values
Step 7: Use Raster Calculator to Aggregate the Rasters
  This step used Raster Calculator twice. The first expression added the rasters from steps 1 through 5 together. Then, that result was multiplied by the raster created in step 6, so that every pixel with an unsuitable land cover (value 0) drops to 0. The result of this second expression (SuitableLoc) shows the most suitable areas for sand mining based on the rasters and information given, without taking any community or environmental risks and impacts into consideration.
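The two Raster Calculator expressions amount to a cell-by-cell sum followed by a cell-by-cell multiplication with the 0/1 mask. A minimal sketch with hypothetical 1 x 2 rasters (the helper names are illustrative, not Raster Calculator syntax):

```python
def add_rasters(*rasters):
    """Cell-by-cell sum of equally sized rasters (like "r1 + r2 + ...")."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*rasters)]


def multiply_rasters(a, b):
    """Cell-by-cell product of two equally sized rasters."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical rankings for steps 1-5 (rail, geology, slope, groundwater, land cover).
rail, geology, slope, groundwater, land_cover = [[3, 1]], [[3, 1]], [[2, 1]], [[3, 2]], [[2, 1]]
# Step 6 mask: 1 = land cover suitable for mining, 0 = not suitable.
suitable_land_cover = [[1, 0]]

total = add_rasters(rail, geology, slope, groundwater, land_cover)  # [[13, 6]]
suitable_loc = multiply_rasters(total, suitable_land_cover)
print(suitable_loc)  # [[13, 0]]
```

Multiplying by the mask, rather than adding it, is what lets a single unsuitable land cover veto a cell no matter how well it scores on the other five criteria.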


Model 2: Impact and Risk
  This second model performs raster analysis similar to the first model, but this time the goal is to identify the areas where a sand mine would have the greatest environmental and community impact. Once again, reclassification values are on a 1 to 3 scale. This time, though, the values represent the amount of impact a sand mine would have on an area / pixel: 1 represents low impact, 2 represents medium impact, and 3 represents high impact. This environmental and community impact / risk model is displayed below in figure 6.7, and its steps are explained below. Also, whenever a tool offered the option, the output cell size was set to 30 x 30 so that the rasters could be added together with Raster Calculator.


Fig 6.7: Environmental and Community Impact / Risk Model

Step 1: Take into Account the Proximity of Streams
  This was done by first determining which streams were important enough to include in the analysis. It was decided that a stream had to be at least a 3rd order stream to be included. These streams were selected by attributes, and the selection was exported as a new feature class. Then, the Euclidean Distance tool was run on this feature class, and the distances from these streams were classified using the Reclassify tool. The Jenks Natural Breaks method was used to break the stream distances into 3 classes. These reclassify values can be seen below in the classification chart in figure 6.8. Being near a stream is a higher risk for a sand mine because mines must take care not to disturb stream habitat. Therefore, a sand mine created next to a stream would have a greater impact on the land / environment.
Fig 6.8: Stream Distance Reclassify Values
Step 2: Determine the High Risk Farmland
  For this, the Prime_Farmland attribute table was examined to see which attributes could be used to distinguish good farmland from bad. The field containing the erodibility of the soil was chosen. This feature class was first converted to a raster with the Polygon to Raster tool, using the erodibility field. Then, a Reclassify was performed on the resulting raster. The values for this can be seen below in figure 6.9. Soil classified as highly erodible was given a high risk / impact ranking, and soil that is not highly erodible was given a low risk / impact ranking. This is because a mining company would rather place a mine on stable land than on land that would erode away as it is mined. Also, if a mine were created on highly erodible soil, much of that soil could erode away because of the harsh conditions mining places on it.
Fig 6.9: Prime Land 
Step 3: Create a Noiseshed to Keep Mines Away From Populated Areas
  First, it was decided that the zoning classes would be used to determine where the areas of high population are located in the study area. This was done by reading the zoning descriptions and determining which zoning classes have high population. Then, the following zoning classifications were queried out: Residential Public Utilities, Residential - 20 (R-20), Residential - 8 (R-8), Commercial (C), (Incorporated), and Industrial. These were exported as a new feature class called PopulatedZones to represent the high population areas. Using the zoning feature class has some limitations, such as not showing exactly where people live, but for the most part it was a better choice than something like land cover or census data. Zoning classes are more generalized, and with just a simple query one can find the high population areas.
  The author was told that sand mines must be located at least 640 meters from a residential area. To create the noiseshed that buffers the populated areas, the Polygon to Raster tool was first run on the PopulatedZones feature class. Then, the Euclidean Distance tool was run on this new raster. Lastly, the distance raster was reclassified with the values below in figure 6.10. Areas within 640 meters of a populated area were classified as high risk. Areas between 640 and 1,280 meters (double 640) were given a medium risk value, and areas farther than 1,280 meters (1.28 km) were given a low risk value. These values were assigned this way because of the zoning restrictions and because the closer a sand mine is to a populated area, the greater its community impact on that area.
Fig 6.10: Population Areas Reclassify Values
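Because the noiseshed breaks are stated explicitly (640 m and 1,280 m), the reclassification can be written out directly. A minimal sketch; treating each break as inclusive (exactly 640 m counts as high risk) is an assumption, since figure 6.10 is not reproduced here.

```python
HIGH_RISK_M = 640                # minimum setback from residential areas
MEDIUM_RISK_M = 2 * HIGH_RISK_M  # 1,280 m


def noise_risk(distance_m):
    """Rank a cell's distance (meters) to the nearest populated zone."""
    if distance_m <= HIGH_RISK_M:
        return 3  # high risk
    if distance_m <= MEDIUM_RISK_M:
        return 2  # medium risk
    return 1      # low risk


distances = [[0, 600, 900], [1280, 1500, 5000]]  # hypothetical distance raster
print([[noise_risk(d) for d in row] for row in distances])  # [[3, 3, 2], [2, 1, 1]]
```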
Step 4: Impact from Schools
  Unfortunately, there was no schools feature class in the geodatabase used for this lab. Instead, the parcels feature class was used to identify which parcels were owned by a school district. These parcels were queried and exported as a new feature class called SchoolDistricParcels. Next, the Euclidean Distance tool was run on these parcels, producing a distance raster. Lastly, the Reclassify tool was used to classify the distance from the schools. The values used can be seen below in figure 6.11. The break values were chosen with the Jenks Natural Breaks method. High impact values are assigned to areas close to schools, while low impact values are assigned to areas far away from schools, because the closer a mine is to a school, the greater its community impact.
Fig 6.11: Schools Impact Reclassify Values
Step 5: Wildlife Areas Impact
  Wildlife areas were considered to be at risk from sand mining. To determine the risk value for the wildlife areas, the Euclidean Distance tool was first run, producing a raster with the distance to the nearest wildlife area in each pixel. Then, the Reclassify tool was used on this raster to classify the distance from wildlife areas. The classification values can be seen below in figure 6.12. Once again, the Jenks Natural Breaks method was used to determine the break values. A sand mine was considered to have a high environmental impact if it were located within 15.3 km of a wildlife area. The farther the mine is from a wildlife area, the smaller its environmental impact on that area, and vice versa.
Fig 6.12: Wildlife Areas Distance Reclassify Values
Step 6: Add the Rasters Together
  This step used the Raster Calculator tool to add all five rasters together. Because each input raster is ranked 1 to 3, the new raster's pixel values can range from 5 to 15. A value of 5 represents the areas where a mine would have the lowest community and environmental impact, and a value of 15 represents the areas where a mine would have the highest community and environmental impact.
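Because the impact total is a plain sum of five rasters that are each ranked 1 to 3, its possible range can be checked by brute force. A quick illustrative check in Python, enumerating every combination of ranks:

```python
from itertools import product

# Every possible combination of ranks (1-3) across the five impact rasters:
# streams, farmland, populated areas, schools, wildlife areas.
possible_sums = {sum(ranks) for ranks in product((1, 2, 3), repeat=5)}
print(min(possible_sums), max(possible_sums))  # 5 15
```

The lowest possible total is 5 (every factor low), not 1, which matters when choosing break values for the impact map.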


Model 3: Determine the Best Sand Mine Areas

  This third model takes the output of the suitability model and environmental and community risk model and combines them to create a raster which shows where the best and worst areas for sand mining are located taking into account suitability and environmental and community impact. This model can be seen below in figure 6.13.

Fig 6.13: Sand Mine Suitability Ranking Model
  This model subtracts the ImpactAreas raster from the SuitableLoc raster using the Raster Calculator tool. To generalize and group the output, the Reclassify tool was then used to rank the suitability of the land for sand mines. For this, the Equal Interval method was used to classify the pixel values as having a low, medium, or high suitability index. The Jenks Natural Breaks method was used for the break values. The values used in the reclassification can be seen below in figure 6.14. Some values are negative because a pixel's impact score can exceed its suitability score, making the difference in the Raster Calculator expression negative.
Fig 6.14: Mine Suitability Reclassify Values
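The subtraction and a three-class grouping can be sketched as below. This shows only the Equal Interval option mentioned in the text (equal-width classes over the value range); the actual break values are those in figure 6.14, and the cell values here are hypothetical.

```python
def subtract_rasters(a, b):
    """Cell-by-cell a - b (e.g., SuitableLoc - ImpactAreas)."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def equal_interval_classes(raster, n_classes=3):
    """Split the raster's value range into n_classes equal-width classes (1 = lowest)."""
    values = [v for row in raster for v in row]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    if width == 0:
        return [[1 for _ in row] for row in raster]
    # The maximum value would land in class n_classes + 1, so cap it.
    return [[min(int((v - lo) / width) + 1, n_classes) for v in row] for row in raster]


suitable_loc = [[13, 0, 9]]   # hypothetical SuitableLoc cells
impact_areas = [[6, 10, 12]]  # hypothetical ImpactAreas cells
difference = subtract_rasters(suitable_loc, impact_areas)  # [[7, -10, -3]]
print(equal_interval_classes(difference))  # [[3, 1, 2]]
```

Note how a masked-out cell (SuitableLoc = 0) always ends up negative, so it lands in the lowest class regardless of its impact score.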
Model 4: Viewshed Analysis
  For this model, a scenic horse trail called the Eagle View Horse Trail was used to see which areas are visible from the trail, and which highly suitable sand mining areas are in view of it. This was done by first running the Viewshed tool with the horse trail and the DEM (Meters) of Trempealeau County as the inputs. This produced a raster showing which areas are visible from the horse trail. The raster contained two values: Visible and Not Visible. Visible areas can be seen from the horse trail, and Not Visible areas cannot.
  Next, to see which highly suitable sand mining areas can be seen from the horse trail, a Reclassify was run on the viewshed raster, giving not visible areas a value of 0 and visible areas a value of 1. Then, a Reclassify was run on the SuitabilityRanking raster to create a raster containing only the highly suitable sand mining pixels. Lastly, the Raster Calculator was used to multiply the two rasters together. The output raster was named InViewHighAreas. The first part of the viewshed model can be seen below in figure 6.15, and the second part in figure 6.16.
Fig 6.15: First Part of Viewshed Model
Fig 6.16: Second Part of Viewshed Model
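The second part of the viewshed model reduces to two 0/1 reclassifications and a multiplication. A minimal sketch, assuming the highly suitable class in SuitabilityRanking is coded 3 and using hypothetical cell values:

```python
def visibility_mask(viewshed):
    """Reclassify the viewshed: Visible -> 1, Not Visible -> 0."""
    return [[1 if cell == "Visible" else 0 for cell in row] for row in viewshed]


def high_suitability_mask(ranking, high_value=3):
    """Keep only the highly suitable cells (assumed coded 3): 1 if high, else 0."""
    return [[1 if cell == high_value else 0 for cell in row] for row in ranking]


viewshed = [["Visible", "Not Visible"],
            ["Visible", "Visible"]]
suitability_ranking = [[3, 3],
                       [2, 3]]

# Like InViewHighAreas: 1 only where a cell is both visible and highly suitable.
in_view_high = [[v * h for v, h in zip(row_v, row_h)]
                for row_v, row_h in zip(visibility_mask(viewshed),
                                        high_suitability_mask(suitability_ranking))]
print(in_view_high)  # [[1, 0], [0, 1]]
```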

Results / Discussion

Suitability Model
  Figure 6.17 below is a series of maps showing the output of all the reclassifications done in the suitability model. To help visualize the algebra expression used in the Raster Calculator, the expression is also shown visually in the series of maps. For all the rasters except the Suitable Land Cover raster (which uses 0 and 1), the pixels are broken up into low suitability (1), medium suitability (2), and high suitability (3) values.
Fig 6.17: Suitability Maps and Expression
  The next map, displayed below in figure 6.18, is the result of the expression shown above in figure 6.17 and is the SuitableLoc raster from the suitability model. It shows the most suitable locations for a sand mine without taking into account any community or environmental risk factors. Highly suitable areas are shown in the darkest green, medium suitable areas in the middle green, and low suitable areas in the lightest green. Overall, most of the study area is classified as either medium or highly suitable for sand mining. The low suitability areas are mainly located in the southwest portion of Trempealeau County, because the land cover there isn't suitable for sand mining. Generally, the highly suitable areas are located in the southern and eastern portions of the study area. This pattern can mainly be attributed to the distance from rail terminals index and the groundwater depth index.
Fig 6.18: Suitability Index for Sand Mining Locations
Impact / Risk Model
  The series of maps shown below in figure 6.19 shows the reclassified rasters used in the environmental and community risk model. To help visualize the algebra expression used in the Raster Calculator in the second model, the simple arithmetic equation is shown visually. The darker the red, the higher the environmental or community impact a sand mine would have on that area. Values of 1 represent low impact, values of 2 represent medium impact, and values of 3 represent high impact from a mine.
Fig 6.19: Risk Reclassification Maps and Expression
  The next map, shown below in figure 6.20, is the result of the expression displayed in the series of maps above in figure 6.19. The highest impact / risk areas are shown in the darkest red, the medium impact / risk areas in the middle red, and the lowest impact / risk areas in the lightest red. This map shows less clustering than the suitability map, and most of the study area falls in the medium impact class. A sand mining company using this map to find a site would want to avoid the high impact areas.
Fig 6.20: Environmental and Community Impact Index Map
Both Models Together
  Next, figure 6.21 is the map of the raster calculated by the Determine the Best Sand Mine Areas model. This raster is the result of subtracting the impact index raster from the suitability index raster and then reclassifying it. The darkest purple areas are the best locations for sand mining with minimal environmental and community impact. These areas are located mostly in the south and in the northwest part of the study area. Looking at the two source maps in figures 6.18 and 6.20, this makes sense because most of the highly suitable areas in the suitability map (figure 6.18) are located in the western and southern portions of the study area, and the impact map (figure 6.20) contains a couple of main low impact areas in these regions. If a sand mining company wanted to take away one thing from this lab, this is the map they would look at, since it identifies both the prime areas for sand mining and the areas to avoid.
Fig 6.21: Best Locations for Sand Mining With Minimal Environmental and Community Impact
Viewshed Map
  Lastly, a viewshed map was created, displayed below in figure 6.22. The figure contains two maps, reflecting the outputs of the two parts of the viewshed analysis model. The map on the left shows the location of the Eagle View Horse Trail along with the high suitability, minimal impact sand mining locations that are visible from it. This is the result of the second part of the model. In general, a sizable amount of visible high suitability area can be seen from the trail. However, the trail is fairly small, so only the northwestern part of the study area is affected. If a mine were to go up in view of the horse trail, it would most likely be in one of these locations. The map on the right shows all the areas within the study area that are visible and not visible from the Eagle View Horse Trail. Visible areas are shown in green, and not visible areas in pink. This map could be used to argue against building a sand mine if one wanted to protect the scenic view from the horse trail.
Fig 6.22: Viewshed Map


Conclusion

  Raster analysis is well suited to information that is available in raster rather than vector format. Reclassifying is an important step because it standardizes the rasters onto a common scale so they can be combined in an algebra expression with the Raster Calculator tool. There are probably many other factors that could be used to expand upon this project, but the process would be much the same. If this were done professionally, the break values used in the reclassifications would be given much more thought than they were in this lab. Using the Jenks Natural Breaks method was acceptable, but more meaningful break points could probably be found with a little more research.
  The results of this lab are broken up so that if one wanted to only look at the influence of certain variables / rasters, they could. A sand mining company could use the maps in this lab to see where the best spots are to create a sand mine. If the scope of this project were to increase in size, the amount of time spent on this project would dramatically increase because the raster datasets would take a long time to process when running some of the raster tools. Therefore, if one wanted to do this for all of west central Wisconsin, one should do so on a county by county basis.

Sources

Trempealeau County Tourism Website
  http://www.tremplocounty.com/tchome/misc/tourism.aspx
Trempealeau County, Geodatabase
  http://www.tremplocounty.com/tchome/landrecords/
Multi Resolution Land Characteristics, Landcover
  http://www.mrlc.gov/
USGS, DEM
  https://nationalmap.gov/index.html
USGS, Land Cover Descriptions
  https://www.mrlc.gov/nlcd01_leg.php

Thursday, April 20, 2017

Network Analysis of Frac Sand Mines

Introduction

  As discussed in lab 1, there are several issues associated with frac sand mining. This lab focuses on increased traffic. Most roads in western Wisconsin were built for rural, low-freight economies and must now accommodate high volumes of heavy trucks (Hart 8-9). This is because a lack of pipelines and rail terminals leads to more trucks transporting sand on the roads. The objective of this lab is to calculate a hypothetical dollar amount, by county, for the wear and tear sand trucks put on the roads along their route from the mine to the nearest rail terminal. This will then be displayed and discussed using maps, charts, and tables. Because not all sand mines cause wear and tear on the roads, a mine must meet three criteria to be included in this analysis:
    1. The mine must be active
    2. The mine must not have a rail loading station on-site
    3. The mine must not be within 1.5 km of a rail line
  The mines which met these criteria were queried out with a Python script, which can be found in the Python script post.
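The actual query script lives in the Python script post and is not reproduced here. As a hedged, plain-Python illustration of the same three criteria, with hypothetical records and field names (not the lab's schema), and treating "within 1.5 km" as a distance of 1.5 km or less:

```python
# Hypothetical mine records; the field names are illustrative, not the lab's schema.
mines = [
    {"name": "Mine A", "status": "Active", "rail_loading_on_site": False, "km_to_rail": 4.2},
    {"name": "Mine B", "status": "Inactive", "rail_loading_on_site": False, "km_to_rail": 10.0},
    {"name": "Mine C", "status": "Active", "rail_loading_on_site": True, "km_to_rail": 0.2},
    {"name": "Mine D", "status": "Active", "rail_loading_on_site": False, "km_to_rail": 1.2},
]


def uses_public_roads(mine, min_km_to_rail=1.5):
    """True if the mine meets all three criteria from the introduction."""
    return (mine["status"] == "Active"                # 1. active
            and not mine["rail_loading_on_site"]      # 2. no on-site rail loading
            and mine["km_to_rail"] > min_km_to_rail)  # 3. not within 1.5 km of a rail line


print([m["name"] for m in mines if uses_public_roads(m)])  # ['Mine A']
```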

Methods

  Model Builder in ArcMap was used to keep track of the workflow for this project, as shown below in figure 4.0. The model starts in the upper left and then snakes its way to the bottom. Each chunk of steps is described following the model.

Fig 4.0: Model Builder Flow

Step 1: Determine Which Rail Terminal Each Mine Will Travel To
  First, the Make Closest Facility Layer tool was used to create a network analysis layer, which finds the closest facility based on an impedance such as time or distance. In this case, the impedance was set to time. Then, the mines that met the criteria in the introduction (as incidents) and the rail terminals (as facilities) were added to the network analysis layer so they could be used in the analysis. Any barriers, such as road closures, would also have been added at this point. Then, the Solve tool was used to find the fastest route along the streets from each mine to its closest rail terminal. These routes were then selected and exported as a new line feature class called Export_Routes.
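The idea behind a closest facility solve can be illustrated with a textbook Dijkstra search: starting from a mine, expand along the cheapest travel times until the first rail terminal is reached. This is a sketch of the concept, not Network Analyst's internal solver; the street graph and its travel times are hypothetical.

```python
import heapq
from collections import defaultdict


def closest_facility(graph, start, facilities):
    """Dijkstra from `start`; return (facility, minutes) for the first facility
    reached, i.e. the one with the lowest travel time, or None if unreachable."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale heap entry
        if node in facilities:
            return node, cost
        for neighbor, minutes in graph[node]:
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None


# Hypothetical undirected street network: edge weights are travel times in minutes.
graph = defaultdict(list)
for a, b, minutes in [("mine1", "x", 5), ("x", "terminal1", 8), ("x", "y", 4),
                      ("y", "terminal2", 10), ("mine2", "y", 3)]:
    graph[a].append((b, minutes))
    graph[b].append((a, minutes))

terminals = {"terminal1", "terminal2"}
print(closest_facility(graph, "mine1", terminals))  # ('terminal1', 13.0)
print(closest_facility(graph, "mine2", terminals))  # ('terminal2', 13.0)
```

Because Dijkstra pops nodes in order of increasing travel time, the first terminal popped is guaranteed to be the closest one, so the search can stop there.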

Step 2: Calculate the Length of the Route by County
  Next, the WisconsinCounties feature class was intersected with Export_Routes. The Intersect tool keeps the attributes of both feature classes, which is necessary for the road cost calculation later. Then, because the routes were not in a projected coordinate system, they were projected with the Project tool to a Wisconsin State Plane coordinate system with a linear unit of feet. The output has a default length field giving the length, in feet, of each route segment in each county. Because not all counties in Wisconsin have a route, the Summary Statistics tool was used to summarize the default length field by county name, so that the road length for each county with a route can be displayed. Next, a new field was created to hold the road length in miles, calculated by dividing the summarized road length in feet by 5,280.
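The summarize-and-convert step can be sketched in a few lines: total the projected segment lengths (in feet) per county, then divide by 5,280 feet per mile. The segment lengths below are illustrative, not the lab's data.

```python
from collections import defaultdict

FEET_PER_MILE = 5280


def miles_by_county(segments):
    """Sum projected segment lengths (feet) by county, then convert to miles."""
    feet = defaultdict(float)
    for county, length_ft in segments:
        feet[county] += length_ft
    return {county: total / FEET_PER_MILE for county, total in feet.items()}


# Hypothetical intersected route segments: (county name, length in feet).
segments = [("Chippewa", 52800.0), ("Chippewa", 26400.0), ("Eau Claire", 10560.0)]
print(miles_by_county(segments))  # {'Chippewa': 15.0, 'Eau Claire': 2.0}
```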

Step 3: Calculate the Cost of the Route by County
  A new field titled CountyCostInDollars was created, and the Calculate Field tool was used to calculate the road cost. The cost is based on the hypothetical assumption that each sand mine makes 50 truck trips to and from the rail terminal each year (100 one-way trips over the route), and that the cost incurred by the county for road use is 2.2 cents ($0.022) per mile. Using these inputs, the cost was calculated as the road length in miles multiplied by 100 and then by 0.022. This can be seen below in figure 4.1.
Fig 4.1: Calculated Road Cost by County per Year
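The cost formula from step 3 written as a small function. The 50 round trips and $0.022 per mile are the lab's hypothetical inputs; the function and constant names are illustrative.

```python
ROUND_TRIPS_PER_YEAR = 50  # hypothetical, from the lab's assumptions
COST_PER_MILE = 0.022      # dollars (2.2 cents) per truck-mile, hypothetical


def county_cost(road_miles, round_trips=ROUND_TRIPS_PER_YEAR, cost_per_mile=COST_PER_MILE):
    """Annual road cost in dollars: each round trip covers the route twice."""
    one_way_trips = 2 * round_trips  # 50 round trips = 100 one-way trips
    return road_miles * one_way_trips * cost_per_mile


print(round(county_cost(10.0), 2))  # 22.0
```

Under this formula each mile of route costs 100 x $0.022 = $2.20 per year, so a county's cost is simply 2.2 times its route miles; for example, Chippewa County's $615.33 corresponds to roughly 280 miles of route.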

Step 4: Get the Data Ready to Map the Cost by County
  The summarized table was then joined to the Wisconsin Counties feature class so that it could be mapped. The common key used was county name. This was then exported as a new feature class.

Results / Discussion

  A table was exported to Excel using the Table To Excel tool. In Excel, the table was simplified so that only the important fields were displayed, with the counties listed in alphabetical order.
Fig 4.2: Excel Chart
  Then, based on this table, some basic statistics were calculated, as shown below in figure 4.3. The cost varies widely between counties, with a standard deviation of $168.02. This is because Burnett, St. Croix, and Winnebago counties had only a very small section of a route crossing through them, while Barron, Chippewa, and Eau Claire counties had large sections of routes passing through them.
Fig 4.3: Route Length and Cost Statistics

  Next, from the Excel table, a double bar graph was created to show the cost in dollars and the road length in miles by county, as shown below in figure 4.4. The two variables are perfectly correlated, because the road cost is calculated directly from the road length. By far, Chippewa County had the largest incurred cost, $615.33. Winnebago County had the lowest incurred cost, $1.88. Barron, Chippewa, Eau Claire, Jackson, Trempealeau, and Wood counties all had an incurred cost greater than $200. The rest of the counties had an incurred road cost less than $200.
Fig 4.4: Road Length and Cost Chart

  Lastly, a map was made to show the routes, the sand mines, the rail terminals, the main roads, and the incurred road cost by county. The three counties with the highest incurred road cost are all located next to each other. Interestingly, there is only one mine located in Eau Claire County. However, there is a rail terminal there which is the destination for 8 different mines stretching from Jackson to St. Croix County. Compare this to Burnett County, where there is only one mine, and the route in the county totals only 0.856 miles; the rest of the route extends into Minnesota.
   
Fig 4.5: Road Route and Cost Map
  Based on this map, the most common rail terminals and the routes the sand trucks take can be seen. The three most used rail terminals are located in Chippewa, Eau Claire, and Trempealeau counties. These rail terminals should expect more traffic, even if the trucks aren't traveling as far to reach them as they are to some other terminals.

Conclusion

  Sand trucks have a large impact on the roads in Wisconsin counties. Although the variables in the road cost equation are hypothetical, the data still provide useful information, such as the projected routes sand trucks take from mines to rail terminals, which could be used by local governments.
  It is important to note that the road types used in the routes range from county dirt roads to interstates. Interstates are better equipped to handle the increased traffic than smaller county roads. If this project were done again, the cost incurred by the county could vary depending on the road type.

Sources

Hart, M.V., Adams, T., & Schwartz, A. (2013). Transportation Impacts of Frac Sand Mining in the MAFC Region: Chippewa County Case Study. White Paper Series: 2013, 1-55

Friday, April 7, 2017

Geocoding Sand Mines

Introduction
  The objectives of this lab are to geocode sand mines using ArcMap, normalize Excel data, utilize the Public Land Survey System (PLSS) in the geocoding process, and prepare a series of maps comparing geocoded locations of the sand mines. 

Methods

What is Geocoding?
  Geocoding is the process of taking an address, coordinate, or location name and tying it to a location on the Earth's surface.

Normalizing Excel Data and Geocoding Sand Mines
Fig 4.0: Non-Standardized Excel Data
  When first opened, the Excel data was not normalized. Figure 4.0 shows some of the attributes that needed to be normalized before geocoding, most importantly the Address field. The address needs to be broken up into address number, street name, street type, city, state, and so forth, because the address locator in ArcMap can't recognize these attributes when they are grouped into a single field. Figure 4.1 shows part of the standardized Excel table. Notice how the address is separated into many different fields.
Fig 4.1: Standardized Excel Data
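  Splitting a single address field into the separate fields the locator expects can be sketched with a regular expression. This is a minimal sketch. It assumes addresses follow a hypothetical "number street-name street-type, city, state ZIP" pattern, and the output field names are illustrative, not the spreadsheet's actual column names.

```python
import re

# Hypothetical single-field format: "123 Example Rd, Sampletown, WI 54000".
ADDRESS_RE = re.compile(
    r"^\s*(?P<number>[A-Z]?\d+)\s+"          # address number (allows rural style like W12345)
    r"(?P<street_name>.+?)\s+"              # street name, kept short so the type splits off
    r"(?P<street_type>[A-Za-z]+\.?)\s*,\s*"  # last word before the first comma
    r"(?P<city>[^,]+?)\s*,\s*"              # city
    r"(?P<state>[A-Z]{2})\s*"               # two-letter state code
    r"(?P<zip>\d{5})?\s*$"                  # optional five-digit ZIP code
)

def split_address(raw):
    """Split one address string into locator-friendly fields.

    Returns None when the text doesn't fit the pattern (for example, a PLSS
    description), so those rows can be set aside for manual geocoding.
    """
    match = ADDRESS_RE.match(raw)
    return match.groupdict() if match else None

print(split_address("123 Example Rd, Sampletown, WI 54000"))
```

  Road names such as "County Road J" don't end in a street type. The pattern would label "J" as the type, so real entries would need a few extra cases.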
  Next, the mine locations were geocoded using the geocoding toolbar. This matched 9 of the 19 mines, as seen in figure 4.2 below. Only 9 mines matched because the Excel document contained two different types of addresses. Nine mines had regular street addresses and were matched, but 10 had PLSS addresses and couldn't be matched because the address locator isn't compatible with that address type. These 10 PLSS addresses had to be geocoded manually.
Fig 4.2: Geocoding Match Percentage
  Even though 9 addresses matched, many of these were placed in the wrong location and needed to be moved. Often, the geocoder placed the mine in the center of the nearest city or at the centroid of a county. This is because the mine's actual location was given by its PLSS address, and there is no regular address for the mine itself. These geocoded mine locations will be used for truck routing analysis in a future lab, so each mine's point was placed on the edge of the road where trucks leave and enter the mine. This meant that all of the matched mine locations had to be changed. The matched mines were easier to find than the unmatched ones, but sometimes a mine didn't geocode anywhere near the right location. In those cases the PLSS address was used. If no PLSS address was given, a Google search of the mine operator and mine name was done to find the correct location.
  To geocode and relocate the mines using the PLSS addresses, PLSS township, range, section, and quarter-section layers and labels were added to the map as a reference. Then, when a mine location was found, the corresponding mine was highlighted in the geocoding rematch window and the Pick Address from Map button was used to pick an address location. This can be seen below in figure 4.3. All 19 mines had to be matched or rematched using this process.
Fig 4.3: Geocoding Window
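  The township, range, section, and quarter-section layers act as a lookup grid, and a PLSS description names one cell in that grid. The minimal sketch below pulls those parts out of a PLSS string. It assumes a hypothetical "NE 1/4 SEC 12 T20N R9W" style; the spreadsheet's actual PLSS formatting isn't shown here, so the pattern would likely need adjusting. Text that isn't a PLSS description, such as a regular street address, returns None.

```python
import re

# Hypothetical PLSS format, e.g. "NE 1/4 SEC 12 T20N R9W".
PLSS_RE = re.compile(
    r"(?:(?P<quarter>[NS][EW])\s*1/4\s+)?"       # optional quarter section
    r"SEC(?:TION)?\s*(?P<section>\d{1,2})\s+"    # section number
    r"T\s*(?P<township>\d{1,3})\s*(?P<ns>[NS])\s+"  # township and N/S
    r"R\s*(?P<range>\d{1,3})\s*(?P<ew>[EW])",        # range and E/W
    re.IGNORECASE,
)

def parse_plss(description):
    """Return township, range, section, and quarter, or None for non-PLSS text."""
    match = PLSS_RE.search(description)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "township": f"T{int(parts['township'])}{parts['ns'].upper()}",
        "range": f"R{int(parts['range'])}{parts['ew'].upper()}",
        "section": int(parts["section"]),
        "quarter": parts["quarter"].upper() if parts["quarter"] else None,
    }

print(parse_plss("NE 1/4 SEC 12 T20N R9W"))
```

  The parsed township, range, and section values correspond to the labels on the reference layers, which narrows the search to one square-mile section before picking the road entrance by eye.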

Determining Distance Between My Geocoded Sand Mines, the Truth Locations, and Colleagues' Locations
  A merge was used to collate colleague shapefiles. Several issues came up during the merge: some people renamed the Mine_Uniqu field, some changed data types, some changed the Mine_Uniqu IDs, and many didn't complete the geocoding. Ideally, there were supposed to be 3 colleague mine locations to compare against each of mine. However, only about two thirds of the class completed the geocoding, and only about half of the people who geocoded the same mines as me finished. A field map was used to handle the colleagues who renamed the Mine_Uniqu field. Luckily, only one colleague changed the data type of a field. A copy of that shapefile was made, and its fields were corrected so they could be merged.
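  The field map and data-type fix amount to renaming each colleague's fields onto one shared schema and coercing types before merging. The minimal sketch below is plain Python, not ArcMap's Field Mappings object. Records are plain dicts standing in for shapefile rows. The alternate field names, the integer ID type, and the sample records are all hypothetical.

```python
# Hypothetical alternate spellings colleagues used for the mine ID field.
FIELD_ALIASES = {
    "MineID": "Mine_Uniqu",
    "Mine_ID": "Mine_Uniqu",
    "UniqueID": "Mine_Uniqu",
}

def normalize_record(record, id_field="Mine_Uniqu"):
    """Rename aliased fields to the shared schema and coerce the ID to int."""
    out = {FIELD_ALIASES.get(name, name): value for name, value in record.items()}
    # Assumption: IDs are whole numbers, so a text-typed copy ("102") becomes 102.
    out[id_field] = int(out[id_field])
    return out

def merge_layers(*layers):
    """Concatenate every colleague's records after normalizing each one."""
    return [normalize_record(rec) for layer in layers for rec in layer]

# Hypothetical sample rows, not the class's data.
my_layer = [{"Mine_Uniqu": 101, "Placed_By": "me"}]
colleague_layer = [{"MineID": "102", "Placed_By": "colleague"}]
print(merge_layers(my_layer, colleague_layer))
```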
  After merging everyone's shapefiles, a query statement was used to select only the mines to be compared. This query statement is shown below in figure 4.4. The same query would be needed again later for the truth mines, so its text was saved in a .txt file.
Fig 4.4: Query Statement Used to Select Only the Mines Used for Comparing Distances
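  The exact statement in figure 4.4 isn't reproduced in the text, but a selection like it is an attribute query on the ID field. The sketch below builds such a where clause from a list of IDs, so the same text can be reused for the truth mines. The field name Mine_Uniqu comes from the lab. The IDs are only examples taken from mines mentioned in this post, not the full comparison set.

```python
def build_where_clause(mine_ids, field="Mine_Uniqu"):
    """Build an attribute query selecting the given mine IDs.

    Field names are wrapped in double quotes, the delimiter ArcMap uses for
    shapefile fields. Duplicate IDs are dropped and the rest sorted.
    """
    id_list = ", ".join(str(int(mine_id)) for mine_id in sorted(set(mine_ids)))
    return f'"{field}" IN ({id_list})'

clause = build_where_clause([296, 215, 284, 215])
print(clause)
# Saving this text (e.g. to a .txt file) lets the same selection be
# reapplied to the merged truth-mine layer later.
```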
  These selected mines were then exported as a new shapefile so that only the mines being compared would be shown on the map. Then, my shapefile and the exported shapefile were reprojected to the same spatial reference so that distances could be measured accurately. The measure tool was used to get the distance between each pair of geocoded mine locations, and each value was entered into an Excel spreadsheet. Only 27 colleague mine points matched my mines, so all of them were used to compare distances. This works out to an average of about 1.4 colleague placements for each of my 19 mines (27 / 19 ≈ 1.42). Then, the minimum, maximum, standard deviation, and average of this distance field were calculated. This process of merging, querying, and measuring was then repeated with the truth mine locations.
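  Once both layers share a projected coordinate system in meters, the distance between two points is the straight-line (Euclidean) distance, and the summary statistics follow directly. The sketch below is minimal and uses hypothetical coordinates. It assumes a projected CRS in meters; points in a geographic CRS (degrees) would need a geodesic formula instead. It also uses the sample standard deviation, which matches Excel's STDEV. Whether the spreadsheet used STDEV or STDEV.P isn't stated.

```python
import math
import statistics

def point_distance(a, b):
    """Straight-line distance between two (x, y) points in a projected CRS (meters)."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def summarize(distances):
    """Return the min, max, mean, and sample standard deviation of distances."""
    return {
        "min": min(distances),
        "max": max(distances),
        "mean": statistics.mean(distances),
        "stdev": statistics.stdev(distances),
    }

# Hypothetical projected coordinates (meters): one of my mines and two
# colleague placements of the same mine.
my_mine = (500000.0, 4900000.0)
colleague_points = [(500300.0, 4900400.0), (500000.0, 4900100.0)]
distances = [point_distance(my_mine, p) for p in colleague_points]
print(distances, summarize(distances))
```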

Results

  Figure 4.5 shows what the Excel spreadsheet looked like after all of the measuring was complete. The mine unique ID is shown in the leftmost column, and the distances are placed in the columns to the right. The first three distance columns are the distances to where colleagues placed the mine, and the value in the truth column is the distance to the actual mine location. The statistics in the lower left apply to the distance 1, 2, and 3 fields, and the statistics in the lower right were derived only from the distances in the truth field.
Fig 4.5: Sand Mine Distance Data
  Every statistic was lower when comparing my mine placements to where colleagues placed them than when comparing them to the truth locations. This is most likely because colleagues used the same method to locate the mines, placing each point next to the road where trucks enter and leave the mine. Compared with the truth locations, the average distance to the colleague placements was 3,930 meters less and the standard deviation was 6,273 meters less.
  Figure 4.6 is a map which shows the placement of colleague geocoded mines and compares it to my geocoded mines. Most mines were placed fairly close to each other. However, mine 296 in Trempealeau county was placed quite differently, which can be seen on the map and in figure 4.5.
Fig 4.6: Comparing Geocoded Sand Mines to Classmates' Geocoded Mines

  The next map, shown in figure 4.7, displays the actual mine locations and compares them with where I placed the mines. The area with the most error is south of Chetek in Barron and Chippewa counties, where most of the mines appear to be several thousand meters off. However, comparing the placement of mine 215 in figure 4.6 with figure 4.7, my colleagues appear to have placed the mine in the same wrong location. We may all have made the same mistake, but the clustering in the colleague comparison map suggests a more likely cause: a wrong address was provided, which led to the wrong mine being picked. This could also be true for mines 209, 230, and 269.
Fig 4.7: Comparing Geocoded Sand Mines to the Truth Location
  An example error map, shown below in figure 4.8, was created to show the differences in the placement of mine 284 in Monroe county. The truth location places the mine at one of the buildings on the mine site, likely the mine's office. A colleague placed the mine on the road entering the mine, and I placed it at the other road entrance. This error occurred because people use different judgment when deciding where the main entrance is, and because the truth location isn't placed at the main entrance. This map shows the most common error or difference when geocoding the mines, but sometimes a mine was placed in the wrong location by several hundred meters, as indicated in the table in figure 4.5.
Fig 4.8: Example of Error When Geocoding Mines

Discussion

  Both inherent errors and operational errors were present in this lab. Most of the errors which showed in the maps were operational. Operational error "occurs as the result of the imperfection (both mechanical and procedural) of the instruments and methods used for geographic data collection, management, and application" (Lo 108). This is exactly why my mine locations differ from the truth locations and from my colleagues' locations: people manage data in different ways and apply it differently. One example is choosing which entrance to put the mine location at, as in figure 4.8. Operational error also showed up through changes to attribute information, such as renaming a field. Inherent error was present too, but it didn't show as much in the maps and the Excel spreadsheet because it is more hidden. Inherent error occurs "as a result of the limitations of the instruments and techniques for obtaining measurements with absolute accuracy, as well as the inability of the computer to represent coordinates with absolute precision" (Lo 108). It was present when classmates chose different map projections for their shapefiles, and when the DNR recorded the attribute data. It's possible that some of the attribute data is incorrect.
  The only way to know for sure where the mine locations are is to either contact the mine company and have them send you a map of their mine or go visit the sand mine in person. Even though the mine locations provided by the WI DNR are referred to as the truth, there is still a chance that they have the wrong mine location.

Conclusion

  This lab taught the importance of standardizing data. A lot of time was spent figuring out the differences between the shapefiles and getting them merged. If greater emphasis had been put on using the same attribute names and data types, a considerable amount of time could have been saved. The lab also built a solid understanding of the PLSS and of the importance of geocoding. Learning to use the merge tool is important because in the real world, multiple people may collect data and save their work as separate shapefiles. The merge tool helps expedite collating that data.

Sources

Lo, Ch4 Data Quality and Data Standards PDF, 103 - 134

Tuesday, March 14, 2017

Data Gathering

Goal and Objectives

  The goals of this lab are to gather data for Trempealeau county from various online sources, to analyze the accuracy of this data, to create a Python loop that clips the rasters, projects them, and places them in the Trempealeau geodatabase, and to create a series of maps with the data. Emphasis will be placed on the importance of metadata.

Methods

Download the Data
  First, the rail lines shapefile was downloaded from the US Department of Transportation's website. Next, the USGS National Map Viewer was used to download the national landcover raster of Wisconsin along with the two DEM tiles which encompass Trempealeau county. Next, the USDA Geospatial Data Gateway website was used to download the crop cover raster. Then, the Trempealeau county website was used to download the Trempealeau county geodatabase. Lastly, the soil data was downloaded from the Web Soil Survey website.

Import the SSURGO Data
  Next, some of the files in the soils data were stored as tables from a very old personal geodatabase and needed to be imported using Microsoft Access. A macro was used to import the data from these tables into the geodatabase by setting the correct output location. After the tables were imported, the soils shapefile was imported separately using the import feature in ArcCatalog. A relationship class was then created between the soils feature class and the output tables in the Trempealeau geodatabase, and this was used to create a join between the tables and the soils feature class.
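  The relationship class and join link each soil polygon to its tabular records through a shared key. The minimal plain-Python sketch below shows that join. It assumes the key is the SSURGO map unit key field mukey, which the text doesn't name. The polygons and table rows are hypothetical stand-ins for the geodatabase contents.

```python
def join_by_key(features, table, key="mukey"):
    """Left-join table rows onto features that share the same key value.

    Features without a matching row keep only their own attributes; if a
    field appears in both, the feature's value is kept. Only one table row
    is kept per key value.
    """
    lookup = {row[key]: row for row in table}
    return [{**lookup.get(feature[key], {}), **feature} for feature in features]

# Hypothetical records standing in for soil polygons and a SSURGO table.
soil_polygons = [{"mukey": "1001", "acres": 12.5}, {"mukey": "1002", "acres": 3.0}]
mapunit_table = [{"mukey": "1001", "musym": "AbC"}]
print(join_by_key(soil_polygons, mapunit_table))
```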

Use Python to Clip, Project, and Extract Rasters
  After all the data was downloaded, a Python script was written, which can be found in the Python Scripts post. This script clipped all of the rasters (DEM, Landcover, and Cropcover) to Trempealeau county, projected them to the same coordinate system as the Trempealeau geodatabase, and placed them in the Trempealeau geodatabase.
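  The actual script is in the Python Scripts post. As a rough illustration only, a clip-project-save loop in arcpy could look like the sketch below. The geodatabase path, county boundary feature class, and raster file names are hypothetical. The clip uses the county polygon as the clipping geometry. The target coordinate system is read from that boundary, assuming it shares the geodatabase's system. arcpy is imported inside the function so the path-planning helper can be tried without ArcGIS installed.

```python
import os
import re

def plan_outputs(raster_paths, gdb):
    """Pair each input raster with a temporary clip path and a final gdb path."""
    plans = []
    for path in raster_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        # Replace characters that aren't letters, digits, or underscores.
        safe = re.sub(r"\W", "_", stem)
        plans.append((path,
                      os.path.join(gdb, f"{safe}_clip"),
                      os.path.join(gdb, f"{safe}_prj")))
    return plans

def clip_and_project(raster_paths, gdb, county_fc):
    """Clip each raster to the county polygon, then project it into the gdb."""
    import arcpy  # requires ArcGIS; imported here so plan_outputs runs anywhere

    arcpy.env.overwriteOutput = True
    target_sr = arcpy.Describe(county_fc).spatialReference
    for src, clipped, projected in plan_outputs(raster_paths, gdb):
        # "#" skips the rectangle; the extent comes from the county template.
        arcpy.management.Clip(src, "#", clipped, county_fc, "#",
                              "ClippingGeometry", "NO_MAINTAIN_EXTENT")
        # Default resampling (NEAREST) keeps landcover classes intact;
        # a DEM might use BILINEAR instead.
        arcpy.management.ProjectRaster(clipped, projected, target_sr)
        arcpy.management.Delete(clipped)

print(plan_outputs(["data/dem-tile1.tif"], "Trempealeau.gdb"))
```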

Data Accuracy
  The table shown below in figure 3.0 lists the metadata collected for the data sets above. Collecting it was tedious and meant looking through multiple .txt files and the data providers' websites. Some of the metadata couldn't be found in these .txt files or on the websites, and those entries are marked N/A below. This metadata describes the level of accuracy at which the data was collected.
Data Set | Scale | Effective Resolution | Minimum Mapping Unit | Planimetric Coordinate Accuracy | Lineage | Temporal Accuracy | Attribute Accuracy
Soils | 1:12,000 | 6 m | 6 m | N/A | Web Soil Survey | 2015 | Tested against a master set of valid attributes
Landcover | 1:60,000 | 30 m | 30 m | N/A | Used two-date pairs of Landsat scenes from 2006 and 2011 | 2011 | 85%-90%
Crop cover | 1:100,000 | 30 m | 30 m | N/A | USDA/NRCS | 2006 | N/A
DEM | 1:22,000 | 11 m | 11 m | N/A | USGS | 2013 | N/A
Trempealeau Geodatabase | N/A | 0.01 cm | N/A | N/A | Trempealeau County | 2007 | N/A
Department of Transportation | 1:24,000 to 1:100,000 | 12 m | 12 m | N/A | Federal Railroad Administration | 2014 | N/A
Fig 3.0: Select Meta Data for Downloaded Data Sets
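  Most scale and effective-resolution pairs in the table agree with Tobler's rule of thumb. Divide the scale denominator by 1,000 to get the smallest detectable feature in meters, then halve that to get the resolution. The table doesn't say this rule was used; the check below only tests the arithmetic against the listed values.

```python
def tobler_resolution_m(scale_denominator):
    """Effective resolution in meters from Tobler's rule of thumb.

    Detectable size (m) = denominator / 1000; resolution is half of that.
    """
    return scale_denominator / 1000 / 2

# Scale denominators and effective resolutions as listed in the table
# (the railroad row uses the larger-scale end, 1:24,000).
table_rows = {
    "Soils": (12_000, 6),
    "Landcover": (60_000, 30),
    "Crop cover": (100_000, 30),
    "DEM": (22_000, 11),
    "Department of Transportation": (24_000, 12),
}

for name, (denominator, listed_m) in table_rows.items():
    predicted = tobler_resolution_m(denominator)
    status = "matches" if predicted == listed_m else "differs"
    print(f"{name}: rule gives {predicted:g} m, table lists {listed_m} m ({status})")
```

  Every row matches except crop cover. There the rule gives 50 m for 1:100,000, so either the listed scale or the 30 m resolution may be worth rechecking against the source metadata.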


Results

  A series of maps, shown below in figure 3.1, was created from the downloaded data to show some of the features in Trempealeau county: elevation, crop cover, railroads, and landcover. In the crop cover map, there appear to be many corn fields near the streams and rivers, and most of the crop cover away from the streams and rivers is deciduous forest. In the elevation and railroads map, the railroads closely follow the low-elevation areas, which are also near the county's main streams. In the landcover map, three landcover types stand out the most: deciduous forest, cultivated crops, and hay/pasture. It is important to note that the deciduous forest class in the crop cover map and the one in the landcover map align almost perfectly.
Fig 3.1: Series of Maps Clipped to Trempealeau County

Conclusion

  In conclusion, the data downloaded in this lab was put to good use through Python and by creating a series of maps. Metadata is an important part of any data set; without it, the data would not be credible. Metadata can be used as a reference so the data being mapped can be placed accurately on the map. Learning how to download data from the internet is a useful skill, because in the workplace data often won't be handed out the way it is in school. More than likely, the data will have to be downloaded online, just like in this lab, and then manipulated into the desired output. One concerning thing about the metadata was how difficult it was to find. These data sets came with .txt files, but often they didn't contain the metadata needed to fill in the table. If metadata were organized better, it would be easier to find, which would help users assess the quality of the data.

Sources

Geospatial Data Gateway, NASS
Trempealeau County, Geodatabase
Multi Resolution Land Characteristics, Landcover
Web Soils Survey, soils
USGS, DEM