Friday, April 7, 2017

Geocoding Sand Mines

Introduction
  The objectives of this lab are to geocode sand mines using ArcMap, normalize Excel data, utilize the Public Land Survey System (PLSS) in the geocoding process, and prepare a series of maps comparing geocoded locations of the sand mines. 

Methods

What is Geocoding?
  Geocoding is the process of taking an address, coordinate, or place name and tying it to a location on the earth's surface.

Normalizing Excel Data and Geocoding Sand Mines
Fig 4.0: Non-Standardized Excel Data
  When first opened, the Excel document wasn't normalized. Figure 4.0 shows some of the attributes which needed to be normalized before geocoding. The Address field in particular had to be broken up into address number, street name, street type, city, state, and so forth, because the address locator in ArcMap isn't able to recognize these parts when they are grouped into a single field. Figure 4.1 shows part of the standardized Excel table. Notice how the address is separated into several different fields.
Fig 4.1: Standardized Excel Data
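  The normalization in this lab was done by hand in Excel, but the same idea can be sketched in a few lines of Python. This is only an illustration, assuming addresses written as "number street, city, ST zip". The function name split_address, the short STREET_TYPES list, and the sample address in the usage note are made up for the example and are not from the lab data.

```python
import re

# Hypothetical, very short list of street-type suffixes (an assumption for
# this sketch; a real address locator relies on a much longer table).
STREET_TYPES = {"ST", "RD", "AVE", "DR", "LN", "HWY", "BLVD", "CT"}

# Assumed layout: "<number> <street>, <city>, <ST> <optional 5-digit zip>"
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<number>\d+)\s+(?P<street>.+?),\s*(?P<city>[^,]+),"
    r"\s*(?P<state>[A-Za-z]{2})\s*(?P<zip>\d{5})?\s*$"
)


def split_address(raw):
    """Split a single-field address into separate fields.

    Returns None when the text does not look like a street address
    (for example, a PLSS description), so that row can be handled manually.
    """
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None
    words = match.group("street").split()
    street_type = ""
    # Treat the last word as the street type only if it is a known suffix.
    if len(words) > 1 and words[-1].upper().rstrip(".") in STREET_TYPES:
        street_type = words.pop().upper().rstrip(".")
    return {
        "number": match.group("number"),
        "street_name": " ".join(words),
        "street_type": street_type,
        "city": match.group("city").strip(),
        "state": match.group("state").upper(),
        "zip": match.group("zip") or "",
    }
```

  For example, split_address("123 Main St, Sampletown, WI 54000") returns separate number, street name, street type, city, state, and zip values, while a PLSS-style description returns None and would be left for manual geocoding.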
  Next, the mine locations were geocoded using the geocoding toolbar. This matched 9 of the 19 mines, as seen in figure 4.2 below. Only 9 mines were matched because there were two different types of addresses in the Excel document. Nine mines had regular street addresses and were matched, but 10 had PLSS addresses and couldn't be matched because that address type isn't compatible with the address locator. These 10 PLSS addresses had to be geocoded manually.
Fig 4.2: Geocoding Match Percentage
  Even though 9 addresses matched, many of these were in the wrong location and needed to be changed. Often, the geocoding placed the mine in the center of the nearest city or at the centroid of a county. This is because the actual mine location was given in the PLSS address and there was no regular address for the mine. Because these geocoded mine locations will be used for truck routing analysis in a future lab, the address point of every mine was placed on the edge of the road where the trucks leave and enter the mine. This means that all of the matched mine locations had to be changed. The matched mines were easier to find than the unmatched ones, but sometimes the mine didn't geocode anywhere near the right location. In this case, the PLSS address was used, or if this wasn't given, a Google search of the mine operator and the name of the mine was done to find the correct location.
  To geocode and change the mine locations using the PLSS addresses, layers and labels for PLSS townships, ranges, sections, and quarter sections were added to the map to use as a reference. Then, when the mine location was found, the corresponding mine was highlighted in the geocoding rematch window and the Pick Address from Map button was used to pick an address location. This can be seen below in figure 4.3. All 19 mines had to be matched or rematched using this process.
Fig 4.3: Geocoding Window

Determining Distance Between My Geocoded Sand Mines, the Truth Locations, and Colleagues' Locations
  A merge was used to collate the colleague shapefiles. Several issues came up when doing the merge: some people renamed the Mine_Uniqu field, some people changed data types, some people changed the Mine_Uniqu IDs, and many people didn't complete the geocoding. Ideally, there were supposed to be 3 colleague mine locations which could be compared to each of mine. However, only about two thirds of the class completed the geocoding, and only about half of the people who geocoded the same mines as me finished. To get around the issue of colleagues renaming the Mine_Uniqu field, a field map was used. Luckily, only one colleague changed the data type of a field. A copy of that shapefile was made, and the fields were corrected so they could be merged.
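  The field map and data type fix were done in ArcMap, but the logic behind them can be sketched in plain Python. This is a rough illustration only: attribute rows are represented as dictionaries, and the alias names in FIELD_ALIASES and the function names harmonize and merge_tables are assumptions for the example, not the actual names colleagues used.

```python
# Hypothetical alias table: renamed versions of the key field (lowercased)
# mapped back to the expected name. These aliases are assumptions.
FIELD_ALIASES = {
    "mine_uniqu": "Mine_Uniqu",
    "mine_unique": "Mine_Uniqu",
    "mineid": "Mine_Uniqu",
}


def harmonize(record):
    """Return a copy of one attribute row with a canonical key field name
    and the mine ID stored as an integer."""
    fixed = {}
    for name, value in record.items():
        canonical = FIELD_ALIASES.get(name.lower(), name)
        fixed[canonical] = value
    # Undo a changed data type (text or float) on the ID field.
    fixed["Mine_Uniqu"] = int(float(fixed["Mine_Uniqu"]))
    return fixed


def merge_tables(tables):
    """Append the rows from several tables into one list, tagging each row
    with the name of the table it came from."""
    merged = []
    for source, rows in tables.items():
        for row in rows:
            merged.append({**harmonize(row), "source": source})
    return merged
```

  A sketch like this only repairs renamed fields and changed data types. It cannot recover the cases where a colleague changed the Mine_Uniqu IDs themselves, which still have to be sorted out by hand.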
  After merging everyone's shapefiles, a query statement was used to only select the mines to be compared. This query statement is shown below in figure 4.4. Because this query statement was going to be used again later for the truth mines, the text was saved in a .txt file.
Fig 4.4: Query Statement Used to Select Only the Mines Used for Comparing Distances
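  The exact query text is only shown in the figure, so the following is a guess at its general shape rather than a copy of it. The sketch assumes the selection was an SQL-style IN clause on the Mine_Uniqu field, with the field name wrapped in double quotes. The function name build_where_clause is hypothetical, and the IDs in the usage note are simply mine IDs mentioned elsewhere in this post.

```python
def build_where_clause(field, ids):
    """Build an SQL-style selection such as: "Mine_Uniqu" IN (209, 215)

    Duplicate IDs are dropped and the list is sorted so the saved text is
    the same each time it is generated.
    """
    unique_ids = sorted(set(int(i) for i in ids))
    if not unique_ids:
        raise ValueError("at least one mine ID is required")
    id_list = ", ".join(str(i) for i in unique_ids)
    return f'"{field}" IN ({id_list})'
```

  Calling build_where_clause("Mine_Uniqu", [284, 209, 215]) gives a single line of text that could be saved to a .txt file and pasted back in later, which is the same reason the query was saved in this lab.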
  These selected mines were then exported as a new shapefile so that only the mines needing to be compared would be shown on the map. Then, my shapefile and the exported shapefile were reprojected to the same spatial reference to make sure that distance could be measured accurately. To get the distance between the geocoded mine locations, the measure tool was used, and each value was entered into an Excel spreadsheet. Because there were only 27 mines which matched, all of them were used to compare distances. This computes to an average of 1.4 colleague mines per 1 of my mines. Then, the minimum, maximum, standard deviation, and average were calculated from this distance field. This process of merging, querying, and measuring was then repeated with the truth mine locations.
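  The measuring and statistics were done with the measure tool and Excel, but the arithmetic is simple enough to sketch in Python. This assumes both points are already in the same projected coordinate system with meters as the unit, and it uses the sample standard deviation, which is an assumption since the post does not say which form was used. The function names are made up for the example.

```python
import math
import statistics


def planar_distance(a, b):
    """Straight-line distance between two (x, y) points that share the same
    projected coordinate system, in that system's linear unit."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def summarize(distances):
    """Minimum, maximum, mean, and sample standard deviation of a list of
    distances (at least two values are needed for the standard deviation)."""
    return {
        "min": min(distances),
        "max": max(distances),
        "mean": statistics.mean(distances),
        "stdev": statistics.stdev(distances),
    }
```

  The distance only means something if both shapefiles use the same spatial reference, which is why the reprojection step came first.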

Results

  Figure 4.5 shows what the Excel spreadsheet looked like after all of the measuring was complete. The mine unique ID is shown in the leftmost column and the distances are placed in the columns to the right. The first three distance columns are the distances to where colleagues placed the mine. The value in the truth column is the distance to the actual mine location. The statistics in the lower left apply to the distance 1, 2, and 3 fields, and the statistics in the lower right were derived only from the distances in the truth field.
Fig 4.5: Sand Mine Distance Data
  Every statistic was lower when comparing my mine placement to where colleagues placed the mine than when comparing it to the truth location. This is most likely because the same method was used to locate the mines and the point was placed next to the road where trucks enter and leave the mine. The average distance was 3,930 meters less and the standard deviation was 6,273 meters less for the colleague comparison than for the truth comparison.
  Figure 4.6 is a map which shows the placement of colleague geocoded mines and compares it to my geocoded mines. Most mines were placed fairly close to each other. However, mine 296 in Trempealeau County was placed quite a bit differently, which can be seen on the map and in figure 4.5.
Fig 4.6: Comparing Geocoded Sand Mines to Classmates' Geocoded Mines

  This next map, shown in figure 4.7, displays the placement of the actual mine locations and compares it with where I placed them. The area with the most error is south of Chetek in Barron and Chippewa counties. Most of the mines in this area appear to be several thousand meters off. However, comparing the placement of mine 215 in figure 4.6 with figure 4.7, it appears that my colleagues placed the mine in the same wrong location as well. Although it is possible that we all made the same mistake, the clustering in the colleague comparison map makes it more likely that the wrong address was provided, which led to the wrong mine being picked. This could also be true for mines 209, 230, and 269.
Fig 4.7: Comparing Geocoded Sand Mines to the Truth Location
  An example error map, shown below in figure 4.8, was created to show the difference in the placement of mine 284, located in Monroe County. The truth location placed the mine at one of the buildings on the mine site, likely the mine's office. A colleague placed the mine on the road entering the mine, and I placed the mine at the other road entrance. This error occurred because people use different judgement when deciding where the main entrance is, and because the truth location isn't placed at the main entrance. This map shows the most common error or difference when geocoding the mines, but sometimes the mine was placed in the wrong location by several hundred meters, as indicated in the table in figure 4.5.
Fig 4.8: Example of Error When Geocoding Mines

Discussion

  Both inherent errors and operational errors were present in this lab. Most of the errors which showed in the maps were operational. Operational error "occurs as the result of the imperfection (both mechanical and procedural) of the instruments and methods used for geographic data collection, management, and application" (Lo 108). This is exactly why there are differences between my mine locations, the truth locations, and my colleagues' mine locations. People manage data in different ways and will apply it differently. An example is choosing which entrance to put the mine location at, as in figure 4.8. Operational error was also present in the changes to attribute information, such as changing the name of a field. Inherent error was present too, but it didn't show as much in the maps and the Excel spreadsheet because it is more hidden. Inherent error occurs "as a result of the limitations of the instruments and techniques for obtaining measurements with absolute accuracy, as well as the inability of the computer to represent coordinates with absolute precision" (Lo 108). This was present when classmates chose different map projections for their shapefiles and when the DNR recorded the attribute data. It's possible that some of the attribute data is incorrect.
  The only way to know for sure where the mine locations are is to either contact the mine company and have them send you a map of their mine or go visit the sand mine in person. Even though the mine locations provided by the WI DNR are referred to as the truth, there is still a chance that they have the wrong mine location.

Conclusion

  In this lab, the importance of standardizing data was learned. A lot of time was spent trying to figure out the differences in the shapefiles and getting them merged. If greater emphasis had been put on having the same attribute names and data types, a considerable amount of time could have been saved. Also, a solid understanding of the PLSS and the importance of geocoding has been established through this lab. Learning to use the merge tool is important because in the real world, multiple people would potentially be collecting data and saving their work as shapefiles. This tool would help to expedite the process of collating the data.

Sources

Lo, Ch4 Data Quality and Data Standards PDF, 103 - 134
