Posts Tagged ‘dbf’

Finding Duplicate Points in a Shapefile

Wednesday, September 2nd, 2009

[Editor’s note: When building the 6,600 cities for Natural Earth vector, we ended up with 6 more town spots than town labels. That is bound to happen on larger projects. One could take the halving approach: select half the features and see if the number of symbols matches the number of text objects. If so, skip that half; if not, subdivide it in half again and reevaluate. Or, if you use MAPublisher with Illustrator and/or Vectorworks to export a SHP file, you can open the DBF in Excel and use the “countif” function and “conditional formatting” to quickly identify the exact features to resolve. By sorting the resulting “true” and “false” columns on lat, long, and feature name, we can quickly evaluate whether there are multiple features at the same geographic location and compare their names. If the names are the same, assume one is a duplicate and remove it.]
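The same lat/long/name comparison can also be scripted once the DBF attributes are available as rows. Below is a minimal sketch in Python, assuming each row is a dict with the hypothetical field names `name`, `lat`, and `long` (the real column names depend on your export). The sample towns and coordinates are invented for illustration.

```python
from collections import defaultdict

def find_duplicate_points(records, precision=6):
    """Return lists of row indexes that share rounded coordinates and a name."""
    groups = defaultdict(list)
    for index, rec in enumerate(records):
        # Round coordinates so tiny floating-point differences do not hide a match.
        key = (
            round(rec["lat"], precision),
            round(rec["long"], precision),
            rec["name"].strip().lower(),
        )
        groups[key].append(index)
    # Keep only the groups that contain more than one row.
    return [indexes for indexes in groups.values() if len(indexes) > 1]

# Invented sample rows; the first and third describe the same town spot.
towns = [
    {"name": "Springfield", "lat": 39.7817, "long": -89.6501},
    {"name": "Shelbyville", "lat": 39.4064, "long": -88.7903},
    {"name": "Springfield", "lat": 39.7817, "long": -89.6501},
]
duplicates = find_duplicate_points(towns)
print(duplicates)
```

Each inner list holds the row indexes of one suspected duplicate group, so a person can still review the rows before deleting anything.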

Republished from Microsoft.

You can locate duplicates in a range of data by using conditional formatting and the COUNTIF function. Here are the details on how to make that work.

Set up the first conditional formatting formula

I’ll start by setting up a conditional format for the first data cell. Later, I’ll copy that conditional format for the whole range.

In my example, cell A1 contains a column heading (Invoice), so I will select cell A2, and then click Conditional Formatting on the Format menu. The Conditional Formatting dialog box opens. The first box contains the text, Cell Value Is. If you click the arrow next to this box, you can choose Formula Is.


After you click Formula Is, the dialog box changes appearance. Instead of boxes for between x and y, there is now a single formula box. This formula box is incredibly powerful. You can use it to enter any formula that you can dream up, as long as that formula will evaluate to TRUE or FALSE.

In this case, we need to use a COUNTIF formula. The formula to type in the box is:


=COUNTIF(A:A,A2)>1

This formula says: Look through the entire range of column A. Count how many cells in that range have the same value as cell A2. Then, compare to see if that count is greater than 1.

When there are no duplicates, the count will always be 1; because cell A2 is in the range, we should find exactly one cell in column A that contains the same value as A2.

Note In this formula, A2 represents the current cell — that is, the cell for which you are setting up the conditional format. So, if your data is in column E and you are setting up the first conditional format in cell E5, the formula would be =COUNTIF(E:E,E5)>1.
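For readers who want to see the logic outside of Excel, here is a rough Python equivalent of applying =COUNTIF(A:A,A2)>1 to every cell in a column. It is only a sketch: the invoice numbers are hypothetical, and unlike COUNTIF, which ignores case when comparing text, this comparison is case-sensitive.

```python
from collections import Counter

def flag_duplicates(values):
    """Return one boolean per value: True when that value occurs more than once."""
    counts = Counter(values)  # one pass over the column, like COUNTIF over A:A
    return [counts[value] > 1 for value in values]

# Hypothetical invoice numbers standing in for cells A2:A6.
invoices = ["1001", "1002", "1003", "1002", "1004"]
flags = flag_duplicates(invoices)
for value, is_duplicate in zip(invoices, flags):
    print(value, is_duplicate)
```

Both occurrences of a repeated value are flagged, which matches how the conditional format highlights every cell whose count is greater than 1.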

Choose a color to highlight duplicated entries

Now it is time to select an obnoxious (that is, obvious) format to identify any duplicates that are found. In the Conditional Formatting dialog box, click the Format button.


Click the Patterns tab and click a bright color swatch, like red or yellow. Then click OK to close the Format Cells dialog box.


You will see the selected format in the preview box. Click OK to close the Conditional Formatting dialog box, and…


Nothing happens. Wow. If this is your first time setting up conditional formatting, it would be really nice to get some feedback here that it worked. But, unless you are lucky enough that the data in cell A2 is a duplicate of the data in some other cell, the condition is FALSE and no formatting is applied.

Continue reading at Microsoft . . .

Simple shapefile drawing in ActionScript 3 (Cartogrammar)

Wednesday, July 29th, 2009

Shapefile + magic = map in Flash!

[Editor’s note: Andy Woodruff explains how to use his quick and easy implementation of Edwin van Rijkom’s AS3 classes for loading SHP files and their DBF attributes into Flash/Flex. This library DOES NOT PROJECT your SHP files, so you might consider doing that first.]

Recently I’ve heard two friends independently inquire about some sort of basic guide for loading and drawing a shapefile in Flash. The only real tutorial/example I can recall is here, dealing with Google Maps. But these guys are looking for something more bare-bones. Being a regular user of Edwin van Rijkom’s invaluable code libraries for reading shapefiles, and usually forgetting the process myself, I thought it would be a good idea to put together a very simple set of AS3 classes that load a shapefile and throw a map on screen.

So to get those jerks off my back, I wrote a little thing called ShpMap, which supplements van Rijkom’s classes by loading and drawing a shapefile. It’s nothing fancier than that. Sometimes all you need is to get your base map on screen. (Update: just to round it out a little more, I’ve added basic loading and parsing of a shapefile’s accompanying DBF file, which contains attribute data. This also uses classes by van Rijkom.)
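To give a feel for what parsing a DBF file involves, here is a small Python sketch. It is not van Rijkom’s API or the ShpMap code. It assumes the plain dBASE III layout commonly found alongside shapefiles (a 32-byte header, 32-byte field descriptors ending in a 0x0D byte, then fixed-width records) and reads every field as text. The `build_sample_dbf` helper and its town data are hypothetical, included only so the example runs without a file on disk.

```python
import struct

def parse_dbf(data):
    """Parse dBASE III-style bytes into (field_names, records); values stay as text."""
    # Bytes 4-11: record count (uint32), header length and record length (uint16 each).
    num_records, header_len, record_len = struct.unpack("<IHH", data[4:12])
    fields = []
    offset = 32
    while data[offset] != 0x0D:  # 0x0D terminates the field descriptor array
        name = data[offset:offset + 11].split(b"\x00", 1)[0].decode("ascii")
        length = data[offset + 16]  # field width in bytes
        fields.append((name, length))
        offset += 32
    records = []
    for i in range(num_records):
        start = header_len + i * record_len
        row = data[start:start + record_len]
        if row[:1] == b"*":  # a leading asterisk marks a deleted record
            continue
        pos = 1  # skip the deletion flag byte
        values = {}
        for name, length in fields:
            values[name] = row[pos:pos + length].decode("ascii").strip()
            pos += length
        records.append(values)
    return [name for name, _ in fields], records

def build_sample_dbf():
    """Build a tiny in-memory DBF with invented data (hypothetical helper)."""
    fields = [("NAME", "C", 10), ("POP", "N", 6)]
    rows = [("Madison", "233209"), ("Verona", "10619")]
    record_len = 1 + sum(length for _, _, length in fields)
    header_len = 32 + 32 * len(fields) + 1
    out = struct.pack("<BBBBIHH20x", 3, 109, 7, 29, len(rows), header_len, record_len)
    for name, ftype, length in fields:
        out += struct.pack("<11sc4xBB14x", name.encode(), ftype.encode(), length, 0)
    out += b"\x0d"
    for name, pop in rows:
        out += b" " + name.ljust(10).encode() + pop.rjust(6).encode()
    return out + b"\x1a"  # end-of-file marker

names, rows = parse_dbf(build_sample_dbf())
print(names)
print(rows)
```

A real reader would also convert numeric and date fields by type and handle other encodings; this sketch skips both to stay short.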

I hope that this class (and the several associated classes) can both be directly usable for some projects and serve as a basic guide to using van Rijkom’s classes to load shapefiles.

Dig it:

  • An example that loads and displays a US states shapefile (and then puts a square on my house and colors the state of Wisconsin green). View the source code here.
  • Download the source code. (My classes plus van Rijkom’s, as well as a demo US States shapefile.)

Noncontiguous Area Cartograms (IndieMaps)

Monday, March 2nd, 2009

[Editor’s note: Zach Johnson promotes his ActionScript 3 class for producing noncontiguous cartograms and gives background on why these are better (and easier to construct) than Gastner-Newman contiguous cartograms.]

Excerpted from IndieMaps blog by Zach Johnson.
View full blog post from Dec. 4, 2008.

Fully contiguous cartograms have stretched and distorted borders but perfectly maintained topologies, like the Gastner-Newman diffusion-based cartograms we see all over the place. Though all sorts of cartogram designs have been produced, those with perfect topology preservation (fully contiguous cartograms) receive the majority of academic and popular press attention.

< snip >

Judy Olson (Wisconsin Geography alum natch) wrote the only academic article to focus specifically on this cartogram symbology in 1976. She believed noncontiguous cartograms held three potential advantages over contiguous cartograms (I’ve three more below):

  1. “the empty areas, or gaps, between observation units are meaningful representations of discrepancies of values, these discrepancies generally being a major reason for constructing a cartogram”
  2. production of noncontiguous cartograms involves “only the discrete units for which information is available and only the lines which can be accurately relocated on the original map appear on the noncontiguous cartogram”
  3. because of perfect shape preservation, “recognition of the units represented is relatively uncomplicated for the reader”

Despite these inherent advantages (along with ease of production), all the early value-by-area cartograms I’ve seen maintain contiguity. Some took the radical step of abstracting features to geometric primitives, like Levasseur’s early French examples (which may not have been cartograms) and Erwin Raisz’s early American “rectangular statistical cartograms”. But in many ways the noncontiguous design is the more radical cartogram, as it actually breaks the basemap apart — rather than skewing shared borders it abandons them.

my [his] AS3 classes

Olson outlines a technique — the projector method — for manually producing such cartograms. A projector capable of precise numeric reduction/enlargement was required, but not much else, and accurate cartograms could be produced in minutes. A scaling factor was calculated for each enumeration unit, the projector was set to this value, and the projected borders were traced, keeping units centered on their original centers.

My [his] AS3 NoncontiguousCartogram class works similarly. It takes an array of objects containing geometry and attribute properties and creates a noncontiguous cartogram. I include methods for creating the input array from a shapefile/dbf combo, but using KML, WKT, or GeoJSON representations wouldn’t be too hard. Methods are included for projecting this lat/long linework (to Lambert’s Conformal Conic projection at least). The NoncontiguousCartogram class draws the input geography, figures the area of each feature, and scales figures according to their density in the chosen thematic variable.
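The scaling step can be sketched in a few lines of Python. This is not the NoncontiguousCartogram class itself, only an illustration of the general idea under stated assumptions: each feature is a single simple ring of already-projected (x, y) points with nonzero area, the densest feature is kept at its original size (an anchoring choice assumed here), and every other feature is shrunk about its own centroid by the square root of its relative density, so that final areas are proportional to the data values. The function names and sample squares are hypothetical.

```python
import math

def polygon_area_and_centroid(ring):
    """Shoelace formula for a simple ring of (x, y) tuples: (area, (cx, cy))."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = twice_area / 2.0  # signed; the sign cancels in the centroid
    return abs(area), (cx / (6.0 * area), cy / (6.0 * area))

def scale_about(ring, center, factor):
    """Scale every vertex toward or away from a fixed center point."""
    cx, cy = center
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in ring]

def noncontiguous_cartogram(features):
    """features: dicts with 'ring' and 'value'; returns one scaled ring per feature."""
    measured = [polygon_area_and_centroid(f["ring"]) for f in features]
    densities = [f["value"] / area for f, (area, _) in zip(features, measured)]
    max_density = max(densities)  # assumed anchor: densest unit keeps its size
    scaled = []
    for f, (_, centroid), density in zip(features, measured, densities):
        # Areas grow with the square of a linear factor, hence the square root.
        factor = math.sqrt(density / max_density)
        scaled.append(scale_about(f["ring"], centroid, factor))
    return scaled

# Two invented squares of equal area with values in a 4:1 ratio.
features = [
    {"ring": [(0, 0), (2, 0), (2, 2), (0, 2)], "value": 400},
    {"ring": [(3, 0), (5, 0), (5, 2), (3, 2)], "value": 100},
]
result = noncontiguous_cartogram(features)
for ring in result:
    print(polygon_area_and_centroid(ring))
```

Because each unit is scaled about its own centroid, shapes and center locations are unchanged and only the sizes differ, which mirrors the projector method described above.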

It’s all good/in ActionScript 3, so can be used in Flash or Flex. The zip distribution includes the following:

  • the main NoncontiguousCartogram.as class
  • two example applications and the data needed to run them
  • utility classes, including some that make creating cartograms from shp/dbf input quite easy
  • Edwin van Rijkom’s SHP and DBF libraries, which are used to load the shapefiles in both of the included examples
  • Keith Peters’ MinimalComps AS3 component library, for the components used in one of the examples
  • Grant Skinner’s gTween class, which is required by the NoncontiguousCartogram class for tween transitions

Browse all the above or download the zip.

<snip>

more advantages

In my thesis research last spring, noncontiguous cartograms performed quite well: subjects rated them highly on aesthetics and could locate and estimate the areas of features with relatively high accuracy. I would add the following to Olson’s list of noncontiguous cartogram advantages.

  1. Olson concentrates on the perfect shape preservation of noncontiguous cartograms. The form (well, those with units centered on the original enumeration unit centroids, as in Olson’s projector method) also perfectly preserves the location of the features on the resultant transformed cartogram. Not only are features easier to recognize, but locations within the transformed units can be accurately located as well (cities or mountain ranges from the original geography can be accurately plotted on the transformed cartogram).
  2. Because units are separate on the transformed cartogram, their figure-ground is increased and areas of features can therefore be more accurately estimated.
  3. Many cartogram designs (including most manual cartograms and the Gastner-Newman-produced cartograms) sacrifice some accuracy for shape recognition. This is a defensible tradeoff, especially as area estimation is notoriously inaccurate and nonlinear. Yet it’s a tradeoff that noncontiguous cartograms need not make, as they can always perfectly represent the data with relative areas without sacrificing shape preservation.

Thus, noncontiguous cartograms seem to excel at the cartogram’s two main map-reading tasks: shape recognition and area estimation. This is mediated somewhat by the chief advantage of contiguous cartograms: compactness. Because no space is created between enumeration units, contiguous cartogram enumeration units can be larger than those on noncontiguous cartograms, all other things being equal. The increased size on contiguous cartograms may improve their legibility.

Read the full entry over at Indie Maps . . .