Posts Tagged ‘php’

Building the Data Desk: Lessons From the L.A. Times (Knight Digital Media Center)

Thursday, December 4th, 2008

[Editor’s note: Great article on how data, including GIS, maps, and Google mashups can be leveraged in news media environments from a veteran of the LA Times. Thanks Aly! (and bon voyage)]

Republished from Knight Digital Media Center (OJR). By Eric Ulken on Nov. 21, 2008.

In early 2007, when the Los Angeles Times launched its Homicide Report blog — an effort to chronicle every homicide in Los Angeles County — it was clear that there were important geographic and demographic dimensions to the information that a blog format wouldn’t fully capture. What we needed was a ChicagoCrime.org-style map that would let users focus on areas of interest to them, with filters that would enable them to “play” with the data and explore trends and patterns for themselves. Problem was, the web staff (of which I was a part) lacked the tools and the expertise to build such a thing, so the blog launched without a map. (Sound familiar?)

It took several months to secure the tech resources and a couple more months to create wireframes and spec out requirements for what would become the Homicide Map, with the help of a couple of talented developers and a project manager on part-time loan from the website’s IT department. We were fortunate, of course: We actually had access to this kind of expertise, and since then we’ve hired a couple of dedicated editorial developers. I’m aware that others might not have it so good.

Last week, Robert Niles argued that news organizations should be in the business of creating “killer apps”. Put another way, there is a need to develop tools that hew to the content rather than the other way around. But creating the functionality Robert describes takes a closer connection between news thinking and tech thinking than is possible within news organizations’ traditional structures and skill sets.

In this post, I’ll try to squeeze some wisdom out of the lessons we learned in the process of assembling the Times’ Data Desk, a cross-functional team of journalists responsible for collecting, analyzing and presenting data online and in print. (Note: I left the Times earlier this month to work on some independent projects. I am writing this piece with the blessing of my former bosses there.)

Here, then, are 10 pieces of advice for those of you building or looking to build a data team in your newsroom:

  1. Find the believers: You’ll likely discover enthusiasts and experts in places you didn’t expect. In our case, teaming up with the Times’ computer-assisted reporting staff, led by Doug Smith, was a no-brainer. Doug was publishing data to the web before the website had anybody devoted to interactive projects. But besides Doug’s group, we found eager partners on the paper’s graphics staff, where, for example, GIS expert Tom Lauder had already been playing with Flash and web-based mapping tools for a while. A number of reporters were collecting data for their stories and wondering what else could be done with it. We also found people on the tech side with a good news sense who intuitively understood what we were trying to do.
  2. Get buy-in from above: For small projects, you might be able to collaborate informally with your fellow believers, but for big initiatives, you need the commitment of top editors who control the newsroom departments whose resources you’ll draw on. At the Times, a series of meetings among senior editors to chart a strategic vision for the paper gave us an opportunity to float the data desk idea. This led to plans to devote some reporting resources to gathering data and to move members of the data team into a shared space near the editorial library (see #8).
  3. Set some priorities: Your group may come from a variety of departments, but if their priorities are in alignment, disparate reporting structures might not be such a big issue. We engaged in “priority alignment” by inviting stakeholders from all the relevant departments (and their bosses) to a series of meetings with the goal of drafting a data strategy memo and setting some project priorities. (We arrived at these projects democratically by taping a big list on the wall and letting people vote by checkmark; ideas with the most checks made the cut.) Priorities will change, of course, but having some concrete goals to guide you will help.
  4. Go off the reservation: No matter how good your IT department is, their priorities are unlikely to be in sync with yours. They’re thinking big-picture product roadmaps with lots of moving pieces. Good luck fitting your database of dog names (oh yes, we did one of those) into their pipeline. Early on, database producer Ben Welsh set up a Django box at projects.latimes.com, where many of the Times’ interactive projects live. There are other great solutions besides Django, including Ruby on Rails (the framework that powers the Times’ articles and topics pages and many of the great data projects produced by The New York Times) and PHP (an inline scripting language so simple even I managed to learn it). Some people (including the L.A. Times, occasionally) are using Caspio to create and host data apps, sans programming. I am not a fan, for reasons Derek Willis sums up much better than I could, but if you have no other options, it’s better than sitting on your hands.
  5. Templatize: Don’t build it unless you can reuse it. The goal of all this is to be able to roll out projects rapidly (see #6), so you need templates, code snippets, Flash components, widgets, etc., that you can get at, customize and turn around quickly. Interactive graphics producer Sean Connelley was able to use the same county-level California map umpteen times as the basis for various election visualizations in Flash.
  6. Do breaking news: Your priority list may be full of long-term projects like school profiles and test scores, but often it’s the quick-turnaround stuff that has the biggest immediate effect. This is where a close relationship with your newsgathering staff is crucial. At the Times, assistant metro editor Megan Garvey has been overseeing the metro staff’s contributions to data projects for a few months now. When a Metrolink commuter train collided with a freight train on Sept. 12, Megan began mobilizing reporters to collect key information on the victims while Ben adapted an earlier Django project (templatizing in action!) to create a database of fatalities, complete with reader comments. Metro staffers updated the database via Django’s easy-to-use admin interface. (We’ve also used Google Spreadsheets for drama-free collaborative data entry.) … Update 11/29/2008: I was remiss in not pointing out Ben’s earlier post on this topic.
  7. Develop new skills: Disclaimer: I know neither Django nor Flash, so I’m kind of a hypocrite here. I’m a lucky hypocrite, though, because I got to work with guys who dream in ActionScript and Python. If you don’t have access to a Sean or a Ben — and I realize few newsrooms have the budget to hire tech gurus right now — then train and nurture your enthusiasts. IRE runs occasional Django boot camps, and there are a number of good online tutorials, including Jeff Croft’s explanation of Django for non-programmers. Here’s a nice primer on data visualization with Flash.
  8. Cohabitate (but marriage is optional): This may be less of an issue in smaller newsrooms, but in large organizations, collaboration can suffer when teams are split among several floors (or cities). The constituent parts of the Times’ Data Desk — print and web graphics, the computer-assisted reporting team and the interactive projects team — have only been in the same place for a couple months, but the benefits to innovation and efficiency are already clear. For one thing, being in brainstorming distance of all the people you might want to bounce ideas off of is ideal, especially in breaking news situations. Also, once we had everybody in the same place, our onetime goal of unifying the reporting structure became less important. The interactive folks still report to latimes.com managing editor Daniel Gaines, and the computer-assisted reporting people continue to report to metro editor David Lauter. The graphics folks still report to their respective bosses. Yes, there are the occasional communication breakdowns and mixed messages. But there is broad agreement on the major priorities and regular conversation on needs and goals.
  9. Integrate: Don’t let your projects dangle out there with a big ugly search box as their only point of entry. Weave them into the fabric of your site. We were inspired by the efforts of a number of newspapers — in particular the Indianapolis Star and its Gannett siblings — to make data projects a central goal of their newsgathering operations. But we wanted to do more than publish data for data’s sake. We wanted it to have context and depth, and we didn’t want to relegate data projects to a “Data Central”-type page, something Matt Waite (of Politifact fame) memorably dubbed the “data ghetto.” (I would link to Waite’s thoughtful post, but his site unfortunately reports that it “took a dirt nap recently.”) I should note that the Times recently did fashion a data projects index of its own, but only as a secondary way in. The most important routes into data projects are still through related Times content and search engines.
  10. Give back: Understand that database and visualization projects demand substantial resources at a time when they’re in very short supply. Not everyone in your newsroom will see the benefit. Make clear the value your work brings to the organization by looking for ways to pipe the best parts (interesting slices of data, say, or novel visualizations) into your print or broadcast product. For example, some of the election visualizations the data team produced were adapted for print use, and another was used on the air by a partner TV station.

When I shared this post with Meredith Artley, latimes.com’s executive editor and my former boss, she pointed to the formation about a year ago of the interactive projects team within the web staff (Ben, Sean and me; Meredith dubbed us the “cool kids,” a name that stuck):

“For me, the big step was creating the cool kids team — actually forming a unit with a mandate to experiment and collaborate with everyone in the building with the sole intention of creating innovative, interactive projects.”

And maybe that should have been my first piece of advice: Before you can build a data team, you need one or more techie-journalists dedicated full-time to executing online the great ideas they’ll dream up.

What else did I miss? If you’ve been through this process (or are going through it, or are about to), I hope you’ll take a minute to share your insights.

GBIF data heat maps – Heat maps over Google Maps for Flash (Biodivertido)

Wednesday, September 3rd, 2008

[Editor’s note: Fascinating proof-of-concept for how to create and display heat maps in Google Maps for Flash/Flex AS3, using a PHP back-end for calculation and Flash for the front-end. More information on using Google Maps in Flash CS3: download, reference and tutorial. Similar to some nifty work Zach Johnson is working up at Universal Mind for spatialkey.com.]

Reprinted from Biodivertido.

Maps, like everything else, seem to follow trends. And nowadays the sexy thing in mapping is the creation of Heat Maps. The best way to understand what they are is to see them:

You can also take a look at this post from one of my favorite blogs on what is and what is not a heat map.
Well, for a long time I wanted to give it a try, and yesterday I had the time to experiment a bit. The idea was to display the available GBIF data as a Heat Map over Google Maps. Here you have a screenshot for Quercus ilex:
And if you want to try it for yourself, here it is (one usability issue: the search box is in the bottom right corner):
So how does it work? It was actually easier than I expected:
1) Get the data: I am using the so-called “Density tables” from GBIF. You can access them through the GBIF web services API at http://es.mirror.gbif.org/ws/rest/density . For example, with a query like this one for Quercus ilex (of course you need to get the taxonconceptkey from a previous request to the services): 
This works fine but has some problems. The first one is that GBIF goes down almost every evening. Tim can maybe explain why. That’s why I am using the Spanish mirror (look at the URL), and I recommend you do the same.
The second problem is the verbosity of the XML schema being used. Downloading Animalia, which is probably the biggest concept you can get, returns 14.1 MB of XML. And that’s just to get a list of cellIds (if anybody is interested we can post details about cellIds) with counts on them, exactly an array of 34,871 numbers. Even worse is handling them in a web client like this one: parsing such a huge XML output kills the browser. The GBIF web services API deserves its own blog post, I would say, together with Tim.
But what is new is that I have supercow powers on GBIF :D I am working for GBIF right now and have access to a test database. In a testing environment I developed a little server app that publishes the same density service but using the AMF protocol. I used AMFPHP for this, if anybody is interested. There are two good things about using AMF: the output is now around 150 KB for the same data, and AMF is natively supported by Flash, so nothing needs to be parsed; it goes straight into memory as AS3 objects.
2) Create a Heat Map from the data: Once the data is on the client I make use of a class from Jordi Boggiano called HeatMap.as that creates Sprites as the result. In my case I decided to create a Sprite (think of it as an image) with 1 pixel per cellId, giving a 360×180 pixel image (a cellId is equivalent to a 1 degree box).
3) Overlay the image on Google Maps: When you have the Sprite (or even earlier, but that’s too much detail), you overlay it in Google Maps for Flash using a GroundOverlay object that takes care of reprojecting it and adapting it to the map. The GroundOverlay is explained in the docs as a way to overlay images, but it actually accepts any Sprite.
Done! (almost)
4) OK, there are some problems: Yes, it is not perfect. These are the pending issues:
  • The GroundOverlay does not seem to reproject the Sprite I generate correctly, so in the far north and south the overlay is misaligned.
  • The resolution of the Heat Map is a little poor, but it actually represents the quality of the data we have. Some interpolation could be done to make it look nicer.
  • The colours of the Heat Map do not fit well with the actual Google Maps layers. When there is little data you can hardly see it.
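Step 2 can be sketched roughly as follows. This is a minimal illustration in plain JavaScript, not the author's ActionScript, and it does not use HeatMap.as. It assumes that cellIds number the 1 degree boxes row by row starting from the south-west corner, so that cellId = (lat + 90) * 360 + (lon + 180); the post does not state this. The function names and the log scaling of counts are illustrative choices as well.

```javascript
// Assumption: 1 degree cells are numbered row by row from (-90, -180),
// so cellId = (floor(lat) + 90) * 360 + (floor(lon) + 180).
var GRID_WIDTH = 360;
var GRID_HEIGHT = 180;

// Convert a cellId into pixel coordinates of a 360x180 image.
// Image row 0 is the northernmost row, so the latitude axis is flipped.
function cellIdToPixel(cellId) {
  var col = cellId % GRID_WIDTH;
  var rowFromSouth = Math.floor(cellId / GRID_WIDTH);
  return { x: col, y: GRID_HEIGHT - 1 - rowFromSouth };
}

// Build a flat array of intensities in [0, 1], one value per pixel.
// Counts are log-scaled so sparse cells stay visible next to dense ones.
function buildIntensityGrid(cells) {
  var grid = new Array(GRID_WIDTH * GRID_HEIGHT).fill(0);
  var maxCount = 0;
  cells.forEach(function (cell) {
    if (cell.count > maxCount) { maxCount = cell.count; }
  });
  if (maxCount === 0) { return grid; }
  var denom = Math.log(1 + maxCount);
  cells.forEach(function (cell) {
    var p = cellIdToPixel(cell.cellId);
    grid[p.y * GRID_WIDTH + p.x] = Math.log(1 + cell.count) / denom;
  });
  return grid;
}
```

Each intensity can then be mapped to a colour and painted as one pixel of the image that gets overlaid on the map.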
I still don’t feel confident enough with the code to release it. I hope I can work on it a little more so that I can be proud of it, but if you desperately need it, let me know.
Just one more note. Yesterday Universal Mind released a preview of a new product: Spatial Key. I am always impressed with what these people do and follow the blogs of their developers (like this one and this one). They are kind of my RIA and web GIS heroes. The new product they have released actually looks very much like what I wanted to do in Biodiversity Atlas for data analysis. It lets people explore huge datasets geographically and temporally. Tim suggested that I contact them, and I will. In any case, it is great to have such a tool available to get ideas on interaction design. Good job Universal Mind, you really rock.
We want to see your comments!
Update: 
Some people asked for different quality settings on the heat map. I have modified the application so that you now get a set of controls to define different quality and drawing options. By default the app tries to pick the settings based on the number of occurrences, but maybe that’s not the best approach, since it depends on how the data is distributed. In a final product I think I would NOT provide this functionality to the user; it is too much for my taste. You know, less is more.

Newsflash! ESRI to best Google Maps with Mashup Capability

Monday, May 12th, 2008

(Reprinted from flex888.com. View original post.)

Finally, GeoWeb is Complete and Born

Posted by Moxie | March 19, 2008 .

What’s the best RIA application ever created? If your answer is something around Flex or Flash, then it’s the wrong answer. The right answer is Google Map. It’s Google Map that made AJAX known and RIA a reality. Google even went above and beyond and claimed the term “GeoWeb”. However, up till now, Google Map has been just the best client, the visualization end, of GeoWeb. The “Geo” part of GeoWeb was missing.

Yesterday, ESRI, the shy, but true and real “Geo” dude behind all, I mean ALL, the web map buzz and technologies, released its very own JavaScript API and REST based Geo Process services to the world. The GeoWeb is finally complete and born.

The JavaScript API has three parts: the ESRI JavaScript API, the Google Map extension, and the Virtual Earth extension. That means you can use the top three GeoWeb clients with this simple API to do the real “Geo” things.

What are the “Geo” things, and why are they a big deal for GeoWeb?

Well, everyone and his/her grandma knows what Google Map does: it plans the trip and shows locations. What’s the most mashed-up platform? Google Map. What do 99% of Google Map mashup applications do? Put pins (markers) on the map. But what if we want to ask some questions beyond the pushpins:

  • Within 5 minutes’ driving time, show me the areas that I can reach. Don’t fool me with a circle. That is cheating, because there might be a highway, a service street, or a river within the 5-minute driving range. The area you can cover by driving is an irregular polygon. But how do you get that polygon drawn on the map to show the 5-minute driving range?
  • Three of my friends want to meet for lunch. We want to meet at a Starbucks where everybody has the least driving time to get there. Fair enough? But how do you quickly give me that Starbucks location and provide driving directions for each of us?

The questions can go on and on. How are these questions answered? Through a thing called geoprocessing, which is provided by the technology called GIS (geographic information system). But why have you never heard of it, and why is it not well known in the Web 2.0 space? That’s because it’s a very hard nut to crack and only a few dudes know how to do it inside out. ESRI is the one that does it best, and now it has everything figured out. The whole web can have it.

What if I told you that with three lines of JavaScript code, plus some regular JavaScript programming, you can easily answer the above questions visually on an ESRI map, Google Map or Virtual Earth map? Do you believe me?

You don’t have to because I’ll show you how.

First Line:

    var map = new esri.Map("mapDiv", { extent: startExtent });

Looks familiar, doesn’t it? Indeed, it’s just like the Google Map or Virtual Earth API.

Second Line:

    var streetMap = new esri.layers.ArcGISTiledMapServiceLayer(
        "http://server.arcgisonline.com/ArcGIS/rest/services/ESRI_StreetMap_World_2D/MapServer");

Something new here. Well, if you head to ArcGIS Online, a free GeoWeb resource from ESRI, you will find lots of good free base maps to choose from. Or you can use any map published to an ArcGIS Server. The map publishing goodies are a long story that I’ll tell you later, piece by piece. Just know that this line of code gives you a whole big world of maps to work with. Remembering that is enough for now.

Third Line:

    var gp = new esri.tasks.Geoprocessor(
        "http://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Network/ESRI_DriveTime_US/GPServer/CreateDriveTimePolygons");

This is the “Geo” part of the GeoWeb. In one line, it consumes a geoprocess, in this case a service called CreateDriveTimePolygons. The geoprocess is actually called via the REST API (as the URL reveals). The returned result can be in JSON, KML or XML. That means you really don’t have to use this JavaScript API. As a matter of fact, I do have Perl and PHP examples that consume the very same geoprocess, but that’ll be another post.
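Because the geoprocess sits behind a REST endpoint, a request can in principle be composed as a plain URL. The sketch below only builds such a URL and does not send it. It assumes the task offers an `execute` operation with `f=json` and takes parameters named `Input_Location` and `Drive_Times`; the post does not spell any of this out, so check the service's own description page before relying on it. `buildDriveTimeUrl` is a hypothetical helper name.

```javascript
// Task URL as given in the post.
var TASK_URL = "http://sampleserver1.arcgisonline.com/ArcGIS/rest/services/" +
  "Network/ESRI_DriveTime_US/GPServer/CreateDriveTimePolygons";

// Hypothetical helper: compose a synchronous "execute" request URL.
// The parameter names Input_Location and Drive_Times are assumptions.
function buildDriveTimeUrl(lon, lat, minutes) {
  // A minimal feature set holding the single clicked point.
  var inputLocation = {
    features: [{ geometry: { x: lon, y: lat } }]
  };
  var query = [
    "f=json",
    "Input_Location=" + encodeURIComponent(JSON.stringify(inputLocation)),
    "Drive_Times=" + encodeURIComponent(minutes.join(" "))
  ];
  return TASK_URL + "/execute?" + query.join("&");
}
```

For example, `buildDriveTimeUrl(-117.19, 34.05, [1, 3, 5])` would ask for the 1, 3 and 5 minute polygons around one point; the response would still need to be parsed and drawn by the client.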

The rest of the code really just parses the result and draws the polygon on the map. If you know the Google Map API, there are no surprises there.

The following is the true GeoWeb application I’ve introduced to you. You can zoom in to any city just like you would with gmap (scroll the mouse, drag the map, etc.). Then click the map. The 1, 3 and 5 minute driving time polygons will be shown.

Click Here to Run the Application (view source for detail code).

I will post another example to solve that other problem using Flex. Stay tuned.

How Tag Clouds Work (indiemaps.com)

Wednesday, May 7th, 2008

Zach Johnson has a good post about how Tag Clouds work from a cartographic perspective on his indiemaps.com blog. While we have been trained to scale objects based on their area, he concludes tag clouds might be best scaled by size (height) alone.

I have done some Illustrator scripting that takes the ink area into account. The raw results are unsatisfactory and must be scaled again by the tag’s width (its character count) to still make visual sense. All this work does not significantly change how the tag cloud is read (indeed, it may make it harder) and must be done in a graphics environment like Illustrator or Flash (not simple HTML).
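The difference between the two scaling rules is easy to see in numbers. The sketch below is illustrative only: the function names and the 48 pixel maximum are assumptions, not taken from Zach’s post. Height scaling makes the font size proportional to the tag count, while area scaling makes it proportional to the square root of the count, since the area a glyph covers grows roughly with the square of its height.

```javascript
// Font size proportional to the count: a tag used twice as often is twice as tall.
function fontSizeByHeight(count, maxCount, maxPx) {
  return maxPx * (count / maxCount);
}

// Font size proportional to the square root of the count:
// a tag used twice as often covers roughly twice the area.
function fontSizeByArea(count, maxCount, maxPx) {
  return maxPx * Math.sqrt(count / maxCount);
}

// A tag with a quarter of the top count, with the top tag drawn at 48px.
var byHeight = fontSizeByHeight(25, 100, 48); // 12
var byArea = fontSizeByArea(25, 100, 48);     // 24
```

Area scaling compresses the range of sizes, which is one reason the smaller tags in an area-scaled cloud look larger relative to the top tag than their counts suggest.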

Read Zach’s full post here…
