Perfecting Data Visualization with Plotly Using Open-Source Data
Humans rely on audio and visual stimuli to navigate the surrounding world. An entire industry exists to capitalize on these senses, and it successfully convinces millions of people to purchase its products and services on a daily basis. For most of the professional world, sight is the leading sense that drives value. For engineers, visual prowess can be demonstrated through graphs and figures, computer-aided design, and artful manufacturing. One of the easiest ways to impress employers, colleagues, or clients is to maintain high visual quality when presenting work. Below is a simple example of a visually pleasing and thought-provoking plot. A quality figure should have complementary colors, visible fonts, descriptive labels and titles, and it needs to tell a story. Now, this is no masterpiece of a figure, but it's close to the bare minimum expected when presenting on a professional level. Arguments could be made as to whether the font is large enough, or the colors are right, or perhaps the data isn't significant enough; but it serves as a basis for expectation.
The data presented above comes from a publicly available, open-source governmental database called OpenDataNYC. Another great resource is the National Oceanic and Atmospheric Administration (NOAA) website, where there are gigabytes and even terabytes of available data waiting to be processed and plotted. For more general science-related datasets, Nature published a great article that references several useful data repositories [see here]. Because there are countless databases scattered throughout the web, it is wise to familiarize oneself with parsing and processing data of various formats.
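As a small illustration of what "various formats" can mean in practice, here is a minimal sketch that reads the same kind of record from CSV and from JSON using only the Python standard library. The field names (timestamp, temp_c) and the sample values are hypothetical; real repositories define their own schemas, so the parsing functions would need to be adapted.

```python
import csv
import io
import json

# Hypothetical sample payloads; real repositories use their own field names.
CSV_TEXT = "timestamp,temp_c\n2014-01-01 00:00,11.5\n2014-01-01 00:05,11.3\n"
JSON_TEXT = '[{"timestamp": "2014-01-01 00:10", "temp_c": 11.2}]'


def parse_csv(text):
    """Return one dict per CSV row, with temp_c converted to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({"timestamp": row["timestamp"], "temp_c": float(row["temp_c"])})
    return rows


def parse_json(text):
    """Return one dict per object in a JSON array, in the same shape as parse_csv."""
    return [
        {"timestamp": item["timestamp"], "temp_c": float(item["temp_c"])}
        for item in json.loads(text)
    ]


# Both sources end up in one uniform list that later steps can process.
records = parse_csv(CSV_TEXT) + parse_json(JSON_TEXT)
```

Normalizing every source into one record shape early on keeps the later averaging and plotting steps independent of where the data came from.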
In atmospheric science, researchers use a technique called remote sensing to record and analyze the climatological and meteorological trends and cycles of the Earth. Remote sensing employs autonomous data collection over long periods of time, too long to be supervised by humans, and permits intermittent retrieval and analysis. Satellites, anemometers, LiDAR devices [see here], microwave radiometers [here], and passive infrared gas analyzers [here] are just a few types of sensors that atmospheric scientists use when studying the atmosphere.
Numerous resources are available to scientists, especially in the U.S., that encourage the study of weather and climate-related events. I will be using historical wind data from an instrument located on the eastern tip of San Francisco, CA, which is openly accessible to the public [see here]. The instrument recorded wind velocity, wind direction, ambient temperature, and horizontal solar radiation, although I will use only the first three.
For a dataset as extensive as the one used here, the possibilities are endless. There are 35 weather stations with roughly 6 to 7 years' worth of data sampled at either 5- or 15-minute intervals. This leaves over 1 billion data points to be analyzed. There is ample opportunity for correlation between stations and even comparisons between weekly, monthly, or yearly trends. I only cover one station in 2014 and its seasonal and year-long variations; however, the data is available if a more in-depth investigation is desired.
Plotted above are the diurnal, hourly averaged air temperature, wind direction, and wind velocity for the Pier 40 San Francisco site located at the northeastern tip of the city. One can observe a diurnal profile typical for those three variables. It is important to note that this is a year-long plot, so seasonal behavior may be muted. For that reason, the three plots below show the seasonal (monthly) variability of the three variables.
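For readers who want to reproduce a diurnal profile like the ones described here, the following is a minimal sketch, not the exact code behind these figures. It groups samples by hour of day and averages them, then passes the 24 values to Plotly. The samples are synthetic, the function names are my own for illustration, and the circular mean is an assumption on my part: wind direction wraps at 360 degrees, so a plain arithmetic mean of 350 and 10 would give 180 rather than north.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def diurnal_mean(samples):
    """Average (timestamp, value) samples by hour of day.

    Returns a list of 24 values; hours with no samples are None.
    """
    buckets = defaultdict(list)
    for stamp, value in samples:
        buckets[stamp.hour].append(value)
    return [
        sum(buckets[h]) / len(buckets[h]) if buckets[h] else None
        for h in range(24)
    ]


def circular_mean_deg(angles):
    """Mean of compass directions in degrees, respecting the 0/360 wrap."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def make_figure(hourly_values, label):
    """Build a Plotly line chart of a 24-value diurnal profile.

    plotly is imported here so the helpers above work without it installed.
    """
    import plotly.graph_objects as go

    fig = go.Figure(
        go.Scatter(x=list(range(24)), y=hourly_values, mode="lines+markers", name=label)
    )
    fig.update_layout(
        title="Diurnal profile (synthetic data)",
        xaxis_title="Hour of day",
        yaxis_title=label,
    )
    return fig


# Synthetic 5-minute temperature samples covering two days (288 samples per day).
start = datetime(2014, 1, 1)
samples = [
    (start + timedelta(minutes=5 * i), 12.0 + 3.0 * math.sin(2 * math.pi * i / 288))
    for i in range(2 * 288)
]
hourly_temp = diurnal_mean(samples)
```

Calling make_figure(hourly_temp, "Air temperature").show() opens the interactive chart. For a monthly breakdown, the same idea applies with the bucket key changed from the hour alone to a (month, hour) pair.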
Overall, my goal here was to inform other scientists, engineers, and data miners that there are wells of information available online and open to the public. Each of the three datasets used above was a real-world example with potential research-grade statistical and physical significance. With the appropriate cultivation and experience, there are ample opportunities to publish meaningful results using data available to the public. I hope this coverage of open data and Plotly was beneficial and encourages involvement in the open-source community.