Code Attacks!

Happy Autumn!

I am in full swing writing and researching new episodes for the first season of Device! It’s a very exciting ride!

However, I find myself unable to break some of my ‘science-based’ habits. Mostly meaning, finding a direct way to visualize and explain data. Of course, with podcasts, you have to visualize with your words rather than graphics. Yes yes, we all know that an image is worth a thousand words, but images can also do more than that. They can highlight which of the thousand words are the most important. Or at the very least, the most interesting.

One episode for the upcoming season is going to focus on Jaws by Peter Benchley. Let me tell you, it is shaping up to be a good one. I went down a bit of a rabbit hole, though, looking into all documented shark attacks that have occurred in San Diego. Spoiler alert: it isn’t that many. Even though more shark attacks have occurred along San Diego County than in other coastal counties in California, we have only had 35 attacks, of which two were fatal, since 1930. The fear frenzy caused by Jaws, both the book and more so the movie, has given the public an overemphasized unease regarding sharks, which has severely impaired conservation efforts.

More to the point of this post, how does one go about looking into shark attacks? The Global Shark Attack File (GSAF) is how. The data are sorted and organized at www.sharkattackdata.com, where anyone can research individual shark attacks that have happened all over the world. Doing just that, I made a pretty chart:

All documented shark attacks off San Diego county since 1933 (first on record) by species and fatality. Larger circles indicate 4 nonfatal attacks in the same year, at the same location, by the same species; hammerhead sharks in La Jolla in 1959 and unknown shark species near San Onofre in 2009. Both fatal shark attacks resulted from white shark attacks (GSAF).

We have a lot of species diversity in our waters. Great White Sharks, or just white sharks, are only one of ten species we share our coastline with. Seven species have been involved in shark attacks.

When people hear “shark attack!”, they mostly assume the worst: that a 20-foot-long MANEATING GREAT WHITE SHARK is coming with a thirst for human blood. THE WATER IS UNSAFE, THERE ARE SHARKS.

It’s OK to be scared. We’re all scared of something.

However, what this chart shows us is that shark attacks are pretty uncommon in San Diego waters, and if you are unfortunate enough to be bitten, your chances of survival are very high.

I more or less stated this before I showed you the chart with my ‘spoiler alert.’ So here’s a question: If the thought of shark attacks makes you nervous, did reading my ‘spoiler’ or seeing the chart make you feel better?

I’m willing to wager that the chart is more effective than words alone. The spoiler informed you, the reader, that 33 people survived shark attacks and two people died. Equal weight in words is given to the 33 survivors and the 2 fatalities. And we’re talking about human lives here; it’s in our nature to focus on and mourn the ones we’ve lost.

The chart, however, gives equal weight to all 35 individuals. If a picture is worth a thousand words, this picture gives voice not only to every victim but also to sharks. It’s true that sharks are mostly afraid of us. We aren’t very good food, and therefore, for the most part, they leave us alone.

So it’s OK to be scared, just as long as you don’t shift your fear to supporting hurtful actions against animals that don’t know any better. Animals that really need our help.

We’ll get into all that and more when this episode premieres on KPBS in spring 2019! :D

Lastly, I wanted to share how I made the chart above. I used the programming language R in the RStudio environment with the package ggplot2. It’s less than 20 lines of code, though I did create a custom theme which is just under 50. Seeing as the data is publicly available, I’m going to make my code available as well. I’m also going to explain it.

This is for all you biologists out there struggling to learn programming. I feel ya, oh how I feel ya. Hopefully this will be part of a series of posts where I explain some of my code.

First, the data. Just a simple CSV file:

If you just want to take a look at the code and get going, have at it:

We’re going to start with two necessary lines of code and one optional. To make the plot, you will need the data I’ve listed above and the R package ggplot2.

#Read in data. Data is tab separated ("\t").
attacks=read.csv("SanDiegoSharkAttacks.csv", stringsAsFactors = TRUE, sep="\t")
#load necessary coding libraries
library(ggplot2)
#load plot theme (unnecessary).
source('theme_sleek.R')

Typically when I read in data I set stringsAsFactors to FALSE. I find R’s default of setting string variables as factors to be one of its more annoying features. When you are reading in new datasets and are first playing around with them, I would read your string variables in as character vectors (stringsAsFactors=F). Factor vectors can be limited in their utility if you don’t have a direct path for analysis, and you can always set vectors as factors later on. In this case, I know exactly what I want to do with my categories, and I need them to be factors, so I am skipping this step.
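
To make the character-versus-factor choice concrete, here is a minimal sketch on a made-up three-row data frame (the `toy` object and its rows are illustrative assumptions, not the real dataset). It keeps strings as characters first and converts one column to a factor afterwards. Note that from R 4.0.0 onward, stringsAsFactors defaults to FALSE, so the old default only bites on earlier versions.

```r
# Toy data standing in for the real file (rows are made up for illustration).
toy <- data.frame(
  Area = c("La Jolla", "Coronado", "La Jolla"),
  Species = c("White", "Unkn", "Hammerhead"),
  stringsAsFactors = FALSE
)

# While exploring, the strings stay as plain character vectors.
class(toy$Area)

# Convert to a factor later, once you know how you want to use the column.
toy$Area <- factor(toy$Area)
levels(toy$Area)
```

The same pattern works on the real data: read everything in as characters, look around, then call factor() on just the columns you want to treat as categories.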

If you haven’t already installed ggplot2 you’ll need to. Seriously, it makes such dramatically better plots than R standard. While you’re at it, install gridExtra as well and play around with it. Really powerful data representation tools.

Next up is my custom theme. I am not going to take the time and explain the ins and outs of this bit of code. It’s mostly personal preference. However! You can download it above and poke around.

Next up is setting levels for my factors. This way the plot will display the different categories (e.g. Area and Species) in the order that I choose.

#Order beach areas from South to North.
attacks$Area <- factor(attacks$Area, levels = c("Imperial Beach", "Coronado",
    "Sunset Cliffs","Mission Beach","Pacific Beach","La Jolla",
    "Solana Beach","Carlsbad","San Onofre","Offshore San Diego"))

#Order species so all data can be seen on the plot.
attacks$Species <- factor(attacks$Species,levels = c("Unkn",
     "Hammerhead", "Horn","Tiger","SevenGill","Blue",
     "Mako","White"))

We are ordering these vectors for two purposes. The Area vector is the basis for my y-axis, and when plotting, ggplot() defaults to an alphabetical display. The plot is easier to read if the towns run south to north up the axis, as they do along the coast. Therefore, I have assigned the location that is the furthest south (Imperial Beach) to the lowest level (1) and the location furthest north (San Onofre) to level 9 by listing them in the order I want. The “Offshore San Diego” category comes last, at level 10.
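
Here is a small sketch of the difference between the default and a custom level order, using four of the beach names from the list above (the vector names `areas`, `alphabetical`, and `south_to_north` are my own, for illustration only).

```r
# A few beach names from the post, in no particular order.
areas <- c("La Jolla", "Imperial Beach", "San Onofre", "Coronado")

# Default: factor() sorts the levels alphabetically.
alphabetical <- factor(areas)
levels(alphabetical)

# Custom: levels follow exactly the south-to-north order we list.
south_to_north <- factor(areas, levels = c("Imperial Beach", "Coronado",
                                           "La Jolla", "San Onofre"))
levels(south_to_north)

# The integer codes behind the factor reflect that order (1 = furthest south).
as.integer(south_to_north)
```

One thing to watch for: any value that is not in the levels you list becomes NA, so a typo in a beach name quietly drops those rows from the plot.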

I ordered the shark species for a more cosmetic reason. Again, ggplot() automatically displays everything alphabetically/numerically. In this default order some of the attacks were hidden behind other attacks. This took some playing around, but I ordered the species here so that all of the attacks are visible in the plot.

#Subset only the fatal shark attacks.
fatal=subset(attacks,attacks$Fatality=="Fatal")

This subset is so I can emphasize the fatal shark attacks later on.
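
If subset() is new to you, this is roughly what it does, shown on a made-up data frame (the `toy` rows and years are placeholders, not real attacks). The bracket version underneath should give the same rows and is the form you will see most often in other people’s code.

```r
# Made-up rows for illustration only; these are not real records.
toy <- data.frame(
  Year = c(2001, 2002, 2003),
  Fatality = c("Nonfatal", "Fatal", "Fatal"),
  stringsAsFactors = TRUE
)

# Keep only the rows where Fatality is "Fatal".
fatal_toy <- subset(toy, Fatality == "Fatal")
nrow(fatal_toy)

# The same filter written with bracket indexing.
fatal_toy2 <- toy[toy$Fatality == "Fatal", ]
nrow(fatal_toy2)
```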

#Assign the data to a ggplot object, p, and provide aesthetic features.
#Note: each + has to end a line, not start one, or R treats the next line as a new statement.
p=ggplot(data=attacks,aes(x=Year,y=Area, colour = Species, shape=Fatality)) +
    geom_count()+scale_size_continuous(range = c(3, 8),guide = "none")

First, we have to tell ggplot() what to plot! Using the “attacks” dataset, I’ve plotted the year of the attack along the x-axis and the location of the attack along the y-axis, and told ggplot() to give each species a different color and the fatalities a different shape. It isn’t important what colors and shapes right now, because we’re going to overwrite them one line down. What is important is that these features are assigned using the aesthetic function, or aes(). This function allows us to create legends of the data later on. Right now I only have the size legend, or ‘guide’, turned off, so the point sizes from geom_count() don’t get a legend of their own.

geom_count() tells ggplot to count how many data duplicates there are (e.g., 4 nonfatal hammerhead shark attacks in 1959 off La Jolla). Data points are larger if there are more duplicates. We set the size range for these data points using scale_size_continuous(). Here, I’ve limited the point sizes between 3 and 8. The point sizes are appropriate for the total size of the chart. If I wanted to create a large, high-resolution copy of this chart, I would need to increase these values.
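
To see the kind of tally geom_count() works from, here is a rough base R equivalent on a toy data frame. This is only a sketch of the idea, not ggplot2’s internals. The four 1959 La Jolla rows mirror the hammerhead example above; the single Coronado row is made up so there is something to compare against.

```r
# Four identical year/location rows plus one made-up singleton.
toy <- data.frame(
  Year = c(1959, 1959, 1959, 1959, 1990),
  Area = c("La Jolla", "La Jolla", "La Jolla", "La Jolla", "Coronado")
)

# Add a column of ones, then sum it within each Year/Area combination.
counts <- aggregate(n ~ Year + Area, data = transform(toy, n = 1), FUN = sum)
counts
```

The `n` column is what ends up mapped to point size: the La Jolla point would be drawn larger than the Coronado one.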

#Provide a better color scheme and assign shapes.
p=p+scale_color_manual(values=c("gray30","Green3","deeppink3","orange","chocolate4","blue","purple","white"))+
    scale_shape_manual(values = c("Fatal" = 17,'Nonfatal' = 16))
#Add an emphasis on the fatal shark attacks in red.
p=p+geom_point(data=fatal, color="red3",shape=17,size=2)

Next, I am going to apply a distinct color to each of the species. These colors should be listed in a corresponding order to the species. I keep a list of accepted R Colors bookmarked in my web browser to double-check when fine-tuning the colors of my plots. Fatalities are marked with a triangle (17) while nonfatal attacks are circles (16). There are many different point shapes to choose from in R. Like I did with the shapes, you can take extra care to tell ggplot that you want the species “White” = “white” in color and so on. If I hadn’t previously ordered my species, I would take the time to do this. But since they are ordered already, it’s sufficient to just double-check my levels.
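
If you would rather not rely on level order, a named color vector is one way to spell out the species-to-color pairing explicitly. The sketch below uses the same species and colors as the code above; the object names `species_levels` and `species_colors` are my own. It also checks each color against R’s built-in list, which does the same job as my bookmarked color chart.

```r
# Species and colors in the same order used in the plot code above.
species_levels <- c("Unkn", "Hammerhead", "Horn", "Tiger",
                    "SevenGill", "Blue", "Mako", "White")
species_colors <- c("gray30", "Green3", "deeppink3", "orange",
                    "chocolate4", "blue", "purple", "white")

# Attach the species names so each color is tied to a species by name.
names(species_colors) <- species_levels
species_colors[["White"]]

# Check every color against R's built-in color names (lowercased to compare).
all(tolower(species_colors) %in% colors())

# With ggplot2 loaded, the named vector can be passed straight in:
# p + scale_color_manual(values = species_colors)
```

With names attached, scale_color_manual() matches colors to species by name, so reordering the factor levels later will not shuffle the colors.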

Almost done! I’ve added a smaller, red triangle inside the points marking fatal shark attacks. Adding this emphasis makes them easier to identify since there are so few of them. This is why I isolated the fatalities earlier; it is easier to have R treat these points as two separate datasets than to try and convince R to plot the same points twice.

#Add custom theme and Title, Save.
p=p+theme_sleek()+ggtitle("Shark Attacks off San Diego County")
ggsave('Shark Attacks San Diego.jpg', plot=p,width = 8, height = 5, units = "in")

Lastly, I’ve applied my custom theme and given the plot a title. If you don’t want to try this theme, ggplot2 comes with many different themes. My favorite for scientific publications is theme_minimal().

And that’s it! Tune in soon for the Jaws episode!