Saturday, November 24, 2012

Comparative Histograms in ggplot2

I have been doing a lot of work with limited dependent variables lately.  With my push for better visualization as a means to communicate findings (as opposed to relying on tables of coefficients), I have had to develop some strategies for building effective comparative histograms.  I did some of this in my recent JPART article and continue to use the same code for other projects (mostly my risk perception and trust work).  I thought some might like to see how simple it is to create nice comparative histograms for categorical dependent variables:




The process actually starts in Stata.  I have been relying on generalized ordered logit, which has no canned routine in R, so I have been doing most of my analysis in Stata.  Of course, you could run the same process entirely in R if you wished.  I use Stata's "prvalue" command to get estimated probabilities for each level of the dependent variable.  I then use those probabilities as inputs for the following code.  I hope you find it as useful as I have.

I will provide an example using data for three levels of an independent variable (Odds) and a dependent variable with three levels (Oppose/No Opinion/Support) measuring assessment of fracking.  I enter the estimated probability for each level as a count out of 100 (e.g. 44/100 will oppose in the base condition).  Once I create the base graphs, I put each one on a common scale (0 to 100) so the panels can be compared directly.  After creating all three separate histograms, I combine them into one figure to ease comparison.
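The step of turning estimated probabilities into 100 simulated responses can be sketched as a small helper. This is only an illustration: `probs_to_responses` and `low_odds` are hypothetical names that do not appear in the code below, and the helper assumes the probabilities round cleanly to counts that sum to `n`.

```r
# Hypothetical helper (not part of the original code): expand a named vector
# of estimated probabilities into n simulated responses, one label each.
probs_to_responses <- function(probs, n = 100) {
  counts <- round(probs * n)
  # Rounding can leave the total off by one or two; stop rather than guess.
  stopifnot(sum(counts) == n)
  # Keep the levels in the order given instead of sorting alphabetically.
  factor(rep(names(probs), times = counts), levels = names(probs))
}

# Base (low odds) condition: 44 oppose, 32 no opinion, 24 support.
low_odds <- probs_to_responses(c("Oppose" = 0.44, "No Op." = 0.32, "Support" = 0.24))
table(low_odds)
```

Returning a factor with explicit levels keeps the bars in the order the probabilities were supplied, whereas a plain character vector would be plotted alphabetically.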

CODE FOLLOWS:


library("ggplot2")
library("gridExtra")


# The base (low odds) condition is simulated to have 44 oppose, 32 No opinion, and 24 Support.
Odds_0 <- c(rep("Oppose", 44), rep("No Op.", 32), rep("Support", 24))
Odds_0.plot <- qplot(Odds_0, geom="bar",
                    ylab=("Count"),
                    xlab=("Low Odds")
                     )
# Put the y-axis on the common 0-100 scale.
Odds_0.plot2 <- Odds_0.plot + scale_y_continuous(limits=c(0, 100))
# Optional variant: this adds a second, narrower bar layer on top of the existing one.
Odds_0.plot3 <- Odds_0.plot2 + geom_bar(width=.5)

# The medium odds condition is simulated to have 57 oppose, 27 No opinion, and 16 Support.
Odds_1 <- c(rep("Oppose", 57), rep("No Op.", 27), rep("Support", 16))
Odds_1.plot <- qplot(Odds_1, geom="bar",
                     ylab=("Count"),
                     xlab=("Medium Odds")
                    )
# Put the y-axis on the common 0-100 scale.
Odds_1.plot2 <- Odds_1.plot + scale_y_continuous(limits=c(0, 100))
Odds_1.plot2  # preview the rescaled plot
# Optional variant: this adds a second, narrower bar layer on top of the existing one.
Odds_1.plot3 <- Odds_1.plot2 + geom_bar(width=.5)

# The high odds condition is simulated to have 66 oppose, 23 No opinion, and 11 Support.
Odds_2 <- c(rep("Oppose", 66), rep("No Op.", 23), rep("Support", 11))
Odds_2.plot <- qplot(Odds_2, geom="bar",
                     ylab=("Count"),
                     xlab=("High Odds")
                    )
# Put the y-axis on the common 0-100 scale.
Odds_2.plot2 <- Odds_2.plot + scale_y_continuous(limits=c(0, 100))
# Optional variant: this adds a second, narrower bar layer on top of the existing one.
Odds_2.plot3 <- Odds_2.plot2 + geom_bar(width=.5)

#  Combining the histograms                    
OddseffectH <- grid.arrange(Odds_0.plot2, Odds_1.plot2, Odds_2.plot2, ncol=3)
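
A possible alternative, sketched here as an assumption rather than as the approach used above, is to hold all three conditions in one data frame and let ggplot2 draw the panels with `facet_wrap`, which keeps the 0 to 100 scale identical across panels without gridExtra. The names `odds_df` and `facet_plot` are illustrative only; the counts are the ones given in the comments above.

```r
library("ggplot2")

# All three conditions in one long data frame (illustrative names).
odds_df <- data.frame(
  condition = rep(c("Low Odds", "Medium Odds", "High Odds"), each = 3),
  response  = rep(c("Oppose", "No Op.", "Support"), times = 3),
  count     = c(44, 32, 24,  57, 27, 16,  66, 23, 11)
)

# Fix panel and bar order explicitly instead of relying on alphabetical sorting.
odds_df$condition <- factor(odds_df$condition,
                            levels = c("Low Odds", "Medium Odds", "High Odds"))
odds_df$response  <- factor(odds_df$response,
                            levels = c("Oppose", "No Op.", "Support"))

# One bar per response level, one panel per condition, shared 0-100 y-axis.
facet_plot <- ggplot(odds_df, aes(x = response, y = count)) +
  geom_col(width = 0.5) +
  scale_y_continuous(limits = c(0, 100)) +
  facet_wrap(~ condition, ncol = 3) +
  labs(x = NULL, y = "Count")
facet_plot
```

Because the counts are supplied directly, this version uses `geom_col` rather than counting repeated labels, and the narrower bar width is set in the same layer instead of being added as a second one.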
