Tuesday, October 25, 2011

ART and the Curse of Dimensionality

I will begin my discussion of ART ("a rule of three" - as discussed in a post last week) with a look at the statistical problem that motivates ART.

In this recent presentation at Texas A&M, Achen focused on two motivations for ART:  [1] the frailty of high dimensional models, and [2] strong linearity assumptions.  I may return to the linearity problems later (I am a little less worried about this than Chris Achen seems to be), but I want to focus on dimensionality for a moment.

The "curse of dimensionality" has been the subject of some discussion in social science modeling -- but the development of more complicated statistical models has continued unabated.  The core idea is that finding the maximum of a function over a high dimensional space is hard -- sometimes very hard.  In the context of social science models, parameter estimates of high dimensional models may be subject to a great deal of influence or leverage from a handful of observations.  For example, you may have a large sample overall but have very few Hispanic respondents.  The typical strategy of including a dummy variable for Hispanic ethnicity assumes that Hispanic respondents vary only in the intercept (and respond similarly to non-Hispanic respondents with regard to all other variables).  The alternative approach (in something akin to a hierarchical model - with ethnicity as a level) is to allow both the intercept and the other slopes to vary by ethnicity -- but this places remarkable demands on the sample of data.  Achen's concern is that statistical models will almost always give us some answer.  Relying on variation within a small sub-sample (say, Hispanic respondents in a particular management survey), we will get a coefficient, but that coefficient may be unreliable.
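
To make the contrast between the two strategies concrete, here is a sketch of the two specifications with a single covariate.  The notation is my own assumption for illustration (it is not taken from Achen's presentation):  y_i is the outcome, x_i is a covariate, and H_i equals 1 for Hispanic respondents and 0 otherwise.

```latex
% Dummy-variable strategy: only the intercept shifts for Hispanic respondents
y_i = \beta_0 + \beta_1 x_i + \gamma_0 H_i + \varepsilon_i

% Varying-slope strategy: both the intercept and the slope shift
y_i = \beta_0 + \beta_1 x_i + \gamma_0 H_i + \gamma_1 (H_i \cdot x_i) + \varepsilon_i
```

In the second specification, the slope for Hispanic respondents is \beta_1 + \gamma_1, and \gamma_1 is informed only by the variation in x_i among respondents with H_i = 1.  With k covariates there are k such slope shifts to estimate, all resting on the same small sub-sample.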

Some of this problem is identifiable through careful assessment of leverage diagnostics.  If you look at Cook's D values, etc., you can diagnose situations where small sub-sample sizes within your study create fragile coefficients prone to leverage from a small number of observations.  Even this becomes difficult with non-continuous independent and dependent variables, though (but see this paper for an interesting strategy).
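
As a rough illustration of this kind of diagnostic, here is a minimal sketch in Python of Cook's D for a regression with a single predictor.  Everything specific here is an assumption for the example:  the data are made up, the function name cooks_distance is hypothetical, and the 4/n cutoff is only a common rule of thumb rather than a formal test.

```python
def cooks_distance(x, y):
    """Return Cook's D for each observation in a simple regression of y on x."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Ordinary least squares fit
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Residual variance estimate and leverage (hat) values
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Cook's D combines the size of the residual with the leverage
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


# Made-up data: nine clustered observations plus one isolated observation
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2, 7.9, 9.1, 12.0]

d = cooks_distance(x, y)
threshold = 4 / len(x)  # rule-of-thumb cutoff, assumed for illustration
flagged = [i for i, di in enumerate(d) if di > threshold]
print(flagged)
```

The isolated observation sits far from the rest of the data on x, so it has high leverage and pulls the fitted slope toward itself.  That is the one-variable version of the concern above:  a coefficient that depends heavily on a very small number of observations.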

Achen argues that we can avoid this situation entirely by more carefully selecting a sample.  If we want to look at the behavior of Hispanic administrators, we can select a sample of Hispanic administrators.  If we think ethnicity matters but that we don't have enough Hispanic respondents to ensure stable/reliable parameter estimates, we are better off constructing a sample without variation in ethnicity (by omitting Hispanic respondents) and then saying that the sample is homogeneous in terms of ethnicity.

I will return to the subject of sampling as a solution, but I will leave you with two thoughts:  [1] Is this approach useful outside of the NES world of tens of thousands of observations from which one can create homogeneous sub-samples?  [2] Will this ghettoize the study of racial and ethnic minorities as the "normal" research proceeds to test hypotheses with racially and ethnically homogeneous samples (read "white, male, moderately educated non-southerners")?

In the short term, are you concerned that in public management large models (those containing many independent variables) are fragile?  Are you intrigued by the strategy of selecting a sample uniform with respect to variables for which you want to control rather than introducing control variables?

Tuesday, October 18, 2011

Why I have switched to LaTeX

Last year I switched to using LaTeX as my primary means for writing academic papers - and I have not looked back.  Before I provide more information on how to make a similar switch, I want to provide the reasons I have switched and why I won't be switching back :).
 
1]  Professional quality manuscripts -- I got a great piece of advice from the late Larry Terry.  When looking at a draft I had prepared, he implored me to write the draft as if it were already accepted by the journal.  He thought reviewers were more likely to accept manuscripts that looked like manuscripts that were already accepted.  While he did not intend this to mean "use LaTeX", he brought my attention to the importance of reviewer reactions to subtle aspects of drafts.  He recommended that I refer to "this article" within the manuscript instead of "this manuscript", for example.  He was convinced that reviewers treated manuscripts that read like drafts or working papers as just that, while they would be more positively inclined toward manuscripts that look or read like final manuscripts.

When I see a paper prepared in Word, I start from the assumption that it is a draft.  No professional publication looks like a Word document.  LaTeX produces documents that look like professionally published manuscripts.  It is certainly not the case that reviewers in my fields are looking for LaTeX as some sort of gatekeeper (reviewers in my field are largely ignorant of LaTeX altogether) but I do get better reactions with manuscripts that look professionally typeset.

2]  Structural composition -- LaTeX forces a writer to consider the outline of his or her manuscript and to write to the plan.  I find that this approach to writing forces me to adopt better writing practices.  The argument in my manuscript becomes transparent - or, at least, clearer.  I see the parts of the paper more easily and can make for a more coherent manuscript. I have long been a compulsive outliner.  LaTeX is an approach to document preparation that is consistent with outline-based writing.

Word makes using outlines a little awkward.  I ran into tedious problems with the outlining functions and occasional formatting problems associated with detailed outlining in the draft.  It looks like newer versions of Word are better about this - but it was too little, too late for me.

I still sometimes use Word for shorter documents where such structure is not essential.  For articles, though, I feel a little lost without the structure that LaTeX calls for.

3] Extensions -- I have found many, many packages to add to LaTeX that allow me to integrate various types of figures, tables, etc. and that make writing manuscripts easier.  It goes without saying that LaTeX shines when dealing with mathematical symbols and equations.  However, I have also found convenient packages for figures, Gantt charts, node diagrams, slides, and reference management.  I may write more about these in the coming weeks - but this has been a pleasant surprise for me in working with LaTeX.  To replicate LaTeX's functionality, I would need not only Word but also RefWorks, Visio, Project, and PowerPoint (and possibly still need some Adobe products).  Instead, I have consistent syntax across all of these functions within LaTeX.

4]  Free -- LaTeX can be used without a commitment to an expensive software package.  As a faculty member, I do have access to inexpensive licenses for Word.  However, I have run out of the legally allowed licenses given the number of computers in my household.  I got around that with a mixture of the Mac and PC licensing - but it made me realize the cost of dependency on an expensive package.  I don't want to be one MS policy change away from having to ration my word processing capabilities across my machines. I hear that there has been such a change (allowing one to purchase Windows OR Mac versions of Office -- not both as I did in the past) so this may be an immediate problem for some.

5] Universal output -- I use LaTeX to create PDFs.   Anyone can open them.  There is no compatibility problem with this or that version of Word.  This has been a life-saver for presentations in particular.  My PDF-based presentations appear the same regardless of the software used to view them.  I don't have to worry about presenting at conferences or other locations for invited presentations where they have an older (or newer, for that matter) version of PowerPoint that shifts objects in my slides around or interferes with the representation of equations.  Similarly, I don't have to worry about people not being able to open a document because they have an old version of Word (or no version of Word at all).

Word (and PowerPoint) allow one to print to PDF now, too.  I strongly recommend that option for presentations regardless of your choice of software.

6] Cross-platform support -- I will admit, this one is a strange reason.  I like LaTeX in that I can move source files from one platform to another.  I can edit them on my old MacBook, my iPad, or any of my Windows machines.  I can even use online resources like ScribTeX to host and compile files completely online.  This makes LaTeX versatile in the way that Google Docs is -- with a far more flexible set of authoring options.  I hear Word is moving towards web-hosting, but I don't need to bother since I have an alternative in place already.

-1]  OK.  I admit that there have been some challenges.   The hardest challenges have been related to co-authorship.  Only one of my co-authors has ever used LaTeX.  I have had to take on the burden of translating revisions.  I did not mind this too much given that I LOATHE "track changes" in Word -- but it has created some tension.  I don't want to give the impression that the switch has all been peachy.  A couple of times (a minority of times, but a significant minority) I have had to convert a paper to Word before submission to a journal.  This is also annoying -- but I can deal with it given the advantages I experience.  For the most part, though, journals have taken my LaTeX-generated PDFs for review.  If they want me to change format after acceptance, I have no complaint.  I will convert it to WordPerfect 5.1 if that is what they want for publication.  If I were not willing to do this for a publication, I should not be publishing that article in that venue to begin with.

So, please tell me what you think.  Do you want to hear more about the features and options of LaTeX or should I just lay off it and stick to statistics software and public management research?



Sunday, October 16, 2011

A Rule of Three for Public Management Research?

Chris Achen (Princeton) was in College Station last week presenting his provocative argument that political science research needs to focus more on narrowly focused, simple tests and less on elaborate and complicated statistical models.  Somewhat hyperbolically, Achen argues that statistical models should contain no more than three independent variables ("A Rule of Three" -- ART).  Higher dimensional models, Achen suggests, have a number of problems that leave them prone to supporting frail and unreliable inferences. 

As part of the support for this claim, Achen argues that the most important discoveries of political science were derived from nothing more complicated than cross-tabs.   Sometimes, he will go so far as to say that nothing of lasting importance has been learned from more complicated statistical models.  

I have a great deal of sympathy for this approach and will discuss a few aspects of it over the coming weeks. For now, I just wanted to introduce the basic argument.  Do you think that public management research should embrace "a rule of three"? 

You can find a central statement of the critique here.

Tuesday, October 11, 2011

New Draft Paper on Public Trust in DHS

We have completed a draft of the next paper using our national opinion survey on issues related to nuclear security and homeland security. You can find it after the jump.  Comments, as always, are welcome.


Short Tutorial Resources for R and LaTeX

I am often (OK, that is an exaggeration -- let's say "occasionally") asked by people how they can get started in R or LaTeX.  One of the most useful norms surrounding open source software is the expectation that one will simply provide support and tutorials for free.  When I find effective tutorials, I will link to them here.

Today I want to make you aware of some superb short videos to help with basic functions in R and LaTeX.  Examples range (for R) from data entry to estimation of Poisson models to graphics with ggplot.  For LaTeX, examples range from basic document structure to specific tutorials on Beamer (a package to create PowerPoint-like slides) and TikZ (a package to allow you to draw a wide range of figures including trees, mind-maps, and traditional XY plots).

To get started with these short videos, check them out at:  http://www.youtube.com/user/ramstatvid.  

Monday, October 10, 2011

Coming Soon... A New Mission for the Blog

I have decided to change (read, expand) the mission of the blog.  I will continue to post my new working papers here.  However, I will also increase the frequency with which I post information on new resources (particularly related to Stata, R, LaTeX, etc.) and opine on issues related to public management research.  I will also include posts related to job ads for public management students -- particularly those interested in emergency management, homeland security, and nonprofit management.  Watch this space!