Wednesday, May 9, 2012

Norms of Attribution for Data

I am steadily accumulating windmills against which I tilt.  The one I want to bring to public management scholars' attention today is attribution for data.

We have strong norms for ensuring attribution of theoretical advances.  Our citation lists overflow with recognition of previous academic publications.  Professional advancement is driven, in part, by the process of getting recognized for these sorts of publications.  However, we could not write many of the articles we do without data collected by others.  It is time data sets got the recognition they deserve.

Think of the influence that some key data sets have had in the field of public management.  The Texas schools data is a source of consternation among many because it has been used so many times.   The NASP series of surveys and Wright's state surveys have similarly been the source of many, many articles.  Our field is better for their existence.

Yet, we don't give credit for the data itself.  Some of the credit is captured by the norm of citing one of the early articles that used a data set in any later publication that uses it -- but this is inefficient and not entirely transparent.  It would be clearer if we simply acknowledged our debt to external data sets when we use them.

In the end, this is about incentives.  If we want to incentivize the creation of broadly useful data sets, we need to reward their creation.  Our disciplinary system of recognition and reward is through citation.  It seems apparent that we should reward and recognize the data set within the citation norms by simply including the data as a cited reference.

Properly rewarding data sets may even convince people to release their data more broadly.  I have been disappointed when I have asked some public management scholars for data for class demonstrations.  Often, data only a few years old is unavailable.  If the norm became one of open archiving, this would not be a problem.  I don't blame these scholars or attribute any malfeasance to them.  Archiving is simply not a common practice in our field.  Until recently, there were few tools to make this easy for people.  That is changing, thankfully.

There are norms and tools in place to start this today.  For more information than I can cram into a blog post, I point you to Gary King's Dataverse project.  Overall, the project is designed to ensure controlled access and persistent availability of data.  You can create your own project page through which to distribute data.  The process assists with creating a reliable data set (with a format that will outlive, say, Stata 12 or StataCorp) and consistent citation information.

Now I just have to convince my colleagues to do the same...

PS: AJPS now requires that data for articles published there go to a Dataverse page of its own.  Maybe we can use this to convince JPART to require the same thing.
