GitHub for Academic Research

There was an article on Slate this morning that made the argument:

We need a GitHub for academic research.

It is an interesting idea. GitHub is a repository site for software projects and their source code (see Wikipedia: GitHub). We are now going on a solid three decades of the internet. Academic listservs are more or less gone (granted, Linguistlist is still going strong, albeit in a less e-mail-oriented form). The golden age of academic blogging is mostly over. Many of us are still writing, of course, but our readers do not really socialize and interact here anymore. They’ve moved elsewhere. When I publish a post, discussion rarely appears in the comments section these days. That happens on Facebook or on Twitter now, which is fine. Some of the best academic discussion now happens on’s sessions feature, where papers can be discussed in real time, usually in draft form and by invitation only: you can

But none of these are what GitHub is. GitHub is a repository for code, not academic argument or discussion. GitHub is for data, not for prose.

The thrust of the argument is this:

The academic paper has some inherent limitations—chief among them that it can provide only a summary of a given research project. Even an outstanding paper cannot provide direct access to all of the research data collected or to the record of discussions among scientists that is reflected in lab notes. These windows into the messy and halting process of science, which can be extremely valuable learning objects, are not yet part of the official record of a research study.

But it doesn’t have to be this way. If we take advantage of the unique capabilities of the web to tell the full story of a research project—rather than merely using it as a faster printing press as we do today—we can build greater transparency into our approach to reporting science. Besides improving information-sharing among scientists, a push toward transparency could improve public trust in science and scientists. Now, when the very concepts of fact and truth are under assault and many scientists feel compelled to march in response, is the perfect time to rethink our approach to scientific communication altogether.

A striking proposal indeed.

Now the author here, Marcus Banks, is talking about science, specifically. Most readers here likely view themselves as being more within the humanities. But linguistics, even (or perhaps especially?) linguistics for an ancient language like Greek, is a data-driven discipline. Our theses and dissertations tend to be of one of two types. They are either a summary of research with an argument for a view that provides a snapshot of the data, or they are simply the data in its entirety with commentary. Con Campbell’s (2007) Verbal Aspect in the Indicative Mood and Narrative is an example of the first type. Douglas Huffman’s (2014) Verbal Aspect Theory and the Prohibitions in the Greek New Testament is an example of the second.

I chose these two particular examples for a reason. Both represent some form of the tenseless view of Greek that I find highly unconvincing. So which is more useful for me, as a researcher? Which one would I be more likely to recommend to others, despite my disagreement? Is it the one that merely provides a snapshot of the data or the one that provides a comprehensive database of his analysis (albeit in print form)? Quite obviously, it is the latter.

Huffman’s monograph is of far greater value to me.[1] I can disagree with him on any number of points: his view of the status of tense in Greek, his interpretation of individual instances of prohibitions, or his categories for analyzing prohibitions. But despite that, I can always come back to the volume to see what his opinion is on whatever prohibition I’m looking at. You cannot do that with the other approach, the summary approach. Books that simply make an argument based on a summary/snapshot of the data tend to get read once. I read a book of this type and I either agree with it or disagree with it. If I disagree with the argument or conclusion, then the book has little use to me afterward. We are doing language work. The data and its analysis are at least as important as the argument.

But Huffman’s data is still merely a print source. It is not searchable, and it can’t be manipulated to be visualized in different ways. It exists merely as a list. This has historically been the challenge for biblical studies. The print concordance is the original database for our work.

We need to be digitizing our research, especially if it’s already published somewhere. Some of us already are. A few of my research projects are published with Logos Bible Software, such as my semantic role/argument structure analysis of New Testament verbs. But I need to be better at this, too, especially for my non-commissioned/contracted (i.e. personal) projects. Creating consistently annotated data is time-consuming. Often it is easier in the moment to do the analysis token by token in my head without actually writing it down. When you are looking at 10,000 instances of something, the extra 20 seconds it takes to type each analysis adds a lot of time to projects that probably already feel like they are moving too slowly.
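To make the contrast with a print list concrete, here is a minimal sketch in Python of what token-by-token annotation could look like once it is written down as structured data. Everything in it is an assumption for illustration: the column names (token_id, lemma, form, category), the placeholder rows, and the helper functions are all hypothetical and are not drawn from Huffman’s categories or from any of my own projects.

```python
import csv
import io
from collections import Counter

# Placeholder data: every id, lemma, and label below is invented for
# illustration and is not taken from any published analysis.
SAMPLE = """\
token_id,lemma,form,category
ex-001,lemma-a,present imperative,general
ex-002,lemma-b,aorist subjunctive,specific
ex-003,lemma-a,present imperative,specific
ex-004,lemma-c,aorist subjunctive,specific
"""


def load_annotations(text):
    """Parse CSV text into a list of dicts, one per annotated token."""
    return list(csv.DictReader(io.StringIO(text)))


def search(rows, **criteria):
    """Return the rows whose fields match every keyword criterion."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


def tabulate(rows, field):
    """Count how often each value of `field` occurs."""
    return Counter(r[field] for r in rows)


rows = load_annotations(SAMPLE)
present = search(rows, form="present imperative")
by_category = tabulate(rows, "category")

print([r["token_id"] for r in present])
print(dict(by_category))
```

Even a flat file like this can be searched and re-counted in ways a printed list cannot, and a plain CSV is also the kind of file that a site like GitHub can store and track changes to.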

The annotated database of Greek perfects for my thesis is sadly probably only two-thirds filled in, even though I checked everything. And the thought of going back now feels worse.

I should probably be putting my personal projects up on GitHub, though (until there’s an academic alternative). Even in partially completed form, documentation is just as important as the final project. Long term, it is probably more important. If I want to take my grammar project seriously, it needs to be more than just prose. It needs to be data, too. And that data needs to be accessible. Otherwise, it’s useless.

[1] I should emphasize at this point that I still value Campbell’s work. Even with my disagreements, he has made some excellent contributions. It is simply that on the practical level of usefulness, having the complete and fully annotated data creates value in a way that summary and prose do not. Conversely, data without a prose summary would be nearly as useless.

Works cited:

Campbell, Constantine. 2007. Verbal Aspect in the Indicative Mood and Narrative. New York: Peter Lang.

Huffman, Douglas. 2014. Verbal Aspect Theory and the Prohibitions in the Greek New Testament. New York: Peter Lang.

6 thoughts on “GitHub for Academic Research”

    1. Well, this isn’t really the point. The point is:

      “We need a place to publish data publicly.”

      Not: for-profit organizations are bad. Though on that point, as it stands a year from that post you linked to, hasn’t gone in any of those directions. They’ve focused their efforts on selling additional services to scholars instead.

      PS – I’m tempted to just reply on twitter…

  1. Hey, I’m not sure how much you’ve used GitHub since this post, but it’s actually precisely for prose. Data and code are distinct.

    Data, for instance, is raw (just numbers). It would be saved in an array, and code (which is prose that tells the computer to act) interprets the data…or rather does things to it so that a human can interpret it. And documentation is prose that humans can quickly read in order to know what the code does.

    I hope that explanation (which is unnecessary now that I think about it) is helpful.

    My point is that GitHub really would be perfect for what you’re talking about. In fact, it’s becoming common for open source textbooks to be hosted on Gits. Check it out:
