Some Initial Musings on the Workshop at the Lorentz Center

There were a number of major themes that were regularly touched on: the issue of open source (or at least open access) for data sets and materials for linguistic, text critical, and other forms of analysis; the question of how to integrate computer technology in the biblical studies classroom; and the future of linguistic databases (e.g. Hebrew & Greek syntax treebanks).

  1. The open source/open access issue is an interesting one, simply because someone needs to pay something for major Greek and Hebrew linguistic projects to move forward. I’m still not sure how these kinds of large projects can work without funding, and most of the time that funding is going to come with limitations on the openness of the data. Likewise, those who work on projects on the side without funding (e.g. my own computational Greek project) tend to be less inclined to simply share their work because they know all of the effort that went into it. On top of that, there is the simple fact that there are very, very few people that I would trust with my data projects–and most of those people were in attendance at the workshop. My data is my baby. How do I know it will be safe out in the wild? I’m not against sharing it, but it would definitely have to be with the right people for the time being. I don’t know. Perhaps my reluctance has more to do with the (relatively) nascent state of much of my data, which, when combined with the fact that I can count on my hands and feet the number of Greek scholars who do the work I do, makes me slightly uncomfortable.
  2. I can’t say much about the teaching question. I have tutored biblical languages individually, but the only teaching I’ve done has been in linguistics classes, where the software question is not really an issue. With that said, I would be curious about how others have or have not integrated software and computers into their teaching of Greek and Hebrew. Any comments from the audience?
  3. One of the major thrusts of the third theme centered on how such databases will connect with other sub-fields of biblical studies. And on that front, I’m quite excited to see what happens. I’m not a biblical scholar; I’m a linguist, but there were several exciting projects on the horizon that should be very good should they come to fruition, particularly in conjunction with textual critics (perhaps more on that later). As related to point #1, I would be interested in data sharing on that front.

There is more to come. I’ll be writing up some musings on specific sessions and topics in the coming week or so, but these are a few beginning thoughts.

11 thoughts on “Some Initial Musings on the Workshop at the Lorentz Center”

  1. What kind of data sets were under discussion? Most text corpora are already open access I think.

    In what ways do you fear your data could be in danger if you did make it open access? If there are right people to share it with, what makes everyone else wrong?

    1. Most linguistic databases for Greek and Hebrew are definitely not open access. There are a couple that are liberally licensed, but nothing more.

      As for my data, as I said, it’s a matter of trust. I’m not going to let just anyone babysit my child. As I said, I’m not against sharing it; it’s just still way underage.

  2. I’ve been mulling over the open-source question for a while myself. My thinking (right now, at least) is that we ought to pursue the development of open-source texts and open-source resources for the creation of annotations and other related resources. Interested scholars could use the texts and resources to create public (i.e. collaborative) or (semi-)private databases, and any private research could then be released to the public whenever its author(s) were prepared to permit widespread use.

    Having the texts and development tools open-source and freely available is the important thing, I think, because it permits independently produced resources to interact. We need to avoid data silos as much as possible.

    Of course, that still leaves open the issue of funding and the willingness of people to assist in the development of such tools. I suppose the best chance of success will depend on keeping the doors as wide as possible to include different fields of study and different linguistic approaches.

    1. This is very much in the vein of our discussions at the conference, though there wasn’t too much consensus on what the future will look like.

      Generally speaking, until the funding question can be answered, I don’t know what else can be done, though there has been some progress. Most open source discussions don’t say much about the money issue.

      For my own part, I’m far more interested in sharing data for the purposes of larger project collaboration where there is a clear and set purpose and goal. Sharing for the purpose of sharing is less interesting to me–though one exception might be in a situation where I have quit a project and the data is simply sitting around doing nothing and someone else could use it more profitably than I could. But I’ve only been doing this for 6 years. I really don’t have data like that.

  3. Mike, I’m still confused by what you mean by “trust”, and I don’t get the babysitting metaphor. Do you mean that it’s not ready for contributions from others? Are you worried about people making invalid conclusions from your data?

    As to the funding, why is it such a problem? Institutes generally have no trouble with their staff publishing in open access journals, or for open access books (if you could find a willing publisher). Are the restrictions greater for other types of work outcomes?

    The solution for the funding issue is that the culture needs to be changed, from the bottom. Teach your students, from the undergrad to the doctorate level, about the value of open access, and encourage them to always seek that path if they can. Once people from that generation start getting on grant committees, it will be easier for everyone.

    1. It’s partially that it’s not ready for contributions from others–it’s not sufficiently complete, for one thing.

      The funding question involves large-scale projects. It’s a matter of size: an article or even a book can be completed relatively easily over the course of a short period of time. The kinds of projects involved here are of a far larger scale and aren’t going to be completed on a sabbatical, and unless an incredibly good grant is approved, no scholar or even group of scholars is going to be in a situation to do the work in their free time.

      Beyond that, it is my observation that the vast majority (with a few exceptions) of those who speak most openly about open access and open source projects like this either:

      1) Don’t actually have anything of their own to share.
      2) Are more interested in others sharing work than in sharing their own.

      Which brings me back to my original point: I’m all for sharing, but is sharing merely for the sake of sharing that great of an idea? Whoever I share with, shouldn’t it be a mutually beneficial relationship?

    2. In regard to funding I can think of two questions: Is the programming task actually of academic merit? If it is, is it recognised? If the answers are yes and no, then that’s a problem, but it will take time to change the culture of the academy. But it can happen. Pure linguistic documentation is getting more and more support for academic merit. If the answer to the first is no, then it will be hard to get academics to work on it, but there may be others with the expertise and interest.

      Hmm. I need to put some more of my work on the net too, even though it is very incomplete. Ok, I will today.

      Is sharing for the sake of sharing a great idea? Of course! Scholarship should be serving society as a whole. If you receive public funding I think it is reprehensible to put the results of your work behind a paywall. If you receive private funding then the situation is different, but it would still be good to aim to share everything you’re allowed to. Christian scholars have the additional purpose of serving the church, and should be even more supportive of open access than the average person. This is especially the case because Christian institutions in both the developing world and the first world usually have small enrolment numbers and less money to put towards database subscriptions, and there are many Christians who could benefit from research who aren’t associated with any institution. If research is worth doing then it’s worth sharing. To expect a mutually beneficial relationship is the opposite of servanthood.

      1. On your first paragraph: I’m not talking about programming–though that is involved. Linguistic analysis is more and more database work. I’m talking about something like the Cascadia Syntax Graphs or the WIVU database. The latter is available online with registration, but support and funding are provided by the Netherlands government and other organizations (as I understand it–it’s probably far more complicated than that). There’s an extremely large amount of overhead for the creation of WIVU, its maintenance, and its archiving.

        On your last paragraph: To the extent that what you describe refers to a final product, I am in 100% agreement–in fact I’d be perfectly fine with anything I publish going into the public domain a decade after it was realized (the way the concept of copyright was originally supposed to work). But that’s not what I’m talking about. I’m talking about raw, messy, unfiltered data that I probably would never want to have published anyway. And that’s what we were talking about sharing at this workshop. Now if I quit working on something, saw that someone else could use it or build on it, and knew that I’d never do anything with it, that would be a completely different story.

        1. Hmm, just alter my questions to this then: Is the task of syntax tagging actually of academic merit? If it is, is it recognised?

          When I have better internet I’ll take a look at WIVU so that I’ll have a better idea of what you’re talking about.

        2. Well, because of copyright restrictions, you may or may not be able to access WIVU. You must first apply for approval, and you can get a two-year access period–I don’t know how long that takes. I’ve never done it.

          It’s basically the Hebrew Bible equivalent of the Penn Treebank for English.

          I know that linguists view treebanks as having academic merit, at the very least.

    3. I’m just guessing here, but I suspect that many of us have reservations about sharing data because there are innovative theoretical and descriptive categories wrapped up in that data. It’s not as though we’re just slogging through lots of text, working with a standard set of categories (as is the case with something like the Son of Suda Online). We’re working with texts and formulating categories at the same time. This makes it much harder to be collaborative, because collaboration typically requires some consensus. It also makes it harder to be open-handed with work in progress, because theoretical and descriptive categories need to be presented carefully in order to be understood and accepted.

      Having said all that, I still think that collaboration and open distribution are vitally important, and that open-source resources are as well. Eventually, everything I’m working on will need to see the light of day – or else it’s all for nothing. When that day comes, I’d like my data to be in a format that is ready for open distribution, integration with other data sets, and even alteration at the hands of others with new ideas.
