Pandas and the Greek New Testament

This post is really starting in the middle, but then why not!

I have been using a Python framework out of the Netherlands called Text-Fabric, which gives one programmatic access to the ETCBC Hebrew and SBLGNT texts. Discussing its features is very far off the topic of this post, so I refer the curious to https://github.com/ETCBC/text-fabric/wiki which provides full instructions on how to set up the environment. Anyone not familiar with ETCBC (Eep Talstra Centre for Bible and Computer, previously known as WIVU) can learn more here: http://www.godgeleerdheid.vu.nl/en/research/institutes-and-centres/eep-talstra-centre-for-bible-and-computer/index.aspx.

From this point I will assume a certain familiarity with Python, IPython, Jupyter Notebooks and Text-Fabric (hereafter TF), and less with Pandas. Told you I was starting in the middle! So onward …

I have been reading Luke in Greek and, in conjunction with that, Steven Runge’s Discourse Grammar of the Greek New Testament, where I was reading about δε. It occurred to me that it would be interesting to know how widely distributed δε is in Luke. Of course, one can do such queries in a snap in programs such as Accordance and then graph the results. Naturally I wondered if I could do an Open Source version. Last week I ran across Pandas, not the large furry black and white almost-bears, but the panel data library for data science work in Python. Reading a little on it, it occurred to me that it would be possible to combine Pandas and TF and produce a plot of the kind I was looking for. (Anyone not familiar with Pandas is going to need to quickly look over http://pandas.pydata.org/ and in particular http://pandas.pydata.org/pandas-docs/stable/10min.html. By the way, it’s not even close to just 10 minutes!)

Data Science is a huge topic which I am equally hugely unqualified to say much about. However, Python has many cool libraries with which to do data science work, Pandas being one. Pandas is used for work similar to that often done in R. Here, its basic Series data type will be of help. In addition, one can make use of matplotlib to plot the graph. TF will provide the raw data from the SBLGNT, and a little bit of Python will bucket the occurrences of the word δε by chapter.
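To make the bucketing idea a little more concrete, here is a minimal sketch of it, not the code from my notebook. It assumes the words have already been pulled out of TF into a plain list of (chapter, word) pairs; the list below is made-up sample data with accents left off for simplicity, and the function name count_by_chapter is just something I invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data pulled from Text-Fabric: one
# (chapter, word) pair per word. These values are invented sample
# data, not real SBLGNT text, and accents are omitted.
words = [
    (1, "εγενετο"), (1, "δε"), (1, "εν"),
    (2, "εγενετο"), (2, "δε"), (2, "και"), (2, "δε"),
    (3, "εν"), (3, "ετει"),
]


def count_by_chapter(words, target):
    """Return a Series counting occurrences of target, indexed by chapter."""
    chapters = sorted({chapter for chapter, _ in words})
    # One entry per hit, holding the chapter number the hit occurred in.
    hits = pd.Series(
        [chapter for chapter, word in words if word == target],
        dtype="int64",
    )
    # value_counts tallies the hits per chapter; reindex puts the chapters
    # in order and fills in 0 for chapters where the word never occurs.
    return hits.value_counts().reindex(chapters, fill_value=0)


counts = count_by_chapter(words, "δε")
print(counts)
```

With matplotlib installed, a Series like this can be charted with counts.plot.bar(), which gives one bar per chapter.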

Assuming you already have a Python 3.6 setup with Jupyter, the next thing to do is get Pandas and matplotlib installed into Python. Use pip:

pip install pandas
pip install matplotlib

With that in place, TF installed per the wiki above, and the text-fabric-data installed and set up as well, we are ready to begin.

At this point the next thing you will need is the Jupyter Notebook, which I githubbed here: https://github.com/47rooks/bible-software-modules/blob/master/TF-Notebooks/de%20in%20Luke.ipynb. It requires Python 3.6, for which I make no apologies. (The person who got me onto 3.6 f-strings knows who he is 🙂 ) The notebook is pretty self-explanatory if you have done all the required reading 🙂 If not, you should be able to follow the trail once you have looked at a few of the sites above.

And as a little encouragement here is the final chart

47