Joel Lonbeck 6b384a69a3 added usfm text 2018-01-01 15:24:03 -08:00
01-GEN.usfm added usfm text 2018-01-01 15:24:03 -08:00
02-EXO.usfm added usfm text 2018-01-01 15:24:03 -08:00
03-LEV.usfm added usfm text 2018-01-01 15:24:03 -08:00
04-NUM.usfm added usfm text 2018-01-01 15:24:03 -08:00
05-DEU.usfm added usfm text 2018-01-01 15:24:03 -08:00
06-JOS.usfm added usfm text 2018-01-01 15:24:03 -08:00
07-JDG.usfm added usfm text 2018-01-01 15:24:03 -08:00
08-RUT.usfm added usfm text 2018-01-01 15:24:03 -08:00
09-1SA.usfm added usfm text 2018-01-01 15:24:03 -08:00
10-2SA.usfm added usfm text 2018-01-01 15:24:03 -08:00
11-1KI.usfm added usfm text 2018-01-01 15:24:03 -08:00
12-2KI.usfm added usfm text 2018-01-01 15:24:03 -08:00
13-1CH.usfm added usfm text 2018-01-01 15:24:03 -08:00
14-2CH.usfm added usfm text 2018-01-01 15:24:03 -08:00
15-EZR.usfm added usfm text 2018-01-01 15:24:03 -08:00
16-NEH.usfm added usfm text 2018-01-01 15:24:03 -08:00
17-EST.usfm added usfm text 2018-01-01 15:24:03 -08:00
18-JOB.usfm added usfm text 2018-01-01 15:24:03 -08:00
19-PSA.usfm added usfm text 2018-01-01 15:24:03 -08:00
20-PRO.usfm added usfm text 2018-01-01 15:24:03 -08:00
21-ECC.usfm added usfm text 2018-01-01 15:24:03 -08:00
22-SNG.usfm added usfm text 2018-01-01 15:24:03 -08:00
23-ISA.usfm added usfm text 2018-01-01 15:24:03 -08:00
24-JER.usfm added usfm text 2018-01-01 15:24:03 -08:00
25-LAM.usfm added usfm text 2018-01-01 15:24:03 -08:00
26-EZK.usfm added usfm text 2018-01-01 15:24:03 -08:00
27-DAN.usfm added usfm text 2018-01-01 15:24:03 -08:00
28-HOS.usfm added usfm text 2018-01-01 15:24:03 -08:00
29-JOL.usfm added usfm text 2018-01-01 15:24:03 -08:00
30-AMO.usfm added usfm text 2018-01-01 15:24:03 -08:00
31-OBA.usfm added usfm text 2018-01-01 15:24:03 -08:00
32-JON.usfm added usfm text 2018-01-01 15:24:03 -08:00
33-MIC.usfm added usfm text 2018-01-01 15:24:03 -08:00
34-NAM.usfm added usfm text 2018-01-01 15:24:03 -08:00
35-HAB.usfm added usfm text 2018-01-01 15:24:03 -08:00
36-ZEP.usfm added usfm text 2018-01-01 15:24:03 -08:00
37-HAG.usfm added usfm text 2018-01-01 15:24:03 -08:00
38-ZEC.usfm added usfm text 2018-01-01 15:24:03 -08:00
39-MAL.usfm added usfm text 2018-01-01 15:24:03 -08:00
LICENSE initial commit 2016-06-20 23:19:18 +00:00
Project Explanation.md Project Explanation 2016-07-08 10:56:45 +00:00
README.md Update 'README.md' 2016-11-17 22:38:01 +00:00
Volunteer job description.md corrected markdown formatting errors 2016-07-08 11:07:32 +00:00
manifest.yaml added usfm text 2018-01-01 15:24:03 -08:00

README.md

UHB

The resource we are using as our UHB is the Open Scriptures Hebrew Bible. This project is the Westminster Leningrad Codex with Strong's lexical data and morphological data marked up in OSIS files.

Parsing Status

See the parsing status for the whole Old Testament, or use the book-by-book links below.

Roadmap

Initial Inclusion in tC

Get tC to support OSIS XML files like https://github.com/openscriptures/morphhb/blob/master/wlc/Ruth.xml

  • Lexical data is encoded in the lemma attribute, which holds the word's Strong's number
  • Morph data is encoded in the morph attribute, key here
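As a rough illustration of what reading these two attributes could look like, here is a minimal Python sketch. It assumes the words are `w` elements inside `verse` elements in the OSIS namespace; the `SAMPLE` fragment is hand-written for this example (it is not copied from Ruth.xml), and `read_words` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# OSIS namespace in ElementTree's "{uri}tag" notation.
OSIS_NS = "{http://www.bibletechnologies.net/2003/OSIS/namespace}"

# Hand-written, simplified fragment for illustration only.
# The second word deliberately has no morph attribute.
SAMPLE = """<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace">
  <osisText>
    <div type="book" osisID="Ruth">
      <chapter osisID="Ruth.1">
        <verse osisID="Ruth.1.1">
          <w lemma="c/1961" morph="HC/Vqw3ms">וַיְהִי</w>
          <w lemma="b/3117">בִּימֵי</w>
        </verse>
      </chapter>
    </div>
  </osisText>
</osis>"""


def read_words(xml_text):
    """Return one dict per <w> element with its verse, text, lemma and morph."""
    root = ET.fromstring(xml_text)
    words = []
    for verse in root.iter(OSIS_NS + "verse"):
        for w in verse.iter(OSIS_NS + "w"):
            words.append({
                "verse": verse.get("osisID"),
                "text": "".join(w.itertext()),
                "lemma": w.get("lemma"),   # Strong's number, possibly with prefixes
                "morph": w.get("morph"),   # None when the word is not yet parsed
            })
    return words


if __name__ == "__main__":
    for word in read_words(SAMPLE):
        print(word["verse"], word["lemma"], word["morph"])
```

Returning plain dicts keeps the sketch independent of whatever container format tC ends up using.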

May as well read the files directly from https://github.com/openscriptures/morphhb/blob/master/wlc/ unless we want to create a process to put this into our container format.

Currently, I'm only seeing about 1% of the words in those files as having morphological data.
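A coverage figure like this can be re-checked with a short script. The following is only a sketch: it assumes every word is a `w` element in the OSIS namespace and treats a word as parsed when it carries a non-empty morph attribute. `morph_coverage` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# OSIS namespace in ElementTree's "{uri}tag" notation.
OSIS_NS = "{http://www.bibletechnologies.net/2003/OSIS/namespace}"


def morph_coverage(xml_text):
    """Return (parsed, total): words with a non-empty morph attribute, and all words."""
    root = ET.fromstring(xml_text)
    total = 0
    parsed = 0
    for w in root.iter(OSIS_NS + "w"):
        total += 1
        if w.get("morph"):
            parsed += 1
    return parsed, total
```

To use it on a local copy of one book, read the file as text and pass the contents in, for example `morph_coverage(open("Ruth.xml", encoding="utf-8").read())`, where the path is whatever the file was saved as.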

Finishing Morphological Data

Stage 1

Write a comparer script that can verify our proposed parsings from http://hb.openscriptures.org/OshbParse/ against an existing dataset (such as https://shebanq.ancient-data.org/shebanq/static/docs/tools/shebanq/plain.html). If they check out, they can be marked as verified and included in the XML files.
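The core of such a comparer could be sketched as follows. This is not an existing tool. It assumes both datasets have already been reduced to dictionaries keyed by the same word identifier and using the same morphology code scheme, which in practice would need a mapping step of its own. The function name and the identifiers in the usage note are hypothetical.

```python
def compare_parsings(proposed, reference):
    """Split proposed parsings into verified, conflicting and unchecked.

    Both arguments map a word identifier to a morphology code.
    Assumes the two datasets share identifiers and code scheme.
    """
    verified = {}
    conflicts = {}
    unchecked = {}
    for word_id, morph in proposed.items():
        if word_id not in reference:
            # Nothing to compare against; leave for an editor.
            unchecked[word_id] = morph
        elif reference[word_id] == morph:
            verified[word_id] = morph
        else:
            # Keep both values so an editor can see the disagreement.
            conflicts[word_id] = (morph, reference[word_id])
    return verified, conflicts, unchecked
```

For example, `compare_parsings({"Ruth.1.1!1": "HVqp3ms"}, {"Ruth.1.1!1": "HVqp3ms"})` puts that one word in `verified`. Only the `verified` group would be written back to the XML files; the other two groups would go to manual review.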

Stage 2

Create a process that takes verified parsings from https://github.com/openscriptures/morphhb/blob/master/wlc/ and programmatically guesses at the rest of the words in the OT (e.g. strip cantillation and find and replace for unknowns). Feed these back into the parsing system at http://hb.openscriptures.org/OshbParse/ and verify them against an existing dataset and/or Editors.

If we can make this an iterative process then we would be able to cut down the amount of manual intervention necessary to get the morph data.
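One pass of the guessing step might look like the sketch below. It assumes cantillation marks are the Unicode Hebrew accents U+0591 to U+05AF (vowel points are left in place), and it only proposes a parsing when every verified occurrence of the stripped form agrees. The function names are hypothetical, and the guesses it produces would still need verification as described above.

```python
import re
from collections import defaultdict

# Hebrew accents (cantillation marks) occupy U+0591..U+05AF.
# Vowel points (U+05B0 and up) are intentionally kept.
CANTILLATION = re.compile("[\u0591-\u05AF]")


def strip_cantillation(word):
    """Remove cantillation marks, keeping consonants and vowel points."""
    return CANTILLATION.sub("", word)


def guess_parsings(verified, unknown_words):
    """Propose parsings for unknown words from verified (text, morph) pairs.

    A guess is made only when the stripped form maps to exactly one
    verified parsing; ambiguous forms are skipped and left for editors.
    """
    by_form = defaultdict(set)
    for text, morph in verified:
        by_form[strip_cantillation(text)].add(morph)

    guesses = {}
    for text in unknown_words:
        candidates = by_form.get(strip_cantillation(text), set())
        if len(candidates) == 1:
            guesses[text] = next(iter(candidates))
    return guesses
```

Skipping ambiguous forms trades coverage for precision: each iteration adds only low-risk guesses, and newly verified words enlarge the lookup table for the next pass.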

Completion

After the morphology data is complete, the UHB project will effectively be complete. At the moment there are no further plans to mark up the text with other information.