Journaling Clojure With Clojure

I want to keep a coding journal, showing how I work through a project over time, building up code and watching how changes get identified and made. This is the proof of concept where I develop the technique on the script that generates the journal. We’re going to go on a walkthrough of the development of this page…

Commit 1

diff --git a/project.clj b/project.clj
index 610dbbb..6123dd2 100644
--- a/project.clj
+++ b/project.clj
@@ -4,7 +4,9 @@
   :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
             :url "https://www.eclipse.org/legal/epl-2.0/"}
   :dependencies [[org.clojure/clojure "1.11.1"]
-                 [org.eclipse.jgit/org.eclipse.jgit "6.7.0.202309050840-r"]]
+                 ;; clj-jgit pulls in an older library
+                 #_[org.eclipse.jgit/org.eclipse.jgit "6.7.0.202309050840-r"]

JGit has an unhelpful interface with lots of iterators or something. It scared me off and I ended up using clj-jgit (a Clojure wrapper around JGit) - mainly because clj-jgit gave me a sequence of commits easily.

+                 [clj-jgit "1.0.2"]]
   :main ^:skip-aot git-blog-clj.core
   :target-path "target/%s"
   :profiles {:uberjar {:aot :all

Commit 2

diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
new file mode 100644
index 0000000..e9109b8
--- /dev/null
+++ b/src/git_blog_clj/core.clj
@@ -0,0 +1,32 @@
+(ns git-blog-clj.core
+  (:gen-class)
+  (:require
+   [clj-jgit.internal :as gi]
+   [clj-jgit.porcelain :as gp]
+   [clj-jgit.querying :as gq]))
+
+(defn -main
+  "I don't do a whole lot ... yet."
+  [& _args]
+  (println "Hello, World!"))
+
+(def r
+  "
+  This repo, the one for this project. The local `.git` in the same root as
+  `project.clj`
+  "
+  (gp/load-repo ".git"))
+
+(gi/resolve-object r "26ced4b9468d769a347102684e6b5513ee0d37a7")
+
+(println

This is the tail end of experiments in clj-jgit and JGit. It was (gq/rev-list r) that convinced me in the end. I couldn’t see a neat way of doing that following JGit tutorials.

+ (gq/changed-files-with-patch
+  r
+  (second
+   ;; Turns out to be a poor man's rev-list
+   (keys
+    (gq/build-commit-map r
+                         (gi/new-rev-walk r))))))
+
+(run! println
+      (map gq/changed-files-with-patch (repeat r) (gq/rev-list r)))

Commit 3

diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index e9109b8..c4bef34 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -1,9 +1,9 @@
 (ns git-blog-clj.core
   (:gen-class)
   (:require
-   [clj-jgit.internal :as gi]
    [clj-jgit.porcelain :as gp]
-   [clj-jgit.querying :as gq]))
+   [clj-jgit.querying :as gq]
+   [clojure.string :as string]))
 
 (defn -main
   "I don't do a whole lot ... yet."
@@ -17,16 +17,70 @@
   "
   (gp/load-repo ".git"))
 
-(gi/resolve-object r "26ced4b9468d769a347102684e6b5513ee0d37a7")
+(def repo-data
+  (->>
+   (map gq/changed-files-with-patch (repeat r) (gq/rev-list r))
+   (interpose "\n\n\n")
+   (apply str)
+   string/split-lines))

 
-(println
- (gq/changed-files-with-patch
-  r
-  (second
-   ;; Turns out to be a poor man's rev-list
-   (keys
-    (gq/build-commit-map r
-                         (gi/new-rev-walk r))))))
+(spit "my-text.txt"
+      "Nothing\nYet\n|   :main ^:skip-aot git-blog-clj.core\nHehe")

The syntax doesn’t change much from this: a leading | marks a line as something to match against lines from git, and everything else is freeform text.
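
To make the syntax concrete, here is a minimal sketch of how a journal line could be classified. The classify-line helper and the example-journal name are hypothetical and are not part of the project; the sample text is the one written to my-text.txt in this commit.

```clojure
(require '[clojure.string :as string])

(defn classify-line
  "Hypothetical helper: tags a journal line as a :match marker or as :text."
  [line]
  (if (string/starts-with? line "|")
    ;; Drop only the leading | so the rest can be compared with a diff line.
    {:type :match :text (subs line 1)}
    {:type :text :text line}))

(def example-journal
  "Nothing\nYet\n|   :main ^:skip-aot git-blog-clj.core\nHehe")

(def classified
  (map classify-line (string/split-lines example-journal)))

(map :type classified)
;; => (:text :text :match :text)
```

Only the first | is stripped, so the leading whitespace of the diff line survives and an exact string comparison against the git output remains possible.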

 
-(run! println
-      (map gq/changed-files-with-patch (repeat r) (gq/rev-list r)))
+(def markup-data
+  (->> (slurp "my-text.txt")
+       string/split-lines))
+
+(defn output
+  [markup-data repo-data]
+  (loop [acc
+         []
+
+         m
+         markup-data
+
+         r
+         repo-data]
+    (let [[m' & m+]
+          m
+
+          [r' & r+]
+          r
+
+          matchable-m?
+          (= (first m') \|)]
+      (cond

I get the feeling this layout is a mistake. Technically it might be acceptable but it is difficult to read the logic out of this cond structure. The inspiration I’m working from is a merge join, which worked quite well as a core algorithm. The problem is the cond’s lack of clarity. In situations like this I usually recommend using state machines and I will remember my own advice in a few commits.
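
For reference, a textbook merge join over two sorted sequences can be sketched as below. This is a generic illustration rather than the journal code: merge-join is a hypothetical name, and it assumes both inputs are already sorted and contain no duplicates. The journal merge differs in that unmatched repo lines are emitted instead of dropped.

```clojure
(defn merge-join
  "Sketch: returns the values present in both sorted sequences."
  [xs ys]
  (loop [acc []
         xs  (seq xs)
         ys  (seq ys)]
    (if (and xs ys)
      (let [x (first xs)
            y (first ys)
            c (compare x y)]
        (cond
          ;; x is behind: advance the left side only.
          (neg? c) (recur acc (next xs) ys)
          ;; y is behind: advance the right side only.
          (pos? c) (recur acc xs (next ys))
          ;; Match: keep it and advance both sides.
          :else    (recur (conj acc x) (next xs) (next ys))))
      ;; One side is exhausted, so no further matches are possible.
      acc)))

(merge-join [1 3 5 7] [3 4 5 8])
;; => [3 5]
```

The shape is the same as the loop in this commit: two cursors, one accumulator, and a decision at each step about which cursor to advance.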

+        ;; Case 0; we're finished. Return the accumulator
+        (and (empty? r) (empty? m))
+        acc
+
+        ;; Case 1; We've consumed r -> keep consuming m
+        (empty? r)
+        (recur (conj acc m') m+ r)
+
+        ;; Case 2; We've consumed m -> keep consuming r
+        (empty? m)
+        (recur (conj acc r') m r+)
+
+        ;; Case 3; we're adding lines from the markup file until we find a new thing to match on.
+        (not matchable-m?)
+        (recur
+         (conj acc m') m+ r)
+
+        ;; Case 4; we're looking for a match and find one.
+        (= r' (string/replace-first m' "|" ""))
+        (recur (conj acc r') m+ r+)
+
+        ;; Case 5; we're waiting for a match but can't possibly find it. Dump
+        (empty? r)
+        (recur (conj acc m') m+ r)
+
+        ;; Case 6; we're waiting for a match and don't see it yet.
+        :else
+        (recur (conj acc r') m r+)))))
+
+(spit "out.txt"
+      (->>
+       (output markup-data repo-data)
+       (interpose "\n")
+       (apply str)))

Commit 4

diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index c4bef34..6876ed6 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -20,12 +20,13 @@
 (def repo-data
   (->>
    (map gq/changed-files-with-patch (repeat r) (gq/rev-list r))
+   reverse
    (interpose "\n\n\n")
    (apply str)
    string/split-lines))
 
 (spit "my-text.txt"
-      "Nothing\nYet\n|   :main ^:skip-aot git-blog-clj.core\nHehe")
+      "Nothing\nYet\n|   :main ^:skip-aot git-blog-clj.core\n|+(def repo-data\n\nThis was backwards. Dunno if it is consistently backwards or not though.\n\n||FIN.\n\n")
 
 (def markup-data
   (->> (slurp "my-text.txt")

Commit 5

diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index 6876ed6..cefca26 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -21,12 +21,15 @@
   (->>
    (map gq/changed-files-with-patch (repeat r) (gq/rev-list r))
    reverse
-   (interpose "\n\n\n")
+   (interpose "\n```\n\n\n# Next Commit\n```diff\n")

The formatting quickly gets more complicated, and the weakness here soon became clear - this is trying to work out where to put the backticks by considering the data as a stream of lines, when just 2 lines higher the commits are still organised as commits. I figure out the sensible approach in a few commits and wrap in a better way.

+   reverse
+   (into ["```"])
+   reverse
    (apply str)
    string/split-lines))
 
 (spit "my-text.txt"
-      "Nothing\nYet\n|   :main ^:skip-aot git-blog-clj.core\n|+(def repo-data\n\nThis was backwards. Dunno if it is consistently backwards or not though.\n\n||FIN.\n\n")
+      "Nothing\nYet\n|   :main ^:skip-aot git-blog-clj.core\nChecking the logic\n|+(def repo-data\n\nThe order of the commits was backwards. Dunno if it is consistently backwards or not though.\n\n||FIN.\n\n")
 
 (def markup-data
   (->> (slurp "my-text.txt")
@@ -64,13 +67,17 @@
         (recur (conj acc r') m r+)
 
         ;; Case 3; we're adding lines from the markup file until we find a new thing to match on.
+        ;; Case 3+; We're just about to finish case 3, make sure to get the ``` correct.
         (not matchable-m?)
-        (recur
-         (conj acc m') m+ r)
+        (if (= (ffirst m+) \|)

The cond is rapidly falling apart. This type of if-in-cond is awkward and one of my triggers to look for alternative code layouts. Again, the real question here is where the code blocks should go and trying to answer that while working in a line-by-line framework isn’t easy.

+          (recur
+           (into acc [m' "" "```diff"]) m+ r)
+          (recur
+           (conj acc m') m+ r))
 
         ;; Case 4; we're looking for a match and find one.
         (= r' (string/replace-first m' "|" ""))
-        (recur (conj acc r') m+ r+)
+        (recur (into acc [r' "```"]) m+ r+)
 
         ;; Case 5; we're waiting for a match but can't possibly find it. Dump
         (empty? r)

Commit 6

diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index cefca26..8fdd170 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -17,19 +17,38 @@
   "
   (gp/load-repo ".git"))
 
+(def skip-commits
+  #{"5ec8855d6c3455087eb556578a07892f8005ad94"})
+
+(defn banned-sha1-hash?
+  [^org.eclipse.jgit.revwalk.RevCommit rev]
+  (->> rev
+       .getName
+       (contains? skip-commits)))
+
+(defn to-diff-block

This is the part of writing code that I probably enjoy the most: carving out chunks of logic from a less organised mass. On the second pass I’ve moved up a level of abstraction and am dealing with an entire commit (it is wrong to call it a file) to add code block markers.

+  [idx file]
+  (->>
+   (concat ["" (str "# Commit " (inc idx)) "```diff"] file ["````" "" ""])
+   (into [])))
+
 (def repo-data
   (->>
-   (map gq/changed-files-with-patch (repeat r) (gq/rev-list r))
+   (gq/rev-list r)
+   (remove banned-sha1-hash?)

If you look up the commit - this was the one that added the license file. I wouldn’t even give the GPL that much space on this page, let alone the Eclipse license!

    reverse
-   (interpose "\n```\n\n\n# Next Commit\n```diff\n")
-   reverse
-   (into ["```"])
-   reverse
-   (apply str)
-   string/split-lines))
+   (map gq/changed-files-with-patch (repeat r))
+   ;; `gq/changed-files-with-patch` does not return strings, it evaluates to
+   ;; some sort of quasi-string object that breaks split-lines, somehow. Java
+   ;; folk, at it again with their wacky ideas!
+   (map str)
+   (filter seq)
+   (map string/split-lines)
+   (map-indexed to-diff-block)
+   flatten))
 
 (spit "my-text.txt"
-      "Nothing\nYet\n|   :main ^:skip-aot git-blog-clj.core\nChecking the logic\n|+(def repo-data\n\nThe order of the commits was backwards. Dunno if it is consistently backwards or not though.\n\n||FIN.\n\n")
+      "Nothing\n\nYet\n|   :main ^:skip-aot git-blog-clj.core\nChecking the logic\n|+(def repo-data\n\nThe order of the commits was backwards. Dunno if it is consistently backwards or not though.\n\n||FIN.\n\n")
 
 (def markup-data
   (->> (slurp "my-text.txt")
@@ -40,6 +59,9 @@
   (loop [acc
          []
 
+         state ; free in-diff

Not a moment too soon. A 2-state state machine. This tracks if we are interrupting a commit or not. If I expand this project at all there’ll probably be more states and - ideally - the entire cond logic can be refactored with states. It is easier to come back to code with named states.

The state logic causes the code block insertions to happen in a different part of the code - I thought that was an improvement.
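
The idea of named states can be sketched with a small transition table. This is an illustration under assumptions, not the code in the diff: the event names :open-diff and :close-diff, and the transitions, step and trace names, are all hypothetical.

```clojure
(def transitions
  "Hypothetical transition table: {state {event next-state}}."
  {:free    {:open-diff  :in-diff}
   :in-diff {:close-diff :free}})

(defn step
  "Returns the next state, staying put when the event is not listed."
  [state event]
  (get-in transitions [state event] state))

(def trace
  ;; reductions shows every intermediate state, starting from :free.
  (reductions step :free [:text :open-diff :text :close-diff :text]))

trace
;; => (:free :free :in-diff :in-diff :free :free)
```

Keeping the transitions in data means a new state is a new map entry rather than another branch in the cond.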

+         :free
+
          m
          markup-data
 
@@ -60,32 +82,33 @@
 
         ;; Case 1; We've consumed r -> keep consuming m
         (empty? r)
-        (recur (conj acc m') m+ r)
+        (recur (conj acc m') :free m+ r)
 
         ;; Case 2; We've consumed m -> keep consuming r
         (empty? m)
-        (recur (conj acc r') m r+)
+        (recur (conj acc r') :free m r+)
 
         ;; Case 3; we're adding lines from the markup file until we find a new thing to match on.
-        ;; Case 3+; We're just about to finish case 3, make sure to get the ``` correct.
         (not matchable-m?)
-        (if (= (ffirst m+) \|)
-          (recur
-           (into acc [m' "" "```diff"]) m+ r)
-          (recur
-           (conj acc m') m+ r))
+        (recur
+         (conj acc m') state m+ r)
 
-        ;; Case 4; we're looking for a match and find one.
+        ;; Case 4; we're looking for a match and find one. Since we matched a
+        ;; line of code we must be interrupting a diff.
         (= r' (string/replace-first m' "|" ""))
-        (recur (into acc [r' "```"]) m+ r+)
+        (recur (into acc [r' "```"]) :in-diff m+ r+)
 
         ;; Case 5; we're waiting for a match but can't possibly find it. Dump
         (empty? r)
-        (recur (conj acc m') m+ r)
+        (recur (conj acc m') state m+ r)
 
         ;; Case 6; we're waiting for a match and don't see it yet.
         :else
-        (recur (conj acc r') m r+)))))
+        (let [new-items
+              (if (= state :in-diff)
+                ["" "```diff" r']
+                [r'])]
+          (recur (into acc new-items) :free m r+))))))
 
 (spit "out.txt"
       (->>

Commit 7

diff --git a/project.clj b/project.clj
index 6123dd2..299d079 100644
--- a/project.clj
+++ b/project.clj
@@ -4,6 +4,8 @@
   :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
             :url "https://www.eclipse.org/legal/epl-2.0/"}
   :dependencies [[org.clojure/clojure "1.11.1"]
+                 [org.clojure/tools.cli "1.0.219"]
+                 [org.slf4j/slf4j-nop "2.0.9"]

                  ;; clj-jgit pulls in an older library
                  #_[org.eclipse.jgit/org.eclipse.jgit "6.7.0.202309050840-r"]
                  [clj-jgit "1.0.2"]]
diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index 8fdd170..4919558 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -3,12 +3,8 @@
   (:require
    [clj-jgit.porcelain :as gp]
    [clj-jgit.querying :as gq]
-   [clojure.string :as string]))
-
-(defn -main
-  "I don't do a whole lot ... yet."
-  [& _args]
-  (println "Hello, World!"))
+   [clojure.string :as string]
+   [clojure.tools.cli :refer [parse-opts]]))

Considered a few options from the Clojure Toolbox. This one seemed about the right level of difficulty. I was very happy with it, 10/10 would parse-opts again. To run with -h the invocation is lein run -- -h.

 
 (def r
   "
@@ -32,7 +28,8 @@
    (concat ["" (str "# Commit " (inc idx)) "```diff"] file ["````" "" ""])
    (into [])))
 
-(def repo-data
+(defn repo-data
+  [r]
   (->>
    (gq/rev-list r)
    (remove banned-sha1-hash?)
@@ -110,8 +107,25 @@
                 [r'])]
           (recur (into acc new-items) :free m r+))))))
 
-(spit "out.txt"
+(def cli-options
+  [[nil "--exclude-commits FILE" "File of commit hashes (one per line) to exclude from the markdown generated."]
+   [nil "--repo PATH" "Path to the root of your git repo"]
+   [nil "--journal FILE" "The journal file"]
+   ["-h" "--help"]])
+
+(defn -main
+  "I don't do a whole lot ... yet."
+  [& args]
+  (let [{:keys [options summary]}
+        (parse-opts args cli-options)]
+
+    (if (:help options)
+      (do
+        (println "Options:")
+        (println summary))
+
       (->>
-       (output markup-data repo-data)
+       (output markup-data (repo-data r))
        (interpose "\n")
-       (apply str)))
+       (apply str)
+       println))))

Commit 8

diff --git a/doc/example.journal b/doc/example.journal

There isn’t a lot for the next few commits. Setting up command line options is a little tedious.

new file mode 100644
index 0000000..5553ea0
--- /dev/null
+++ b/doc/example.journal
@@ -0,0 +1,10 @@
+Nothing
+
+Yet
+|   :main ^:skip-aot git-blog-clj.core
+Checking the logic
+|+(def repo-data
+
+The order of the commits was backwards. Dunno if it is consistently backwards or not though.
+
+||FIN.
diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index 4919558..6269067 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -3,6 +3,7 @@
   (:require
    [clj-jgit.porcelain :as gp]
    [clj-jgit.querying :as gq]
+   [clojure.java.io :as io]
    [clojure.string :as string]
    [clojure.tools.cli :refer [parse-opts]]))
 
@@ -44,13 +45,6 @@
    (map-indexed to-diff-block)
    flatten))
 
-(spit "my-text.txt"
-      "Nothing\n\nYet\n|   :main ^:skip-aot git-blog-clj.core\nChecking the logic\n|+(def repo-data\n\nThe order of the commits was backwards. Dunno if it is consistently backwards or not though.\n\n||FIN.\n\n")
-
-(def markup-data
-  (->> (slurp "my-text.txt")
-       string/split-lines))
-
 (defn output
   [markup-data repo-data]
   (loop [acc
@@ -117,7 +111,15 @@
   "I don't do a whole lot ... yet."
   [& args]
   (let [{:keys [options summary]}
-        (parse-opts args cli-options)]
+        (parse-opts args cli-options)
+
+        journal-data
+        (if (:journal options)
+          (->> options
+               :journal
+               io/reader
+               line-seq)
+          [])]
 
     (if (:help options)
       (do
@@ -125,7 +127,8 @@
         (println summary))
 
       (->>
-       (output markup-data (repo-data r))
+       (repo-data r)
+       (output journal-data)
        (interpose "\n")
        (apply str)
        println))))

Commit 9

diff --git a/doc/commits.exclude b/doc/commits.exclude
new file mode 100644
index 0000000..771d83d
--- /dev/null
+++ b/doc/commits.exclude
@@ -0,0 +1 @@
+5ec8855d6c3455087eb556578a07892f8005ad94
diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index 6269067..2e7c246 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -7,18 +7,8 @@
    [clojure.string :as string]
    [clojure.tools.cli :refer [parse-opts]]))
 
-(def r
-  "
-  This repo, the one for this project. The local `.git` in the same root as
-  `project.clj`
-  "
-  (gp/load-repo ".git"))
-
-(def skip-commits
-  #{"5ec8855d6c3455087eb556578a07892f8005ad94"})
-
 (defn banned-sha1-hash?
-  [^org.eclipse.jgit.revwalk.RevCommit rev]
+  [skip-commits ^org.eclipse.jgit.revwalk.RevCommit rev]
   (->> rev
        .getName
        (contains? skip-commits)))
@@ -30,10 +20,10 @@
    (into [])))
 
 (defn repo-data
-  [r]
+  [r skip-commits]
   (->>
    (gq/rev-list r)
-   (remove banned-sha1-hash?)
+   (remove (partial banned-sha1-hash? skip-commits))
    reverse
    (map gq/changed-files-with-patch (repeat r))
    ;; `gq/changed-files-with-patch` does not return strings, it evaluates to
@@ -113,13 +103,21 @@
   (let [{:keys [options summary]}
         (parse-opts args cli-options)
 
+        skip-commits
+        (try (->> options :exclude-commits io/reader line-seq (reduce conj #{}))
+             (catch Exception _e #{}))
+
+        repo
+        (try (-> options :repo (str ".git") gp/load-repo (repo-data skip-commits))
+             (catch Exception _e []))
+
         journal-data
-        (if (:journal options)
+        (try
           (->> options
                :journal
                io/reader
                line-seq)
-          [])]
+          (catch Exception _e []))]
 
     (if (:help options)
       (do
@@ -127,7 +125,7 @@
         (println summary))
 
       (->>
-       (repo-data r)
+       repo
        (output journal-data)
        (interpose "\n")
        (apply str)

Commit 10

diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index 2e7c246..7bc3ca1 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -5,7 +5,10 @@
    [clj-jgit.querying :as gq]
    [clojure.java.io :as io]
    [clojure.string :as string]
-   [clojure.tools.cli :refer [parse-opts]]))
+   [clojure.tools.cli :refer [parse-opts]])
+  (:import
+   [org.eclipse.jgit.api Git]
+   [org.eclipse.jgit.revwalk RevCommit]))
 
 (defn banned-sha1-hash?
   [skip-commits ^org.eclipse.jgit.revwalk.RevCommit rev]
@@ -13,11 +16,30 @@
        .getName
        (contains? skip-commits)))
 
-(defn to-diff-block
-  [idx file]
-  (->>
-   (concat ["" (str "# Commit " (inc idx)) "```diff"] file ["````" "" ""])
-   (into [])))
+(defn revcommit->lines
+  [^Git repo idx ^RevCommit rev-commit]

It took a few goes to get this function signature, but it seems like the proper approach is to map the transform from git object to text.

+  (let [commit-text
+        (->>
+         rev-commit
+         (gq/changed-files-with-patch repo)
+         ;; `gq/changed-files-with-patch` does not return strings, it evaluates to
+         ;; some sort of quasi-string object that breaks split-lines, somehow. Java
+         ;; folk, at it again with their wacky ideas!
+         str)]
+
+    (if (seq commit-text)
+      (as-> commit-text $
+        (string/split-lines $)
+        (concat [""
+                 (str "# Commit " (inc idx))
+                 "```diff"]
+                $
+                ["````"
+                 (str "> Commit hash " (.getName rev-commit))

For ease of excluding commits. We’re building up quite a few lines in this object that aren’t code. This is going to put more pressure on the cond statement. If the journal tries to match on the “Commit hash” line to add some text after a commit then the cond will be putting code blocks in unintended places.

+                 ""
+                 ""])
+        (into [] $))
+      [])))
 
 (defn repo-data
   [r skip-commits]
@@ -25,14 +47,7 @@
    (gq/rev-list r)
    (remove (partial banned-sha1-hash? skip-commits))
    reverse
-   (map gq/changed-files-with-patch (repeat r))
-   ;; `gq/changed-files-with-patch` does not return strings, it evaluates to
-   ;; some sort of quasi-string object that breaks split-lines, somehow. Java
-   ;; folk, at it again with their wacky ideas!
-   (map str)
-   (filter seq)
-   (map string/split-lines)
-   (map-indexed to-diff-block)
+   (map-indexed (partial revcommit->lines r))
    flatten))
 
 (defn output

Commit 11

diff --git a/src/git_blog_clj/core.clj b/src/git_blog_clj/core.clj
index 7bc3ca1..c0b80b2 100644
--- a/src/git_blog_clj/core.clj
+++ b/src/git_blog_clj/core.clj
@@ -34,7 +34,7 @@
                  (str "# Commit " (inc idx))
                  "```diff"]
                 $
-                ["````"

Oops. Emacs tries to be helpful and inserts 2 ` when one is typed. Usually that helps.

+                ["```"
                  (str "> Commit hash " (.getName rev-commit))
                  ""
                  ""])
@@ -47,10 +47,33 @@
    (gq/rev-list r)
    (remove (partial banned-sha1-hash? skip-commits))
    reverse
+   rest ; diff of first commit is ""

We’re making a lot of assumptions here - small, linear commit history. First commit blank. How this goes in practice is uncertain but I think usually my projects have quite linear histories.

    (map-indexed (partial revcommit->lines r))
    flatten))
 
-(defn output

Amazingly, this next function isn’t in the Clojure standard library. The first attempt was with partition-by but that puts the lines starting with | into their own partition. This would make a good transducer (transducers are great, I’ve been using them a lot when memory use becomes a factor and they simplify lazy sequence processing a lot) but I don’t think this operation comes up that often.
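
The difference from partition-by is easiest to see on a small input. The sketch below is a hedged illustration: journal-lines, marker? and partition-starting-at are hypothetical names, and the reduce-based function is an alternative take on the same idea, not the partition-when from the diff.

```clojure
(def journal-lines
  ["intro" "|match a" "note 1" "note 2" "|match b" "note 3"])

(defn marker?
  "True when the line starts with the | match marker."
  [line]
  (= (first line) \|))

;; partition-by starts a new group whenever the predicate's value changes,
;; so each marker line ends up alone, separated from its notes.
(partition-by marker? journal-lines)
;; => (("intro") ("|match a") ("note 1" "note 2") ("|match b") ("note 3"))

(defn partition-starting-at
  "Hypothetical sketch: starts a new group at every item where pred? is true."
  [pred? coll]
  (reduce (fn [acc item]
            (if (or (empty? acc) (pred? item))
              (conj acc [item])
              ;; pop and peek work on the end of a vector.
              (conj (pop acc) (conj (peek acc) item))))
          []
          coll))

(partition-starting-at marker? journal-lines)
;; => [["intro"] ["|match a" "note 1" "note 2"] ["|match b" "note 3"]]
```

Using pop and peek on vectors avoids walking the whole accumulator the way butlast and last do, which may matter for long journals.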

+(defn partition-when
+  "
+  Breaks a coll up into partitions, starting a new partition for each item where
+  `pred?` is `true`
+  "
+  [pred? coll]
+  (cond
+    (empty? coll)
+    []
+
+    (-> coll count (= 1))
+    [(vec coll)]
+
+    :else
+    (let [partition-when'
+          (fn [acc itm]
+            (if (pred? itm)
+              (conj acc [itm])
+              (conj (vec (butlast acc)) (conj (last acc) itm))))]
+      (reduce partition-when' [[(first coll)]] (rest coll)))))
+
+(defn journal-repo-merge
+  "Takes the journal file, and a custom data structure (as lines of text)"
   [markup-data repo-data]
   (loop [acc
          []
@@ -58,19 +81,25 @@
          state ; free in-diff
          :free
 
-         m
-         markup-data
+         ;; Break journal up into blocks, where the first line of the block is what is to be matched
+         ;; Eg: (= m' ["|match me" "Comments" "Other comments"])
+         ;; Unfortunately this means m' and r' are different things.
+         [m' & m+ :as m]
+         (partition-when #(= (first %) \|) markup-data)

This is quite ugly. It is bad form for r' and m' to be radically different objects. But the logic became much neater. I was expecting to use more state machines here but in the end using a different data structure was the major change.

As can be seen in the next few lines, the cond itself is now a bit simpler with only 5 cases, 3 of which are trivial edge cases for empty arguments. It adds entire journal entries at once which helps keep the merge logic readable.
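
Adding a whole journal entry at once can be sketched as follows. The splice-block helper is hypothetical and deliberately ignores the code fence handling that the real cond has to do; it only shows how a block whose first line is the marker gets attached to the matching repo line.

```clojure
(require '[clojure.string :as string])

(defn splice-block
  "Hypothetical helper: when the block's marker line matches repo-line,
  returns the repo line followed by the block's comments; otherwise nil."
  [block repo-line]
  (let [[marker & comments] block]
    (when (= repo-line (string/replace-first marker "|" ""))
      (into [repo-line] comments))))

(splice-block ["|+(def x 1)" "A comment." ""] "+(def x 1)")
;; => ["+(def x 1)" "A comment." ""]

(splice-block ["|+(def x 1)" "A comment." ""] "+(def y 2)")
;; => nil
```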

 
-         r
+         [r' & r+ :as r]
          repo-data]
-    (let [[m' & m+]
-          m
+    (let [;; Only use this if we consume from r
+          new-state
+          (cond
+            (= r' "```diff")
+            :in-diff
 
-          [r' & r+]
-          r
+            (= r' "```")
+            :free
 
-          matchable-m?
-          (= (first m') \|)]
+            :else
+            state)]
       (cond
         ;; Case 0; we're finished. Return the accumulator
         (and (empty? r) (empty? m))
@@ -78,33 +107,25 @@
 
         ;; Case 1; We've consumed r -> keep consuming m
         (empty? r)
-        (recur (conj acc m') :free m+ r)
+        (recur (into acc (rest m')) state m+ r)
 
         ;; Case 2; We've consumed m -> keep consuming r
         (empty? m)
-        (recur (conj acc r') :free m r+)
+        (recur (conj acc r') new-state m r+)
 
-        ;; Case 3; we're adding lines from the markup file until we find a new thing to match on.
-        (not matchable-m?)
-        (recur
-         (conj acc m') state m+ r)
+        ;; Case 3; we're in the introductory matter
+        (-> m' ffirst (not= \|))
+        (recur (into acc m') state m+ r)
 
-        ;; Case 4; we're looking for a match and find one. Since we matched a
-        ;; line of code we must be interrupting a diff.
-        (= r' (string/replace-first m' "|" ""))
-        (recur (into acc [r' "```"]) :in-diff m+ r+)
+        ;; Case 4; we're looking for a match and find one.
+        (-> m' first (string/replace-first "|" "") (= r'))
+        (if (= state :in-diff)
+          (recur (into acc (concat [r' "```"] (rest m') ["```diff"])) new-state m+ r+)
+          (recur (into acc (concat [r'] (rest m'))) new-state m+ r+))
 
-        ;; Case 5; we're waiting for a match but can't possibly find it. Dump
-        (empty? r)
-        (recur (conj acc m') state m+ r)
-
-        ;; Case 6; we're waiting for a match and don't see it yet.
+        ;; Case 5; we're looking for a match and don't see it yet.
         :else
-        (let [new-items
-              (if (= state :in-diff)
-                ["" "```diff" r']
-                [r'])]
-          (recur (into acc new-items) :free m r+))))))
+        (recur (conj acc r') new-state m r+)))))
 
 (def cli-options
   [[nil "--exclude-commits FILE" "File of commit hashes (one per line) to exclude from the markdown generated."]
@@ -141,7 +162,7 @@
 
       (->>
        repo
-       (output journal-data)
+       (journal-repo-merge journal-data)
        (interpose "\n")
        (apply str)
        println))))

After this commit we also have the ability to put comments just after or before a commit. Although there needs to be a newline to escape the quote.

Commit 12

diff --git a/doc/commits.exclude b/doc/commits.exclude
index 771d83d..ea27976 100644
--- a/doc/commits.exclude
+++ b/doc/commits.exclude
@@ -1 +1,3 @@
 5ec8855d6c3455087eb556578a07892f8005ad94
+139c360237f6c3174d1a29ee8bb4655fe87976d8
+382a63899bfe87e23d0a415b18b60e4ca23940a8

I don’t think there is a practical attack that lets me embed a commit’s hash inside itself, so we’ll just live with this. In fact, since I’m committing the journal to git, there will be one entry that can’t be commented on because of this.

Commit 13

diff --git a/doc/real.journal b/doc/real.journal
new file mode 100644
index 0000000..e36e4b7
--- /dev/null
+++ b/doc/real.journal
@@ -0,0 +1,96 @@
+% Journaling Clojure With Clojure
+
+``` {=html}
+<style>
+body { min-width: 80% !important; }
+</style>
+```
+
+# Journaling Clojure With Clojure
+
+I want to keep a coding journal, showing how I work through a project over time, building up code and watching how changes get identified and made. This is the proof of concept where I develop the technique on the script that generates the journal. We're going to go on a walkthrough of the development of this page...
+
+|+                 #_[org.eclipse.jgit/org.eclipse.jgit "6.7.0.202309050840-r"]
+JGit has an unhelpful interface with lots of iterators or something. It scared me off and I ended up using `clj-jgit` (Clojure wrapper aroung JGit) - mainly because `clj-jgit` gave me a sequence of commits easily.
+
+|+(println
+This is the tail end of experiments in clj-jgit and JGit. It was `(gq/rev-list r)` that convinecd me in the end. I couldn't see a neat way of doing that following JGit tutorials.
+
+|+   string/split-lines))
+
+Minor bug: The order of the commits was backwards in this commit.
+
+|+      "Nothing\nYet\n|   :main ^:skip-aot git-blog-clj.core\nHehe")
+The syntax doesn't change much from this. `|` to mark lines as something to match against lines from git, and otherwise freeform test.
+
+|+      (cond
+I get the feeling this layout is a mistake. Technically it might be acceptable but it is difficult to read the logic out of this conde structure. The inspiration I'm working from is a [merge join](https://en.wikipedia.org/wiki/Sort-merge_join) which worked quite well as a core algorithm. That seemed fine. The problem is in the `cond`'s lack of clarity. In situations like this I usually recommend using state machines and I will remember my own advice in a few commits.
+
+The `r'` and `r+` notation worked well though.
+
+|+   (interpose "\n```\n\n\n# Next Commit\n```diff\n")
+Quickly the formatting gets more complicated. This is weaknesses here quickly became clear - this is trying to work out where to put the backticks by considering the data as a stream of lines - when just 2 lines higher the commits are still organised as commits. I figure out the sensible approach in a few commits and wrap in a better way.
+
+|+        (if (= (ffirst m+) \|)
+The `cond` is rapidly falling apart. This type of if-in-cond is awkward and one of my triggers to look for alternative code layouts. Again, the real question here is where the code blocks should go and trying to answer that while working in a line-by-line framework isn't easy.
+
+|+(defn to-diff-block
+This is the part of writing code that I probaly enjoy the most. Carving out chunks of logic from a less organised mass. On second pass I've moved a level of abstraction up and am dealing with an entire commit (it is wrong to call it a file) to add code block markers.
+
+|+   (remove banned-sha1-hash?)
+If you look up the commit - this was the one that added the license file. I wouldn't even give the GPL that much space on this page, let alone the Eclipse license!
+
+|+         state ; free in-diff
+Not a moment too soon. A 2-state state machine. This tracks if we are interrupting a commit or not. If I expand this project at all there'll probably be more states and - ideally - the entire `cond` logic can be refactored with states. It is easier to come back to code with named states.
+
+The state logic causes the code block insertions to happen in a different part of the code - I thought that was an improvement.
+
+|+                 [org.slf4j/slf4j-nop "2.0.9"]
+Key change. This gets rid of the
+
+```
+SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
+SLF4J: Defaulting to no-operation (NOP) logger implementation
+SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+```
+
+|+   [clojure.tools.cli :refer [parse-opts]]))
+
+Considered a few options from the [Clojure Toolbox](https://www.clojure-toolbox.com/). This one seemed about the right level of difficulty. I was very happy with it, 10/10 would parse-opts again. To run with `-h` the invocation is `lein run -- -h`.
+
+|diff --git a/doc/example.journal b/doc/example.journal
+There isn't a lot for the next few commits. Setting up command line options is a little tedious.
+
+|+  [^Git repo idx ^RevCommit rev-commit]
+It took a few goes to get this function signature, but it seems like the proper approach is to map the transform from git object to text.
+
+|+                 (str "> Commit hash " (.getName rev-commit))
+For ease of excluding commits. We're building up quite a few lines in this object that aren't code. This is going to put more pressure on the cond statement. If the journal tries to match on the "Commit hash" line to add some text after a commit then the cond will be putting code blocks in unintended places.
+
+|-                ["````"
+Oops. Emacs tries to be helpful and inserts 2 \` when one is typed. Usually that helps.
+
+|+   rest ; diff of first commit is ""
+We're making a lot of assumptions here - small, linear commit history. First commit blank. How this goes in practice is uncertain but I think usually my projects have quite linear histories.
+
+|-(defn output
+Amazingly, this next function isn't in the Clojure standard library. The first attempt was with `partition-by` but that puts the lines starting with `|` into their own partition. This would make a good transducer (transducers are great, I've been using them a lot when memory use becomes a factor and they simplify lazy sequence processing a lot) but I don't think this operation comes up that often.
+
+|+         (partition-when #(= (first %) \|) markup-data)
+This is quite ugly. It is bad form for `r'` and `m'` to be radically different objects. But the logic became much neater. I was expecting to use more state machines here but in the end using a different data structure was the major change.
+
+As can be seen in the next few lines, the `cond` itself is not a bit simpler with only 5 cases, 3 of which are trivial edge cases for empty arguments. It adds entire journal entries at once which helps keep the merge logic readable.
+
+|> Commit hash e08e523091c93f068ed223ce5b8ecc7f0f2ea41d
+
+After this commit we also have the ability to put comments just after or before a commit. Although there needs to be a newline to escape the quote.
+
+|> Commit hash 1a2694311d3c2965d7335d4028f479fd1c84b48e
+
+I don't think there is a practical attack that lets me embed a commit's hash inside itself, so we'll just live with this. In fact, since I'm commiting the journal to git, there will be one entry that can't be commented on because of this.
+
+||FIN.
+
+# Conclusions
+
+That was a fun experiment, and I completed it! That is a good sign for a project.

Conclusions

That was a fun experiment, and I completed it! That is a good sign for a project.