Text Alignment: Full Justification #1 – Word Wrap

I’m obsessed with typography. Not great with it, just obsessed. The company gets its name from the legendary Misc Fixed 6×13 Regular.

Typography is people finding ways to communicate knowledge and ideas in style.

Typography is Richard Feynman.

Typography is David Bowie.

I like full justification. It naturally makes paragraph bounds recognizable, letting us use the block in more complex page layouts. While it is easy to implement in its most basic form, it is really, really hard to get right. It usually renders plain bad. Which is probably why not everyone share my enthusiasm for full justification. I haven’t implemented it, like ever, though. I am just ranting here. Let’s change that.

Here is our introductory information on justification:
https://en.wikipedia.org/wiki/Typographic_alignment

We are going to justify text for rendering with monospaced fonts, which nicely narrows the problem domain for the purposes of a this article. Convenient, as fixed spacing means literally one less variable to worry about.

Prior Art

Better start by checking out what we are getting into, since this mostly is a solved problem, there are examples.

Knuth-Plass line-wrapping algorithm is the still the go-to solution for the text alignment problem. However, it is designed for typesetting text rendered with variable-width fonts. Still, it doesn’t hurt checking it out. Terje D. provides a simplified version at SO:

Add start of paragraph to list of active breakpoints
For each possible breakpoint (space) B_n, starting from the beginning:
   For each breakpoint in active list as B_a:
      If B_a is too far away from B_n:
          Delete B_a from active list
      else
          Calculate badness of line from B_a to B_n
          Add B_n to active list
          If using B_a minimizes cumulative badness from start to B_n:
             Record B_a and cumulative badness as best path to B_n

The result is a linked list of breakpoints to use.

The badness of lines under consideration can be calculated like this:

Each  space  is assigned  a  nominal  width,  a strechability,  and  a
shrinkability.   The  badness  is  then calculated  as  the  ratio  of
stretching  or shrinking  used, relative  to what  is allowed,  raised
e.g. to the third power (in  order to ensure that several slightly bad
lines are prefered over one really bad one)

Turns out I was wrong, it actually hurts checking it out. Anyway.

Emacs does fixed-space justification with its “fill” commands, so I skimmed Emacs sources. The package lisp/textmodes/fill.el is big but the relevant part of the implementation is ~100 LOC. Allright.

It, not unlike most Emacs text-editing code, directly modifies the text buffer:

- While cursor is still in paragraph:
  - Word wrap.
    - Go to fill column.
    - Go back and find a place to cut.
    - Make sure we're good.
    - Insert newline.
  - Go back one line and justify.
    - Merge multiple consecutive spaces among the line.
	- Count remaining spaces in line.
	- Calculate number of additional spaces needed.
	- Insert the damn spaces.

The -inattentively trimmed- “insert the damn spaces” code from Emacs sources is below, with comments added for clarity:

(let (ncols          ; number of additional space chars needed
      nspaces        ; number of spaces between words
      curr-fracspace ; current fractional space amount
      count)
  ;; I don't like WordPress one bit.
  (when (and (> ncols 0) (> nspaces 0))
    (setq curr-fracspace (+ ncols (/ nspaces 2))
          count nspaces)
    ;; I don't like WordPress one bit.
    (while (> count 0)
      (skip-chars-forward " ")
      (insert-char ?\s (/ curr-fracspace nspaces) t)
      (search-forward " " nil t)
      (setq count (1- count)
            curr-fracspace
            (+ (% curr-fracspace nspaces) ncols)))))
Word Wrap

We are not tied to a text buffer, so we can freely experiment on whatever data structure we feel like.

We will handle only word-wrapping part in this post, as we will try various versions and also see how they perform.

The preparations and test code.

(ql:quickload :cl-ppcre)

(defparameter *fill-column* 60)

(defparameter *text* "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam semper risus mauris, et dignissim lorem lacinia non. Sed vitae lacus nisi. Fusce vitae lectus non quam dictum luctus et at mauris.")

(defun word-wrap-test (fn)
  (print (funcall fn *text*))
  (gc :full t)
  (time (dotimes (x 10000) (funcall fn *text*))))

First, the most logical implementation. Reading char-by-char. Performs really well, of course, as it doesn’t do any list manipulation.

(defun word-wrap-0 (txt)
  (with-input-from-string (in txt)
    (loop for  chr of-type (or null base-char)  = (read-char in nil)
          with bol of-type integer              = 0 ; beginning of line
          and  cut of-type integer              = 0 ; potential breakpoint
          and  buf of-type (vector base-char)       ; output buffer
                 = (make-array 4 :element-type 'base-char
                                 :adjustable t
                                 :fill-pointer 0)
          while chr
          do (case chr
               (#\Newline
                (setf bol (fill-pointer buf)))
               (#\Space
                (setf cut (fill-pointer buf))))
          unless (vector-push chr buf)
            do (adjust-array buf (* 16 (array-dimension buf 0)))
               (vector-push chr buf)
          when (and (> (fill-pointer buf) (+ bol *fill-column*))
                    (< bol cut)) ; for newlines in source
            do (setf (aref buf cut) #\Newline
                     bol cut)
          finally (return buf))))

0.166 seconds of real time
398,942,396 processor cycles
13,895,328 bytes consed

The result is this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Etiam semper risus mauris, et dignissim lorem lacinia non.
Sed vitae lacus nisi. Fusce vitae lectus non quam dictum
luctus et at mauris.

The other examples will rely on cl-ppcre:split for splitting the text.
That call singlehandedly takes:
0.518 seconds of real time
1,241,109,342 processor cycles
55,040,080 bytes consed

Now we try with nested lists all the way.

(defun word-wrap-1 (txt)
  (loop for  text of-type list    on (ppcre:split "\\s+" txt)
        as   word of-type string  = (car text)
        as   next of-type string  = (car (cdr text))
        with line of-type list    = '()
        and  crsr of-type integer = *fill-column*
        ;; Insert the word in line, moving the cursor.
        do (setf line (nconc line (list word))
                 crsr (- crsr (length word)))
        if (or (null next) (< crsr (1+ (length next))))
          ;; If EOL or EOF, line feed and carriage return.
          collect line into buffer
          and do (setf crsr *fill-column*
                       line nil)
        else do (decf crsr)
        finally (return (format nil
                                "~{~&~{~a~^ ~}~}"
                                buffer))))

0.738 seconds of real time
1,767,125,344 processor cycles
92,799,088 bytes consed

Another approach would be to format lines immediately and concatenate later.

(defun word-wrap-2 (txt)
  (loop for  text of-type list     on (ppcre:split "\\s+" txt)
        as   word of-type string   = (car text)
        as   next of-type string   = (car (cdr text))
        with line of-type string   = ""
        and  fmt  of-type function = (formatter "~a~:[ ~;~]~a")
        and  crsr of-type integer  = *fill-column*
        do (setf line (format nil fmt
                              line (string= line "") word)
                 crsr (- crsr (length word)))
        if (or (null next) (< crsr (1+ (length next))))
          collect line into buffer
          and do (setf crsr *fill-column*
                       line "")
        else do (decf crsr)
        finally (return (format nil "~{~&~A~}" buffer))))

1.087 seconds of real time
2,603,569,620 processor cycles
397,437,888 bytes consed

No line. Adjustable array buffer. Start with zero size. Allocate for every word. I know, it’s stupid.

(defun word-wrap-3 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with crsr   of-type integer = *fill-column*
        and  buffer = (make-array 0 :element-type 'character
                                    :adjustable t
                                    :fill-pointer 0)
        do (adjust-array buffer (+ (array-dimension buffer 0) (length word) 1))
           (loop for c across word do (vector-push c buffer))
           (setf crsr (- crsr (length word)))
        if (< crsr (1+ (length next)))
          do (vector-push #\Newline buffer)
             (setf crsr *fill-column*)
        else
          do (vector-push #\Space buffer)
             (decf crsr)
        finally (vector-pop buffer)
                (return buffer)))

2.019 seconds of real time
4,836,461,608 processor cycles
504,132,464 bytes consed

Let vector-push-extend handle allocation, at least doubling the buffer size.

(defun word-wrap-4 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with crsr   of-type integer = *fill-column*
        and  buffer = (make-array 1 :element-type 'character
                                    :adjustable t
                                    :fill-pointer 0)
        do (loop for c across word do
          (vector-push-extend c buffer
                              (array-dimension buffer 0)))
           (setf crsr (- crsr (length word)))
        if (< crsr (1+ (length next)))
          do (vector-push-extend #\Newline buffer)
             (setf crsr *fill-column*)
        else
          do (vector-push-extend #\Space buffer)
             (decf crsr)
        finally (vector-pop buffer)
                (return buffer)))

0.643 seconds of real time
1,538,669,380 processor cycles
99,810,880 bytes consed

Try vector-pushing, if fails, adjust array manually.

(defun word-wrap-5 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with crsr   of-type integer = *fill-column*
        and  buffer of-type (vector character)
               = (make-array 4 :element-type 'character
                               :adjustable t
                               :fill-pointer 0)
        do (loop for c across word
                 unless (vector-push c buffer)
                   do (adjust-array
                       buffer (* 16 (array-dimension buffer 0)))
                      (vector-push c buffer))
           (setf crsr (- crsr (length word)))
        if (< crsr (1+ (length next)))
          do (vector-push-extend #\Newline buffer)
             (setf crsr *fill-column*)
        else
          do (vector-push-extend #\Space buffer)
             (decf crsr)
        finally (vector-pop buffer)
                (return buffer)))

0.632 seconds of real time
1,512,965,832 processor cycles
100,453,840 bytes consed

Use fill-pointer as cursor, adjusting EOL index.

(defun word-wrap-6 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with eol    of-type integer = *fill-column*
        and  buffer of-type (vector character)
               = (make-array 4 :element-type 'character
                               :adjustable t
                               :fill-pointer 0)
        do (loop for c across word
                 unless (vector-push c buffer)
                   do (adjust-array
                       buffer (* 16 (array-dimension buffer 0)))
                      (vector-push c buffer))
        if (< eol (+ (fill-pointer buffer) 1 (length next)))
          do (vector-push-extend #\Newline buffer)
             (setf eol (+ (fill-pointer buffer) *fill-column*))
        else
          do (vector-push-extend #\Space buffer)
        finally (vector-pop buffer)
                (return buffer)))

0.632 seconds of real time
1,513,173,132 processor cycles
100,451,712 bytes consed

Get rid of redundant push-pop.

(defun word-wrap-7 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with eol    of-type integer = *fill-column*
        and  buffer of-type (vector character)
               = (make-array 4 :element-type 'character
                               :adjustable t
                               :fill-pointer 0)
        do (loop for c across word
                 unless
                 (vector-push c buffer)
                 do (adjust-array buffer
                                  (* 16 (array-dimension buffer 0)))
                    (vector-push c buffer))
           (cond ((null next))
                 ((< eol (+ (fill-pointer buffer) 1 (length next)))
                  (vector-push-extend #\Newline buffer)
                  (setf eol (+ (fill-pointer buffer) *fill-column*)))
                 (t (vector-push-extend #\Space buffer)))
        finally (return buffer)))

0.628 seconds of real time
1,503,526,500 processor cycles
100,451,712 bytes consed

More concise conditionals. Conses way better. Best performer among the versions that ppcre:split.

(defun word-wrap-8 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with eol    of-type integer = *fill-column*
        and  buffer of-type (vector base-char)
               = (make-array 4 :element-type 'base-char
                               :adjustable t
                               :fill-pointer 0)
        do (loop for c across word
                 unless (vector-push c buffer)
                   do (adjust-array buffer
                                    (* 16 (array-dimension buffer 0)))
                      (vector-push c buffer))
        when next
          if (< eol (+ (fill-pointer buffer) 1 (length next)))
            do (vector-push-extend #\Newline buffer)
               (setf eol (+ (fill-pointer buffer) *fill-column*))
          else
            do (vector-push-extend #\Space buffer)
        finally (return buffer)))

0.631 seconds of real time
1,510,134,528 processor cycles
67,648,608 bytes consed

In the next post, we will check out which one is the most suitable approach to work with for text alignment across the line.

Published by

Kenan Bölükbaşı

Founder, Project Leader and Developer at 6x13 Games. Game Developer & Designer, CG Generalist, Architect. Theoretical and applied knowledge in programming, design and media. Broad experience in project management. Experience in 3D (mesh, solid & CAD), 2D (raster, vector), and parametric graphics as well as asset pipelines and tools development. Blender 3D specialist (Blender Foundation Certified Trainer).

Leave a Reply

Your email address will not be published. Required fields are marked *