Multi-Resolution Asset Workflow Automation

Now that we know how to do multi-resolution and asset scaling, I will show you how to aid your software stack in exporting multi-resolution assets.

The raster part is easy. You work with high resolution and scale down.

I simply use ImageMagick for this because I am not at all happy with the quality of the downscaling implementations provided by other tools. Of course, maybe they got better in time, but I do not find that worthy of occasional testing.

This task usually requires too much manual involvement, so I will not provide a batch method either.

convert -resize 50% /large/asset.png /medium/asset.png
convert -resize 25% /large/asset.png /small/asset.png

Easy.

For the vector assets, we can go for the comfort of a custom Inkscape extension.

Our extension will be placed in Extensions > Export > Sprite, and the UI will look like this:

Directory is the root visual asset directory in your project. In Twiniwt, that is ~/dev/twiniwt/Resources/res.

If you activate context, you can select if the element is a UI or Game element.

When disabled, the extension will export below assets:

~/dev/twiniwt/Resources/res/large/sprite.png
~/dev/twiniwt/Resources/res/medium/sprite.png
~/dev/twiniwt/Resources/res/small/sprite.png

When Context is enabled, and UI Element selected, the extension will export below assets:

~/dev/twiniwt/Resources/res/large/ui/sprite.png
~/dev/twiniwt/Resources/res/medium/ui/sprite.png
~/dev/twiniwt/Resources/res/small/ui/sprite.png

That’s all. Almost all our production assets in Twiniwt are exported using this simple tool.

You can download the extension as a zip file. I am placing it in Public Domain. Place the contents in the extensions subdirectory of your inkscape home directory. In GNU/Linux, this is ~/.config/inkscape/extensions/. It is probably the same in Mac OSX. Unfortunately, I haven’t even tested this extension in Windows, but I am sure it will work out-of-the-box once you locate it.

However, learning how to modify it, and writing your own production tools is more important.

Implementation

An Inkscape extension with a GUI requires two files. An XML formatted “.inx” file, and the actual implementation module.

Let’s first write the interface and specify the meta data.
Those belong to our sprite.inx file.

<?xml version="1.0" encoding="UTF-8"?>
<inkscape-extension xmlns="http://www.inkscape.org/namespace/inkscape/extension">
 <_name>Sprite</_name>
 <id>org.inkscape.sprite</id>
 <dependency type="extension">org.inkscape.output.svg.inkscape</dependency>
 <dependency type="executable" location="extensions">sprite.py</dependency>
 <dependency type="executable" location="extensions">inkex.py</dependency>
 <param name="directory" type="string" _gui-text="Directory to save images to:">~/</param>
 <param name="image" type="string" _gui-text="Image name (without extension):">sprite</param>
 <param name="has_context" type="boolean" _gui-text="Context:">false</param>
 <param name="context" type="optiongroup" _gui-text="Select context:" appearance="minimal">
 <_option value="ui">UI Element</_option>
 <_option value="game">Game Element</_option>
 </param>
 <effect needs-live-preview="false">
 <object-type>all</object-type>
 <effects-menu>
 <submenu _name="Export"/>
 </effects-menu>
 </effect>
 <script>
 <command reldir="extensions" interpreter="python">sprite.py</command>
 </script>
</inkscape-extension>

Now, the actual implementation, named sprite.py . The boilerplate:

#!/usr/bin/env python

import os
import sys
sys.path.append('/usr/share/inkscape/extensions')
try:
    from subprocess import Popen, PIPE
    bsubprocess = True
except:
    bsubprocess = False
import inkex

The class and the option parsers.

class Sprite(inkex.Effect):
    def __init__(self):
        inkex.Effect.__init__(self)
        self.OptionParser.add_option("--directory", action="store",
                                        type="string", dest="directory",
                                        default=None, help="")

        self.OptionParser.add_option("--image", action="store",
                                        type="string", dest="image",
                                        default=None, help="")

        self.OptionParser.add_option("--has_context", action="store",
                                        type="string", dest="has_context",
                                        default=None, help="")

        self.OptionParser.add_option("--context", action="store",
                                        type="string", dest="context",
                                        default=None, help="")

The utility methods.

    def get_filename_parts(self):
        if self.options.image == "" or self.options.image is None:
            inkex.errormsg("Please enter an image name")
            sys.exit(0)
        return (self.options.directory, self.options.image)

    def check_dir_exists(self, dir):
        if not os.path.isdir(dir):
            os.makedirs(dir)

Exporter for one asset:

    def export_sprite(self, filename, dpi):
        svg_file = self.args[-1]
        command = "inkscape -e \"%s\" -d \"%s\" \"%s\" " % (filename, dpi, svg_file)
        if bsubprocess:
            p = Popen(command, shell=True, stdout=PIPE, stderr=PIPE)
            return_code = p.wait()
            f = p.stdout
            err = p.stderr
        else:
            _, f, err = os.open3(command)
        f.close()

Exporter for all requested resolutions:

    def export_sprites(self, assets):
        dirname, filename = self.get_filename_parts()
        output_files = list()
        if dirname == '' or dirname == None:
            dirname = './'
        dirname = os.path.expanduser(dirname)
        dirname = os.path.expandvars(dirname)
        dirname = os.path.abspath(dirname)
        if dirname[-1] != os.path.sep:
            dirname += os.path.sep
        for directory, scale in assets.items():
            dpi = 96 * scale
            asset_dirname = dirname + directory + os.path.sep
            if self.options.has_context == 'true':
                asset_dirname = asset_dirname + self.options.context + os.path.sep
            self.check_dir_exists(asset_dirname)
            f = asset_dirname + filename + ".png"
            output_files.append(f)
            self.export_sprite(f, dpi)
        inkex.errormsg("The sprites have been saved as:" + "\n\n" + "\n".join(output_files))

This is where we define the resolutions we ask for. Change assets for your own needs if you like.

    def effect(self):
        assets = {"small": 1, "medium": 2, "large": 4}
        self.export_sprites(assets)

The entry point to the extension.

if __name__ == "__main__":
    e = Sprite()
    e.affect()

If anything in this post confuses you, please check out the how to do multi-resolution and asset scaling post. Also, please reply in comments below if you have any questions, ideas or fixes to the extension.

Good luck!

How to Set Asset Scaling and Resolution for 2D Games

When I shared the post about Optimized Parallax Backgrounds, I got asked how our asset resolution and scaling system works.

Dealing with asset scaling in 2D games can be confusing, to say the least. There are many ways to handle it. Which one to chose depends on what you expect. The problem is, you usually do not know what to expect from a responsive interface.

So beware, most scaling strategies don’t respond well to display ratio changes. They will pretend to work on your test device, then they will fail you so bad you will wish you had bookmarked this blog post earlier, which is now. Seriously.

I will now explain one asset scaling strategy that worked great for us in all 4 games we released so far. It is not my invention. I found the original method here, and also influenced by this Cocos2d-x forum post, that uses a different approach to achieve similar behaviour.

Note: In order to make things easier to grasp, I will pretend your game always works full screen regardless of device type. For windowed-mode on a PC, interpret the term “display resolution” as “window resolution”.

Now there are four things you expect from your engine to handle for you.

  • Choosing the most suitable set of graphics according to display resolution.
  • Globally scaling the selected graphics to fit the screen.
  • Letting you ignore all these and totally forget about the display while coding the game itself.
  • Earning you money, success, and preferably a slice of New York Cheesecake.

Sadly, at 6×13 Games, we couldn’t come up with a reliable way for the engine to handle the last one, either. So I will only explain the asset scaling part.

We will use Cocos2d-x engine for the examples and rely on its terminology. But Cocos only provides a basic set of transform policies, so the same idea applies to any 2D engine. The engine source is available, so you can implement the missing stuff in your own engine as well.

Resolution Policy

This actually has nothing to do with resolution. It is about framing.

Cocos2d-x will autoscale the whole scene to somehow fit the frame. Resolution policy is where you tell the engine what your understanding of “fit” is, what kind of behaviour you expect from it. The options are:

  • Exact Fit
  • No Border
  • Show All
  • Fixed Height
  • Fixed Width

I know, you need No Border!

Maybe you do. But let me help you, you really don’t. What you really want is a little more complicated than that.

You do not want a weird display aspect ratio messing up with your precious interaction area, making your UI buttons too small to press. That is what you get with No Border. And that is why we are rolling a better solution.

So, keep reading.

Safe Area

You need a safe area! An area that is not only guaranteed to be shown to user, but also guaranteed to cover as much screen space as possible. So I present you, the safe area:

Anything but the yellow area is just decoration that prevents the user from seeing black border. You never, ever put something that user really needs to see outside the Safe Area.

This is the response we expect from our framework.

The safe area is the center 480×320 units portion of our 570×360 units game area -not pixels, units. It has an aspect ratio of 1,5f .

How do we guarantee that?

We calculate the reference axis first. Then we either choose the Fixed Height, or the Fixed Width policy, according to the reference axis. If the Reference Axis is y-axis, we choose Fixed Height, otherwise Fixed Width.

Whatever our Reference Axis is, it better be Fixed.

I believe the above animation clearly shows what a Reference Axis is. Below is what it mathematically means:

float aspect_ratio = display_res.width / display_res.height;

if ( 1.5f > aspect_ratio )
{
    // Reference Axis = X_AXIS
    // ...
}
else
{
    // Reference Axis = Y_AXIS
    // ...
}

You can download our Safe Area reference SVG as well as three sample sizes exported, as a single zip file.

Now that we have decided how the engine should behave, it needs to know what portion of the screen to use for that behaviour. In Cocos2d-x terms, this is the Design Size.

Design size

Scenes are sizeless. They are just cartesian space with an origin, and possibly some stupid protagonist running around, trying to rescue the damsel who can actually take care of herself. Go away, creepy protagonist!

Anyway. In order for Cocos to fit the scene contents into the frame, it needs to know which portion of the scene we actually consider to be “the scene”. Hence, the Design Size.

In our case, Design Size is the dimensions of our Safe Area.

The moment we set the Design Size and Resolution Policy, the engine will start acting like an adult. Below is how it reacts to both ratio and physical size changes.

Cool animation, right? I made it in Blender, yay!

To sum up:

  • Cocos will scale to fit.
  • Resolution Policy tells it “how” to fit.
  • Design size tells it “what” to fit.

Great, it works! That’s it!

Or not.

Multi-Resolution Support

This would be the happy ending of our asset scaling adventure, if the only crappy, non-standardized piece of hardware on our way was the display. Far from it.

We also need our runtime assets to be fast to process, fit the video memory nicely, and be effected from the least amount of aliasing possible during scaling.

All of them require one thing: having multiple versions of our assets and picking the set of assets to load according to the display resolution, at runtime.

More precisely, we want the density -the definition- of our assets to be close to the expectations of the particular hardware. Because if your mobile device has a Standard Definition display, chances are it also has the video processing power and memory that can only handle Standard Definition, or less. Mobile manufacturers rarely skip the leg day.

Also, there are many algorithms for image scaling, with varying quality and performance characteristics. You want your realtime scaling to be of the fastest kind, which also means less quality. Therefore, it’s best to have your images prescaled to -or reproduced at- closest size.

So, we need to support multiple resolutions as part of our asset scaling strategy.

In order to do that, we put each set of assets in a different resource directory.

We use three size variants, and a directory for each: “small”, “medium”, and “large”.

We will pick the best possible size according to the display dimension of our Reference Axis, in pixels. After that, there is no special procedure to “pick” the directory, you simply add that particular directory to the Resource Search Path of your engine, omitting the others so they will not even be visible to the engine.

Content Scale Factor

We also associate a scale factor with each of those directories, so that the engine knows how the assets map to the design size. It is simple. For example, if the assets in “medium” are twice the definition of your Design Size, the scale is 2. I do that in a resource JSON file that holds other information as well. The relevant parts of the file looks like this:

{
    "version": [
        1,
        0,
        0
    ],
    "graphics": {
        "design": {
            "size": {
                "width": 480,
                "height": 320
            },
            "full": {
                "width": 570,
                "height": 360
            }
        },
        "assets": [
            {
                "scale": 1,
                "directory": "small"
            },
            {
                "scale": 2,
                "directory": "medium"
            },
            {
                "scale": 4,
                "directory": "large"
            }
        ]
    }
}

You are free to hardcode those.

There is a x13_gfx_s structure instance in our game data that holds graphics properties:

typedef struct
{
    // Scale
    int scale;

    // Safe size and full size
    vec2_s safe, full;

    // Directory
    std::string dir;

} x13_gfx_s;

We have an init function that fills it.

void
 x13_init_gfx( float width, float height )
{
    float aspect_ratio    = width / height;
    rapidjson::Value &gfx = x13_data.res.dom[ "graphics" ];

    x13_data.gfx.safe.x = gfx[ "design" ][ "size" ][ "width" ].GetUint( );
    x13_data.gfx.safe.y = gfx[ "design" ][ "size" ][ "height" ].GetUint( );
    x13_data.gfx.full.x = gfx[ "design" ][ "full" ][ "width" ].GetUint( );
    x13_data.gfx.full.y = gfx[ "design" ][ "full" ][ "height" ].GetUint( );
    int unsigned resource_index;

    if ( 1.5f > aspect_ratio )
    {
        if ( width
             > gfx[ "assets" ][ 1u ][ "scale" ].GetUint( ) * x13_data.gfx.safe.x )
        {
            resource_index = 2u;
        }
        else if ( width > gfx[ "assets" ][ 0u ][ "scale" ].GetUint( )
                           * x13_data.gfx.safe.x )
        {
            resource_index = 1u;
        }
        else
        {
            resource_index = 0u;
        }
    }
    else
    {
        if ( height
             > gfx[ "assets" ][ 1u ][ "scale" ].GetUint( ) * x13_data.gfx.safe.y )
        {
            resource_index = 2u;
        }
        else if ( height > gfx[ "assets" ][ 0u ][ "scale" ].GetUint( )
                            * x13_data.gfx.safe.y )
        {
            resource_index = 1u;
        }
        else
        {
            resource_index = 0u;
        }
    }

    x13_data.gfx.dir.assign(
     gfx[ "assets" ][ resource_index ][ "directory" ].GetString( ) );
    x13_data.gfx.scale = gfx[ "assets" ][ resource_index ][ "scale" ].GetInt( );
}

Don’t worry about the SAX stuff. Concentrate on the if statement. We call it in the Cocos2d-x specific AppDelegate::applicationDidFinishLaunching method, and then we:

  • Set design size
  • Set resolution policy
  • Set content scale factor
  • Add selected asset directory to Search Path
bool
 AppDelegate::applicationDidFinishLaunching( )
{
    // ...

    // Set OpenGLView

    // ...

    auto frame_size = glview->getFrameSize( );
    x13_init_gfx( frame_size.width, frame_size.height );

    // Set design resolution and determine fix policy.
    glview->setDesignResolutionSize(
     x13_data.gfx.safe.x,
     x13_data.gfx.safe.y,
     ( 1.5f > ( frame_size.width / frame_size.height ) )
      ? ResolutionPolicy::FIXED_WIDTH
      : ResolutionPolicy::FIXED_HEIGHT );
    // Set content scaling factor.
    director->setContentScaleFactor( x13_data.gfx.scale );
	
    std::string res_path = "res";
    file_utils->addSearchPath( res_path );   // For scene description files.
    file_utils->addSearchPath( res_path + "/audio" );
    file_utils->addSearchPath( res_path + "/fonts" );
    file_utils->addSearchPath( res_path + "/" + x13_data.gfx.dir + "/ui" );
    file_utils->addSearchPath( res_path + "/" + x13_data.gfx.dir + "/game" );
    file_utils->addSearchPath( res_path + "/generic" );

    // ...
    
    return true;
}

As of this point, we can totally forget about the asset selection, resolution and PPI variations, retina displays, weird aspect ratios etc.

We just pretend the screen size is 480×320 when coding, except the few times where you might want to put some fancy, decorative animations outside the Safe Area.

Below is how the engine handles everything.

If you need more information about the above subjects, the following outdated docs from Cocos2d-x wiki are still relevant.

Detailed explanation of Cocos2d-x Multi-resolution adaptation

Multi resolution support

 

Ok, you are all set! Now you need to fill your scenes with beautiful assets.

~Pixel-Perfect Assets

Applying Safe Area strategy to your asset production is pretty straightforward. Just grab the reference assets I shared above, and comply with the boundaries.

But if you really want to preserve every bit of quality you can in a generic way, I have two more recommendations for you.

First, not everything has to be pixel perfect. Rather, try to keep your content scaling uniform among all the assets, because that will, in turn, keep the distortion and aliasing characteristics uniform. No one ever died from a little less definition. We lived just fine watching Video CDs for years, after all.

Remember?

However, watching a VCD side-by-side with a 4K movie now? That would have lasting effects. The point is, don’t let the player’s eyes compare asset densities. Keep it uniform.

Second, raster and vector assets require opposite treatment. You use raster graphics for more organic asset types, which results better if you work with high resolution sources and scale down. Work with exactly four times the size of your biggest production asset version, and scale by %50, %25, %12,5. So if you used our asset scheme, your background textures would have the following attributes:

FINAL ASSETS:
small/bg.png  PNG image data, 570 x 360, 8-bit/color RGBA
medium/bg.png PNG image data, 1140 x 720, 8-bit/color RGBA
large/bg.png  PNG image data, 2280 x 1440, 8-bit/color RGBA

SOURCE:
source/bg.png PNG image data, 4560 x 2880, 8-bit/color RGBA

Easy! Everyone already knows that.

Now, with vector graphics, you want almost the exact opposite. You want your source asset dimensions to be 570x360px, precisely as your smallest production asset variant. And you want to export for each resolution one by one -not export once and scale up. Because if a line is at the pixel border in your smallest resolution, it will always be at the pixel borders at every resolution, as long as you keep doubling the resolution. This guarantees the pixel-perfect output.

Of course, you can use the same method for raster graphics as well, but with raster graphics, the priority is seldom the quality at the pixel level.

Lastly, exporting for multiple resolutions is boring labor. If you  want to automate your multi-resolution asset workflow with custom GUI tools, please check out my new post: Multi-Resolution Asset Workflow Automation.

Allright! That was long. It took me two full days to prepare this post. So please, do not hesitate to share and comment. Especially, if you tried the above method in your games and had problems, or success, drop me a line. Good luck!

Optimized Parallax Backgrounds

While I was designing the visuals for Twiniwt, I wanted various parallax animations for the background, but without blowing up the game size.

We value keeping the game size as small as possible, because not all parts of the world share the same network bandwidth privilages, yet everyone deserves the privilage of having a little fun. Also, there is something inherently uncomfortable about the idea of a 100MB puzzle game. But they are not only games, are they? It is interesting at what lengths the freemium model has to go to become profitable. Anyway.

Here we go.

We need a simple background first.

It is very heavily blurred, which also helps with quantizing and dithering the image, and storing as a colormap.

Now we need maps to use as parallax layers. They all have to be seamless in x-axis. Three layers for silhouette, one for modifying the silhouette with fog.

We compose all this information to a single image. The image looks like this when channels are composed.

I used DXT-5 compressed DDS files. If you use PNG as your final asset format, or export to PNG at some stage, be aware that your graphics suite might try to ignore color information of fully transparent pixels, which effectively destroys the asset.

We will also need a vignette map to divide the final color with, in order to nicely frame the composition.

All layers in place, it looks like this:

Here is the GLSL fragment shader I wrote for cocos2d-x. It automatically prefixes the shader code with some convenience definitions, but the idea is there, if you want to use it in another engine.

// Copyright (C) 2017 Kenan Bölükbaşı - 6x13 Games

#ifdef GL_ES
precision lowp float;
#endif

varying vec2 v_texCoord;
uniform mat4 u_parallax;

vec4
 lookup( vec4 pd )
{
    return texture2D( CC_Texture1,
                      vec2( v_texCoord.s + mod( pd.w * CC_Time[ 0 ], 1. ),
                            v_texCoord.t * 2. - 1. ) );
}

void
 main( void )
{
    vec3 bg = texture2D( CC_Texture0, v_texCoord ).rgb;

    if ( v_texCoord.t > .5 )
    {
        vec4 fog_pd = u_parallax[ 3 ];
        float fog_f = lookup( fog_pd ).w * .05;

        for ( int i = 0; i < 3; i++ )
        {
            vec4 mnt_pd = u_parallax[ i ];

            bg = mix(
             bg,
             mix( mnt_pd.rgb, fog_pd.rgb, fog_f * inversesqrt( mnt_pd.w ) ),
             lookup( mnt_pd )[ i ] );
        }
    }

    gl_FragColor.rgb = bg / texture2D( CC_Texture2, v_texCoord ).r;
    gl_FragColor.a   = 1.;
}

I am pretty sure this shader can be much better. Please, do not hesitate to share modifications, and I will edit the post.

We also store all color palette and movement speed information separately and load them as a uniform mat4.

{
    .76f, .67f, .49f, .01f,
    .80f, .57f, .27f, .07f,
    .78f, .43f, .25f, .25f,
    .80f, .80f, .50f, .17f
}

Because we want the palette to be specific to the mood of each background, and also it is way easier to experiment with colors this way.

In short:

PARALLAX.DDS: 342K
Microsoft DirectDraw Surface (DDS), 1024 x 256, DXT5

BG.PNG      : 91K
PNG image data, 1621 x 1024, 4-bit colormap, non-interlaced

VIGNETTE.PNG: 216K (shared among all backgrounds)
PNG image data, 811 x 512, 8-bit grayscale, non-interlaced

All high definition assets costs us ~400K per background. This way, we were able to fit 3 completely different background styles in less than 1.5MB in Twiniwt.

Rules of Thumb

Finally, achieving the best packing for games requires a holistic approach to development. It reflects on decisions made by artists as well as developers.

Some rules of thumb for anyone who wants to do production assets:

Know your file formats.

PNG, for example, is NOT “the format that stores alpha channel and compress loselessly.” The details matter. PNG specification defines multiple ways to store both color and transparency information. (Fortunately, PNG is also not your best option for final assets.)

In his seminal CppCon 2014 talk, Mike Acton describes the developer’s job as: “to solve data transformation problems.” As such, people creating production assets should be aware of what they are really feeding into that transformation. This is not the job of a technical artist, this is the responsibility of a digital artist.

Know your tools.

Not everything in file format specifications are well/strictly defined. And implementations are far from being perfect. Different tools may vary in the way they interpret files. So know how your tools handle the import/export of your assets.

bake.

This doesn’t seem like it needs reminding. But nowadays, we tend to embrace “best practices” that favor flexibility, which sometimes can carry unnecessary calculations into runtime. If the distance from the camera is always the same, maybe the amount of blur is the same. It doesn’t matter if you have that amazing focus blur shader, you can just bake the blur.

FOLLOW-UP

My good friend, Marcel Smit, reviewed the post and made some great comments regarding compression and PNG format problems. I believe they should be part of the post. Here we go:

I was thinking for the parallax scrolling you could compress it even further by using only black and white and storing the images with one bit per pixel and RLE-compression. You could blur the images after loading them.

RLE, short for Run-length encoding is a very simple method that has been around for a long time. No need for a technical description. This is Run-length encoding:

I found a very nice and super fast blurring algorithm..
https://github.com/memononen/fontstash/blob/master/src/fontstash.h#L987
It’s a clever trick to quickly blur an image. You’d need to do both a horizontal and vertical pass in constant time to do a 2D gaussian blur.

I am yet to try this method and see how it fares, but the idea makes so much sense I can’t see any reason it doesn’t work better.

There are other reasons to avoid PNG. Like you said it leaves out colors for translucent pixels. This is BAD when you’re doing bilinear filtering, as with bilinear filtering the GPU also samples adjacent pixels, but these may be white/black/whatever color your export tool used to replace the colors it optimized away with!

I can’t stress enough the importance of tools/libraries like ImageMagick, GraphicsMagick and GEGL, when you need to find what is really going on with your production assets. You can batch revise invisible properties of your assets in a matter of seconds. For example you can backup and batch remove all alpha channels to reveal how BAD it really is.

I had to write fix-up code for this for Riposte. I used the average adjacent pixel color for non-translucent neighboring pixels. Another reason to avoid it, is because it is horribly slow to decompress PNGs. Reading raw data or TGAs or decompressing your own format is likely much faster.

Thanks again, Marcel!

Text Alignment: Full Justification #1 – Word Wrap

I’m obsessed with typography. Not great with it, just obsessed. The company gets its name from the legendary Misc Fixed 6×13 Regular.

Typography is people finding ways to communicate knowledge and ideas in style.

Typography is Richard Feynman.

Typography is David Bowie.

I like full justification. It naturally makes paragraph bounds recognizable, letting us use the block in more complex page layouts. While it is easy to implement in its most basic form, it is really, really hard to get right. It usually renders plain bad. Which is probably why not everyone share my enthusiasm for full justification. I haven’t implemented it, like ever, though. I am just ranting here. Let’s change that.

Here is our introductory information on justification:
https://en.wikipedia.org/wiki/Typographic_alignment

We are going to justify text for rendering with monospaced fonts, which nicely narrows the problem domain for the purposes of a this article. Convenient, as fixed spacing means literally one less variable to worry about.

Prior Art

Better start by checking out what we are getting into, since this mostly is a solved problem, there are examples.

Knuth-Plass line-wrapping algorithm is the still the go-to solution for the text alignment problem. However, it is designed for typesetting text rendered with variable-width fonts. Still, it doesn’t hurt checking it out. Terje D. provides a simplified version at SO:

Add start of paragraph to list of active breakpoints
For each possible breakpoint (space) B_n, starting from the beginning:
   For each breakpoint in active list as B_a:
      If B_a is too far away from B_n:
          Delete B_a from active list
      else
          Calculate badness of line from B_a to B_n
          Add B_n to active list
          If using B_a minimizes cumulative badness from start to B_n:
             Record B_a and cumulative badness as best path to B_n

The result is a linked list of breakpoints to use.

The badness of lines under consideration can be calculated like this:

Each  space  is assigned  a  nominal  width,  a strechability,  and  a
shrinkability.   The  badness  is  then calculated  as  the  ratio  of
stretching  or shrinking  used, relative  to what  is allowed,  raised
e.g. to the third power (in  order to ensure that several slightly bad
lines are prefered over one really bad one)

Turns out I was wrong, it actually hurts checking it out. Anyway.

Emacs does fixed-space justification with its “fill” commands, so I skimmed Emacs sources. The package lisp/textmodes/fill.el is big but the relevant part of the implementation is ~100 LOC. Allright.

It, not unlike most Emacs text-editing code, directly modifies the text buffer:

- While cursor is still in paragraph:
  - Word wrap.
    - Go to fill column.
    - Go back and find a place to cut.
    - Make sure we're good.
    - Insert newline.
  - Go back one line and justify.
    - Merge multiple consecutive spaces among the line.
	- Count remaining spaces in line.
	- Calculate number of additional spaces needed.
	- Insert the damn spaces.

The -inattentively trimmed- “insert the damn spaces” code from Emacs sources is below, with comments added for clarity:

(let (ncols          ; number of additional space chars needed
      nspaces        ; number of spaces between words
      curr-fracspace ; current fractional space amount
      count)
  ;; I don't like WordPress one bit.
  (when (and (> ncols 0) (> nspaces 0))
    (setq curr-fracspace (+ ncols (/ nspaces 2))
          count nspaces)
    ;; I don't like WordPress one bit.
    (while (> count 0)
      (skip-chars-forward " ")
      (insert-char ?\s (/ curr-fracspace nspaces) t)
      (search-forward " " nil t)
      (setq count (1- count)
            curr-fracspace
            (+ (% curr-fracspace nspaces) ncols)))))
Word Wrap

We are not tied to a text buffer, so we can freely experiment on whatever data structure we feel like.

We will handle only word-wrapping part in this post, as we will try various versions and also see how they perform.

The preparations and test code.

(ql:quickload :cl-ppcre)

(defparameter *fill-column* 60)

(defparameter *text* "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam semper risus mauris, et dignissim lorem lacinia non. Sed vitae lacus nisi. Fusce vitae lectus non quam dictum luctus et at mauris.")

(defun word-wrap-test (fn)
  (print (funcall fn *text*))
  (gc :full t)
  (time (dotimes (x 10000) (funcall fn *text*))))

First, the most logical implementation. Reading char-by-char. Performs really well, of course, as it doesn’t do any list manipulation.

(defun word-wrap-0 (txt)
  (with-input-from-string (in txt)
    (loop for  chr of-type (or null base-char)  = (read-char in nil)
          with bol of-type integer              = 0 ; beginning of line
          and  cut of-type integer              = 0 ; potential breakpoint
          and  buf of-type (vector base-char)       ; output buffer
                 = (make-array 4 :element-type 'base-char
                                 :adjustable t
                                 :fill-pointer 0)
          while chr
          do (case chr
               (#\Newline
                (setf bol (fill-pointer buf)))
               (#\Space
                (setf cut (fill-pointer buf))))
          unless (vector-push chr buf)
            do (adjust-array buf (* 16 (array-dimension buf 0)))
               (vector-push chr buf)
          when (and (> (fill-pointer buf) (+ bol *fill-column*))
                    (< bol cut)) ; for newlines in source
            do (setf (aref buf cut) #\Newline
                     bol cut)
          finally (return buf))))

0.166 seconds of real time
398,942,396 processor cycles
13,895,328 bytes consed

The result is this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Etiam semper risus mauris, et dignissim lorem lacinia non.
Sed vitae lacus nisi. Fusce vitae lectus non quam dictum
luctus et at mauris.

The other examples will rely on cl-ppcre:split for splitting the text.
That call singlehandedly takes:
0.518 seconds of real time
1,241,109,342 processor cycles
55,040,080 bytes consed

Now we try with nested lists all the way.

(defun word-wrap-1 (txt)
  (loop for  text of-type list    on (ppcre:split "\\s+" txt)
        as   word of-type string  = (car text)
        as   next of-type string  = (car (cdr text))
        with line of-type list    = '()
        and  crsr of-type integer = *fill-column*
        ;; Insert the word in line, moving the cursor.
        do (setf line (nconc line (list word))
                 crsr (- crsr (length word)))
        if (or (null next) (< crsr (1+ (length next))))
          ;; If EOL or EOF, line feed and carriage return.
          collect line into buffer
          and do (setf crsr *fill-column*
                       line nil)
        else do (decf crsr)
        finally (return (format nil
                                "~{~&~{~a~^ ~}~}"
                                buffer))))

0.738 seconds of real time
1,767,125,344 processor cycles
92,799,088 bytes consed

Another approach would be to format lines immediately and concatenate later.

(defun word-wrap-2 (txt)
  (loop for  text of-type list     on (ppcre:split "\\s+" txt)
        as   word of-type string   = (car text)
        as   next of-type string   = (car (cdr text))
        with line of-type string   = ""
        and  fmt  of-type function = (formatter "~a~:[ ~;~]~a")
        and  crsr of-type integer  = *fill-column*
        do (setf line (format nil fmt
                              line (string= line "") word)
                 crsr (- crsr (length word)))
        if (or (null next) (< crsr (1+ (length next))))
          collect line into buffer
          and do (setf crsr *fill-column*
                       line "")
        else do (decf crsr)
        finally (return (format nil "~{~&~A~}" buffer))))

1.087 seconds of real time
2,603,569,620 processor cycles
397,437,888 bytes consed

No line. Adjustable array buffer. Start with zero size. Allocate for every word. I know, it’s stupid.

(defun word-wrap-3 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with crsr   of-type integer = *fill-column*
        and  buffer = (make-array 0 :element-type 'character
                                    :adjustable t
                                    :fill-pointer 0)
        do (adjust-array buffer (+ (array-dimension buffer 0) (length word) 1))
           (loop for c across word do (vector-push c buffer))
           (setf crsr (- crsr (length word)))
        if (< crsr (1+ (length next)))
          do (vector-push #\Newline buffer)
             (setf crsr *fill-column*)
        else
          do (vector-push #\Space buffer)
             (decf crsr)
        finally (vector-pop buffer)
                (return buffer)))

2.019 seconds of real time
4,836,461,608 processor cycles
504,132,464 bytes consed

Let vector-push-extend handle allocation, at least doubling the buffer size.

(defun word-wrap-4 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with crsr   of-type integer = *fill-column*
        and  buffer = (make-array 1 :element-type 'character
                                    :adjustable t
                                    :fill-pointer 0)
        do (loop for c across word do
          (vector-push-extend c buffer
                              (array-dimension buffer 0)))
           (setf crsr (- crsr (length word)))
        if (< crsr (1+ (length next)))
          do (vector-push-extend #\Newline buffer)
             (setf crsr *fill-column*)
        else
          do (vector-push-extend #\Space buffer)
             (decf crsr)
        finally (vector-pop buffer)
                (return buffer)))

0.643 seconds of real time
1,538,669,380 processor cycles
99,810,880 bytes consed

Try vector-pushing, if fails, adjust array manually.

(defun word-wrap-5 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with crsr   of-type integer = *fill-column*
        and  buffer of-type (vector character)
               = (make-array 4 :element-type 'character
                               :adjustable t
                               :fill-pointer 0)
        do (loop for c across word
                 unless (vector-push c buffer)
                   do (adjust-array
                       buffer (* 16 (array-dimension buffer 0)))
                      (vector-push c buffer))
           (setf crsr (- crsr (length word)))
        if (< crsr (1+ (length next)))
          do (vector-push-extend #\Newline buffer)
             (setf crsr *fill-column*)
        else
          do (vector-push-extend #\Space buffer)
             (decf crsr)
        finally (vector-pop buffer)
                (return buffer)))

0.632 seconds of real time
1,512,965,832 processor cycles
100,453,840 bytes consed

Use fill-pointer as cursor, adjusting EOL index.

(defun word-wrap-6 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with eol    of-type integer = *fill-column*
        and  buffer of-type (vector character)
               = (make-array 4 :element-type 'character
                               :adjustable t
                               :fill-pointer 0)
        do (loop for c across word
                 unless (vector-push c buffer)
                   do (adjust-array
                       buffer (* 16 (array-dimension buffer 0)))
                      (vector-push c buffer))
        if (< eol (+ (fill-pointer buffer) 1 (length next)))
          do (vector-push-extend #\Newline buffer)
             (setf eol (+ (fill-pointer buffer) *fill-column*))
        else
          do (vector-push-extend #\Space buffer)
        finally (vector-pop buffer)
                (return buffer)))

0.632 seconds of real time
1,513,173,132 processor cycles
100,451,712 bytes consed

Get rid of redundant push-pop.

(defun word-wrap-7 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with eol    of-type integer = *fill-column*
        and  buffer of-type (vector character)
               = (make-array 4 :element-type 'character
                               :adjustable t
                               :fill-pointer 0)
        do (loop for c across word
                 unless
                 (vector-push c buffer)
                 do (adjust-array buffer
                                  (* 16 (array-dimension buffer 0)))
                    (vector-push c buffer))
           (cond ((null next))
                 ((< eol (+ (fill-pointer buffer) 1 (length next)))
                  (vector-push-extend #\Newline buffer)
                  (setf eol (+ (fill-pointer buffer) *fill-column*)))
                 (t (vector-push-extend #\Space buffer)))
        finally (return buffer)))

0.628 seconds of real time
1,503,526,500 processor cycles
100,451,712 bytes consed

More concise conditionals. Conses way better. Best performer among the versions that ppcre:split.

(defun word-wrap-8 (txt)
  (loop for  text   of-type list    on (ppcre:split "\\s+" txt)
        as   word   of-type string  = (car text)
        as   next   of-type string  = (car (cdr text))
        with eol    of-type integer = *fill-column*
        and  buffer of-type (vector base-char)
               = (make-array 4 :element-type 'base-char
                               :adjustable t
                               :fill-pointer 0)
        do (loop for c across word
                 unless (vector-push c buffer)
                   do (adjust-array buffer
                                    (* 16 (array-dimension buffer 0)))
                      (vector-push c buffer))
        when next
          if (< eol (+ (fill-pointer buffer) 1 (length next)))
            do (vector-push-extend #\Newline buffer)
               (setf eol (+ (fill-pointer buffer) *fill-column*))
          else
            do (vector-push-extend #\Space buffer)
        finally (return buffer)))

0.631 seconds of real time
1,510,134,528 processor cycles
67,648,608 bytes consed

In the next post, we will check out which one is the most suitable approach to work with for text alignment across the line.