Flyweight

Quick Information/Overview

Pattern Type Structural
Applicable Language/Framework Agnostic OOP
Pattern Source Gang of Four
Difficulty Easy

Image courtesy of dofactory

Up Front Definitions

  1. Client: Code that uses the flyweight implementation
  2. Concrete Flyweight: A class that actually implements the abstract class/interface flyweight.
  3. Extrinsic State: State that is not common to the flyweights and varies per instance.
  4. Flyweight: An interface that defines and exposes the extrinsic state.
  5. Flyweight Factory: Class used by clients to retrieve instances of the flyweights.
  6. Intrinsic State: State that is common across all flyweights of the same type.

The Problem

Most of these design patterns posts deal with compile-time problems brought about by painting oneself into a corner with design decisions. In other words, if you make certain missteps when designing a system, it becomes rigid and difficult to change in the future. Patterns such as the Factories, Composite, Decorator, etc., all address prevention of such missteps by isolating responsibilities and making change management simple. Today, the problem we’re going to look at is a runtime problem rather than a compile-time problem.

Let’s say that you’re asked to create a model that simulates the distribution of mail in the USA for a given day. You start by establishing some pretty reasonable business objects to represent the domain concepts in question: letters and postcards. Each of these things has certain properties you’re going to need to keep track of, such as dimensions and postage, so that you can report in aggregate on paper usage, revenues, etc. Here is your first crack at business objects:

public abstract class MailPiece
{
    public abstract decimal Postage { get; set; }
    public abstract decimal Width { get; set; }
    public abstract decimal Height { get; set; }
    public abstract decimal Thickness { get; set; }
 
    public int DestinationZip { get; set; }
}
public class Letter : MailPiece 
{
    public override decimal Postage { get; set; }
    public override decimal Width { get; set; }
    public override decimal Height { get; set; }
    public override decimal Thickness { get; set; }
 
    public Letter()
    {
        Postage = 0.46M;
        Width = 11.5M;
        Height = 6.125M;
        Thickness = 0.25M;
    }
}
public class Postcard : MailPiece
{
    public override decimal Postage { get; set; }
    public override decimal Width { get; set; }
    public override decimal Height { get; set; }
    public override decimal Thickness { get; set; }
 
    public Postcard()
    {
        Postage = 0.33M;
        Width = 6.0M;
        Height = 4.25M;
        Thickness = 0.016M;
    }
}

Alright, awesome. Now, according to the USPS, there are 563 million mail pieces processed in the USA each day (as of 2011). So, let’s just use that as our loop counter, and off we’ll go:

static void Main(string[] args)
{
    var pieces = new List<MailPiece>();
    for (long index = 0; index < 563000000; index++)
    {
        pieces.Add(new Letter() { DestinationZip = (int)(index % 100000) });
    }
 
    Console.ReadLine();
}

Satisfied, you fire up the program to make sure everything here is in good working order–although, really, why bother? I mean, what could possibly go wrong? The program chugs along for a bit, and it strikes you that it’s taking a while and your desktop is starting to crawl. You fire up the process manager and see this:

Danger, Will Robinson

Yikes! That’s… troubling. You start to wonder if maybe you should intervene and stop the program, but Visual Studio isn’t being very responsive and you can’t even seem to get it to hit a breakpoint. Just as you’re contemplating drastic action, Visual Studio mercifully puts an end to things:

OutOfMemory

You’re not very close to getting all of the records in, so you decide to divide the original 563 million by 24 in order to model the mail being processed on an hourly basis, but that’s really all the more granular that you can do. You fire up the program again to run with 23,458,333 and hit out of memory once again.

So, What to Do?

If you look at your creation of letters (and presumably later, postcards) you should notice that there’s a fair bit of waste here. For every letter you create, you’re creating four decimals that are exactly the same for every letter, which means that you’re storing four unnecessary decimals for all but one of the letters you create, which in turn means that you’re storing somewhere in the neighborhood of two billion unnecessary decimal instances in memory. Whoah.

Let’s try a different approach. What we really seem to want here is 23 and a half million zip codes and a single letter. So let’s start off simply by building a list of that many zip codes and seeing what happens:

static void Main(string[] args)
{
    var pieces = new List<MailPiece>();
    for (long index = 0; index < 23458333; index++)
    {
        pieces.Add(new Letter() { DestinationZip = (int)(index % 100000) });
    }
 
    Console.ReadLine();
}

Woohoo! That runs, and for the first time we have something that doesn’t crash. But what we’re really looking to do is cycle through all of those mail pieces, printing out their zip, postage, and dimensions. Let’s start off by cheating:

static void Main(string[] args)
{
    var zipCodes = new List<int>();
    var letter = new Letter();
    for (int index = 0; index < 23458333; index++)
    {
        zipCodes.Add(index % 100000);
        Console.Write(string.Format("Zip code is {0}, postage is {1} and height is {2}",
            zipCodes[index], letter.Postage, letter.Height));
    }
 
    Console.ReadLine();
}

That gets us what we want, but there’s a lot of ugliness. The client class has to know about the exact mechanics of the mail piece’s printing details, which is a no-no. It also has the responsibility for keeping track of the flyweight letter class. There are better candidates for both of these responsibilities. First of all, let’s move the mechanism for printing information into the mail piece class itself.

public abstract class MailPiece
{
    public abstract decimal Postage { get; set; }
    public abstract decimal Width { get; set; }
    public abstract decimal Height { get; set; }
    public abstract decimal Thickness { get; set; }
 
    public string GetStatistics(int zipCode)
    {
        return string.Format("Zip code is {0}, postage is {1} and height is {2}",
            zipCode, Postage, Height);
    }
}

Notice that we’ve removed the settable “DestinationZip” and added a method that prints piece statistics, given a zip code. That allows us to simplify the client code:

static void Main(string[] args)
{
    var zipCodes = new List<int>();
    var letter = new Letter();
    for (int index = 0; index < 23458333; index++)
    {
        zipCodes.Add(index % 100000);
        string pieceStatistics = letter.GetStatistics(zipCodes[index]);
        Console.Write(pieceStatistics);
    }
 
    Console.ReadLine();
}

Asking the letter object for its statistics definitely feels like a better approach. It’s nice to get that bit out of the client implementation, particularly because it will now work for postcards as well and any future mail pieces that we decide to implement. But we’re still worrying about instance management of the flyweights in the client class, which really doesn’t care about them. Let’s introduce a new class:

public class MailPieceFactory
{
    private readonly Dictionary<char, MailPiece> _mailPieces = new Dictionary<char,MailPiece>();
 
    public MailPiece GetPiece(char key)
    {
        if (!_mailPieces.ContainsKey(key))
            _mailPieces[key] = BuildPiece(key);
 
        return _mailPieces[key];            
    }
 
    private static MailPiece BuildPiece(char key)
    {
        switch (key)
        {
            case 'P': return new Postcard();
            default: return new Letter();
        }
    }
}

Here we have a public method called “GetPiece” to which we pass a key indicating what sort of piece we want. As far as client code goes, that’s all that matters. Under the hood, we’re managing the actual flyweights. If the dictionary doesn’t have key we’re interested in, we build a piece for that key. If it does, we simply return the flyweight from the hash. (Note that the scheme of indexing them by characters will scale, but isn’t really optimal at the moment–you could use an enumeration here, restrict the set of allowed keys with preconditions, or have a special default scheme.)

Let’s see what the client code looks like.

static void Main(string[] args)
{
    var factory = new MailPieceFactory();
    var zipCodes = new List<int>();
 
    for (int index = 0; index < 23458333; index++)
    {
        zipCodes.Add(index % 100000);
        var randomPiece = factory.GetPiece(GetRandomKey());
        string pieceStatistics = randomPiece.GetStatistics(zipCodes[index]);
        Console.Write(pieceStatistics);
    }
 
    Console.ReadLine();
}
 
private static char GetRandomKey()
{
    return new Random().Next() % 2 == 0 ? 'P' : 'L';
}

There. Now the client code doesn’t worry at all about keeping track of the flyweights or about how to format the statistics of the mail piece. It simply goes through a bunch of zip codes, creating random pieces for them, and printing out their details. (It could actually dispense with the list of zip codes, as implemented, but leaving them in at this point underscores the memory benefit of the flyweight pattern.)

There’s now some nice extensibility here as well. If we implemented flats or parcels, it’d be pretty easy to accommodate. You’d just add the appropriate class and then amend the factory, and you’d be off and running. When dealing with scales of this magnitude, memory management is always going to be a challenge, so using a pattern up front that reuses and consolidates common information will be a big help.

A More Official Explanation

According to dofactory, the purpose of the Flyweight pattern is:

Use sharing to support large numbers of fine-grained objects efficiently.

The Flyweight pattern requires that you take the objects whose information you want to share for scaling purposes and divide their state into two categories: intrinsic and extrinsic. Intrinsic state is the state that will be shared across all instances of the same type of object, and extrinsic state is the state that will be stored outside of the instance containing common information. The reason for this division is to allow as much commonality as possible to be stored in a single object. In a way, it is reminiscent of simple class inheritance where as much common functionality as possible is abstracted to the base class. The difference here is that a single instance stores common information and an external interest holds the differences.

To make the most of the pattern, flyweight creation and management is abstracted into one class (the factory), intrinsic state to another set of classes (the concrete flyweights), and extrinsic state management to a third class (the client). Separating these concerns allows the client to incur only the memory overhead of the part of an object that varies while still reaping all of the benefits of OOP and polymorphism. This is a powerful optimization.

Other Quick Examples

  1. The iconic example of the Flyweight pattern is to have flyweights represent characters in a word processing document, with the extrinsic state of position in the document.
  2. Another use is rendering in graphical applications with many similar shapes or patterns on the screen in configurable locations.
  3. It can also be used to represent some fungible commodity of which there are many in a video game, such as enemies, soldiers, scenery elements, etc.
  4. One of the more interesting and practical examples is string “interning” in C#

A Good Fit – When to Use

As the cost of memory and disk space continues to plummet, emphasis on this pattern is seeming to wane a bit. However, there will always be occasions in which resource management and optimization are crucial, and bottlenecks will always exist where even liberally purchased resources need to be used wisely. The flyweight pattern is a good fit any time you have redundant, common state in many objects that you’re creating. If you find yourself in this position, the pattern is really a no-brainer since it’s simple and costs very little to implement in terms of code complexity.

Square Peg, Round Hole – When Not to Use

This isn’t the kind of pattern where you’re likely to go out looking for excuses to try it out since it solves a very specific problem that is different from a lot of other patterns. I would exercise caution more when choosing extrinsic and intrinsic state in that you’re going to experience downstream pain if you label things “intrinsic” that actually can vary. Doing this threatens to break the pattern and the abstraction that you’ve created since modifying state that’s categorized as intrinsic will mean that your flyweights are no longer interchangeable and you’re going to start getting stuff wrong.

So this isn’t a good fit if you don’t have a good grasp yet on what should be extrinsic and what should be intrinsic. It’s also a bad fit if all state is extrinsic, as then there is no gain from the pattern. Another potential misuse that’s always worth mentioning with an optimization strategy is premature optimization. While I personally consider the elimination of duplication at runtime simply to be good design, I’d say that you might want to consider not bothering with the additional compile-time complexity if you’re not going to have very many objects. If, for instance, you were simulating twenty pieces of mail instead of tens of millions, it might not be worth the additional complexity.

So What? Why is this Better?

This one pretty much sells itself. In the example above, you can run the code without crashing or altering your data structures. Generally speaking, this is a pattern that helps you reduce your application’s memory footprint simply by creating a clever abstraction and shuffling the internal representation. If you can identify when there will be common information in volume, you can separate it from the unique information and make sure you’re being as efficient as possible.

  • http://www.daedtech.com/blog Erik Dietrich

    Apologies to anyone who saw error messages about the Github Gist API instead of code snippets. I didn’t realize there was a cap on accessing gists programatically (there is: 60 per hour). I’ve switched back to using my syntax highlighting plugin to address this.