Strings/StringBuffers - what should I use when, and why?

Java is still seen as slow by many people, and one reason for this is the large amount of badly written code around - often in tutorials etc. By far the most common performance-killer I've seen is the misuse of String concatenation. For example, suppose reader is an instance of BufferedReader - the following code will read every line of a file and put it all in a String:
String contents = "";
String line;

while ( (line=reader.readLine()) != null)
{
    contents += line+"\n";
}
Indeed, this will do the "right thing" in terms of end result - but it will take a lot of time and memory (temporarily) to do it. This is because the compiler will be turning the code above into something like:
String contents = "";
String line;

while ( (line=reader.readLine()) != null)
{
    contents = new StringBuffer (contents).append(line).append("\n").toString();
}
The compiler has to do this because Strings are immutable - the contents can't actually be changed. Applying += to a String doesn't add anything to the referenced String object - it assigns a reference to a new String object (which contains the original contents and whatever is on the right hand side of the +=). The compiler typically (as in "in every compiler I've ever seen, and as suggested by the JLS") creates a new StringBuffer to build up the contents.

That means that with the above program, in memory you end up with each line of text being read (which is unavoidable) and the total contents read so far at every step. For example, if the file you're reading contains:

one
two
three
four
then after reading the first three lines, you'll have all of the following in memory:
one
one\n
two
one\ntwo\n
three
one\ntwo\nthree\n
Fortunately StringBuffers and Strings share memory where possible, so although you'll actually have two objects in memory for each of the lines ending in \n, the character data for each will only appear once. However, the character data will have to be copied into the StringBuffer to start with.
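
To see the immutability point in isolation, here's a minimal sketch. The names (ImmutabilityDemo, original, alias) are made up for illustration. A second reference to the same String still sees the old contents after +=, which suggests a new object was created rather than the old one being changed:

```java
class ImmutabilityDemo
{
    // Returns true if a second reference still sees the old contents
    // after += has been applied through the first reference.
    static boolean aliasUnchangedAfterAppend()
    {
        String original = "one";
        String alias = original;   // second reference to the same String object
        original += "\n";          // rebinds 'original' to a brand new String
        return alias.equals ("one") && alias != original;
    }

    public static void main (String [] args)
    {
        System.out.println (aliasUnchangedAfterAppend());
    }
}
```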

Of course, all but the "current" String will be garbage collected at some point, but that means a lot of collection when you've got a large file. As a pretty graphic demonstration of this, consider the following short program:

public class Test
{
    public static void main (String [] args)
    {
        String s="";
        for (int i=0; i < 100000; i++)
            s+="x";
        System.out.println (s.length());
    }
}
Think about how much memory is being used in the above. The String concatenation will have copied 1 character in the first iteration, 2 in the second, etc - this works out at 5000050000 chars, or roughly 9536MB at two bytes per char. That's just in terms of wasted character data - there are other overheads involved too, such as the redundant objects created in every iteration.
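
If you'd rather check that arithmetic than take my word for it, here's a small sketch. The class and method names (CopyCost, charsCopied) are just for illustration, and it assumes two bytes per char. It uses the usual 1 + 2 + ... + n = n * (n + 1) / 2 sum:

```java
class CopyCost
{
    // Total chars copied when building an n-char String one char at a time:
    // 1 + 2 + ... + n = n * (n + 1) / 2
    static long charsCopied (long n)
    {
        return n * (n + 1) / 2;
    }

    public static void main (String [] args)
    {
        long chars = charsCopied (100000);
        long bytes = chars * 2;                 // assuming 2 bytes per char
        long megabytes = bytes / (1024 * 1024); // integer division, rounds down
        System.out.println (chars + " chars, " + bytes + " bytes, about "
                            + megabytes + "MB");
    }
}
```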

Now consider a version using StringBuffer explicitly:

public class Test
{
    public static void main (String [] args)
    {
        StringBuffer sb = new StringBuffer();
        for (int i=0; i < 100000; i++)
            sb.append ("x");
        String s = sb.toString();
        System.out.println (s.length());
    }
}
StringBuffers grow when they need to, copying the old backing array into a new one. With Sun's JDK 1.3.1, for instance, the new backing array of chars is just over twice the size of the old array, and it starts off as 16 by default, so the above performs the following copies:
16->34 (copying 16 chars)
34->70 (copying 34 chars)
70->142 (copying 70 chars)
142->286 (copying 142 chars)
286->574 (copying 286 chars)
574->1150 (copying 574 chars)
1150->2302 (copying 1150 chars)
2302->4606 (copying 2302 chars)
4606->9214 (copying 4606 chars)
9214->18430 (copying 9214 chars)
18430->36862 (copying 18430 chars)
36862->73726 (copying 36862 chars)
73726->147454 (copying 73726 chars)

That's copied a total of 147412 chars, or ~300K in most JVM implementations; the new backing arrays allocated along the way total 294850 chars, or ~600K. Not too bad. (There are also the 100000 chars copied in the append itself. The above just deals with the resizing.)
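
The resize sequence can be reproduced with a short simulation. This is only a sketch: the growth rule (capacity + 1) * 2 is inferred from the figures listed (16->34, 34->70 and so on) rather than taken from any particular JDK source, and the names (GrowthSimulation, simulate) are hypothetical. It tallies both the chars copied out of old arrays and the chars allocated for new ones:

```java
class GrowthSimulation
{
    // Simulates a buffer that starts at 16 chars and grows to
    // (capacity + 1) * 2 whenever it is full (rule inferred from the list).
    // Returns { resizes, charsCopied, charsAllocatedForNewArrays, finalCapacity }.
    static long[] simulate (int appends)
    {
        int capacity = 16;
        long resizes = 0;
        long copied = 0;
        long allocated = 0;
        for (int count = 0; count < appends; count++)
        {
            if (count == capacity)      // full: grow before appending
            {
                int newCapacity = (capacity + 1) * 2;
                copied += count;        // old contents copied across
                allocated += newCapacity;
                capacity = newCapacity;
                resizes++;
            }
        }
        return new long[] { resizes, copied, allocated, capacity };
    }

    public static void main (String [] args)
    {
        long[] r = simulate (100000);
        System.out.println (r[0] + " resizes, " + r[1] + " chars copied, "
                            + r[2] + " chars allocated, final capacity " + r[3]);
    }
}
```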

You can improve the above by specifying the size of the StringBuffer to start with, eg:

StringBuffer sb = new StringBuffer (100000);
- then you don't have any redundant copying!

If you still need any convincing, run both of the programs above. I suspect you'll get bored before the first version finishes - I know I always have.
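
The same idea applies to the file-reading loop from the start of this page. Here's one way it might look - a sketch only, with a hypothetical ReadAll class and a StringReader standing in for a real file so that it runs on its own:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReadAll
{
    // Reads every line from the reader into one String, appending to a
    // single StringBuffer instead of creating a new String per line.
    static String readAll (BufferedReader reader) throws IOException
    {
        StringBuffer contents = new StringBuffer();
        String line;

        while ( (line=reader.readLine()) != null)
        {
            contents.append (line).append ("\n");
        }
        return contents.toString();
    }

    public static void main (String [] args) throws IOException
    {
        // StringReader is used here only to keep the example self-contained
        BufferedReader reader = new BufferedReader
            (new StringReader ("one\ntwo\nthree\nfour"));
        System.out.print (readAll (reader));
    }
}
```

Note that, like the original loop, this adds a "\n" after every line, including the last one.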

When not to use StringBuffers

The above is gradually (too gradually, in my opinion) permeating the general Java consciousness. Unfortunately, it seems to be permeating in a reduced fashion - reduced to: "Don't use String concatenation at all. Always use a StringBuffer." This is nonsense. I've seen code such as:
String x = new StringBuffer().append ("Hello ").
               append(name).append(". How are you?").toString();
which can be written much more readably as:
String x = "Hello "+name+". How are you?";
The second form will be converted into something akin to the first form by the compiler, so you don't need to worry about it - go for the more readable version.

If you have several statements such as:

String x = "Hello ";
x+=name;
x+=". How are you?";
there is a judgement call to be made. Using a StringBuffer would certainly be more efficient, but becomes slightly less readable. As there is a fixed number of copies here, you're not likely to get into a nasty situation taking up gigabytes of memory as in the example earlier. I'd probably leave it as it is, unless performance of this method becomes an issue - at which point it's an obvious optimisation to make.
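
For comparison, here's roughly what the two options look like side by side. It's a sketch with made-up names (Greeting, withConcatenation, withStringBuffer); both should give the same result, so the choice is really about readability against a small, fixed amount of extra copying:

```java
class Greeting
{
    // The multi-statement concatenation from the text, unchanged
    static String withConcatenation (String name)
    {
        String x = "Hello ";
        x+=name;
        x+=". How are you?";
        return x;
    }

    // The same steps using one explicit StringBuffer
    static String withStringBuffer (String name)
    {
        StringBuffer sb = new StringBuffer ("Hello ");
        sb.append (name);
        sb.append (". How are you?");
        return sb.toString();
    }

    public static void main (String [] args)
    {
        System.out.println (withConcatenation ("World"));
        System.out.println (withStringBuffer ("World"));
    }
}
```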

General rules

