Reference Values in Java

Summary: We consider differences between the ways that Java handles primitive values (e.g., values of type int) and object values.

Prerequisites: Java basics. Primitive types. Object basics.

Introduction

As you may have noted, Java has two general “kinds” of values: primitive values, such as ints and doubles, and object values. At first glance, the two are quite similar. You can declare variables (or parameters or fields) using primitive types, and you can declare variables (or parameters or fields) using object types. You can assign to both primitive and object variables. You can use primitive and object values as parameters.

  int i = 1;
  int j = i + 1;
  int k = j;
  BigInteger bi = BigInteger.ONE;
  BigInteger bj = bi.add(BigInteger.ONE);
  BigInteger bk = bj;

You may have also noted some subtle differences. For example, you normally need to use new when you need a new object value, but you don't use new for integers.

  int l = 5;
  BigInteger bl = new BigInteger("5");

You may also have noted that we can use symbolic operators, such as +, on primitive values, but can rarely use them on objects. (In a later reading, we'll explore a bit what's happening when we use symbolic operators with objects.)

However, when we start to look behind the scenes, Java treats variables that store primitive values and variables that store object values very differently. It's important to understand the difference because it affects the ways that programs behave.

Storing Values

When your program is running, the computer (well, the Java interpreter) needs to store almost every value that you work with. That means that it needs to reserve space for those values. For primitive types, the amount of space is both known and fixed. For example, the Java specification says that an int requires 32 bits (and uses two's complement representation) and a double requires 64 bits. Consider the following declarations.

  int i;
  int j;
  double d;
  int k;

The Java compiler will reserve 32 bits for i, 32 bits for j, 64 bits for d, and 32 bits for k, as in the following figure.

Note that in this diagram, we've explicitly shown that d requires twice as much memory as the other values. In future diagrams, we'll usually show all the cells as the same size, even though they hold values of different sizes.
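If you'd like to check those sizes yourself, the following is a minimal sketch. It relies on the standard constants Integer.SIZE and Double.SIZE; the class name PrimitiveSizes and its helper methods are our own, chosen only for illustration.

```java
public class PrimitiveSizes
{
  /** The number of bits used to represent an int. */
  static int intBits()
  {
    return Integer.SIZE;
  } // intBits()

  /** The number of bits used to represent a double. */
  static int doubleBits()
  {
    return Double.SIZE;
  } // doubleBits()

  public static void main(String[] args)
  {
    System.out.println(intBits());      // prints 32
    System.out.println(doubleBits());   // prints 64
    // Two's complement: one more than the largest int wraps around
    // to the smallest int.
    System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);  // prints true
  } // main(String[])
} // class PrimitiveSizes
```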

Now, let's think about a simple Java class.

/**
 * A simple mechanism for tallying things.
 */
public class Tally
{
  /**
   * The current value of the tally.
   */
  int val;

  /**
   * Build a new tally with value 0.
   */
  public Tally()
  {
    this.val = 0;
  } // Tally()

  /**
   * Get the value of the tally.
   */
  public int get()
  {
    return this.val;
  } // get()

  /**
   * Tally something.
   */
  public void tally()
  {
    ++this.val;
  } // tally()
} // class Tally

Suppose we declare a variable of type Tally.

  Tally t;

How much space should we allot to t? At first glance, the answer seems easy. We need 32 bits for the val field of type int, and some amount of space (say 64 bits) to refer to the class information, such as the procedures.

Unfortunately, things aren't quite that simple. One of the important features of Java is that variables (and parameters and fields) are polymorphic in that they can take on multiple types of values, as long as those values are appropriately related. For example, someone might create a variant of Tally called MonitoredTally that keeps track not only of the tally, but also of the number of times that get is called. Since it's a variant of Tally, Java's polymorphism allows us to write

  t = new MonitoredTally();

Now, it's likely that MonitoredTally will have one more field than Tally, and will therefore need more space. Since we can't know in advance what kinds of objects are stored in an object variable, we can't allocate space for it when it is declared. So, what does Java do? Read on.
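Before reading on, here is one way such a variant might look. This is only a sketch: the reading does not give the code for MonitoredTally, so the gets field, the timesRead method, and the tallyTwice helper are hypothetical names of our own. We've also nested a condensed copy of Tally inside a wrapper class so that the example fits in one file.

```java
public class PolymorphismSketch
{
  /** Condensed copy of the Tally class from the reading. */
  static class Tally
  {
    int val = 0;
    public int get() { return this.val; }
    public void tally() { ++this.val; }
  } // class Tally

  /** Hypothetical variant that also counts calls to get. */
  static class MonitoredTally extends Tally
  {
    int gets = 0;       // the extra field, so each object needs more space

    @Override
    public int get()
    {
      ++this.gets;
      return super.get();
    } // get()

    public int timesRead() { return this.gets; }
  } // class MonitoredTally

  /** Works on any Tally, including variants, without knowing which. */
  static int tallyTwice(Tally t)
  {
    t.tally();
    t.tally();
    return t.get();
  } // tallyTwice(Tally)

  public static void main(String[] args)
  {
    Tally t = new Tally();
    System.out.println(tallyTwice(t));                // prints 2
    t = new MonitoredTally();                         // same variable, bigger object
    System.out.println(tallyTwice(t));                // prints 2
    System.out.println(t instanceof MonitoredTally);  // prints true
  } // main(String[])
} // class PolymorphismSketch
```

Note that the declaration of t never changes, yet the objects it refers to have different numbers of fields.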

The Heap

Because objects take up different amounts of space, Java reserves a special part of memory for the objects that are created while the program is running. This special part of memory is called the heap. Each time you create a new object, whether directly, via a call to new, or indirectly, via some other procedure that builds an object, the Java interpreter allocates memory in the heap.

How are we able to allocate the right amount of memory on the heap when we couldn't figure out the right amount of memory when we declared a variable? Because at the time we create an object on the heap, we know exactly what object we are creating. In contrast, when we declare an object variable, we don't know exactly what kind of object it will store.

Given that we can't store objects directly in variables, what does an object variable store? It stores the address of the actual object in the heap. So, every object variable stores an address (which most Java programmers refer to as a reference). When we allocate an object, it gets an address. When we assign that object to a variable, we copy the address. Consider the following assignment statement.

  t = new Tally();

First, we create a new object on the heap. Let's say that it has memory address @4a8822a0. (The @ reminds us that it's an address. The 4a8822a0 is the actual address, in hexadecimal notation.) Since the Java virtual machine does not necessarily allocate objects sequentially, we'll just show it in the diagram as a separate area of memory. Then we store that address in t.

That's about it, except for one question that may be puzzling you: If each new object occupies memory on the heap, you may wonder why we don't run out of memory, given that we never explicitly get rid of objects. In fact, we can run out of memory. But Java interpreters are smart. They employ a garbage collector that identifies unused objects and frees the space associated with those objects. It turns out that some garbage collectors may even move objects around as they clean up garbage. Java conceals the locations of objects, in part, so that the garbage collector can more easily shuffle objects.

Implications

We started the reading noting that it's important to understand the use of references because it impacts the ways our programs behave. Let's explore what happens when we “copy” values. We'll start with primitive values.

  int i = 2;
  int j = i;

In this case, as in all of the subsequent cases, we declare two variables of the same type, and assign the value of one to the other. For integers, we end up with a situation like the following.

Now, what happens if we change one of the two variables?

  i += 5;

We get the old value of i (2), add five, and then store the result back in the same memory location. Since the two variables are independent, the instruction updates i, but does not update j.

So, when we print out the two variables, we get different values.

  pen.println(i); // prints 7
  pen.println(j); // prints 2

None of that should have been surprising. Now, let's explore something similar with our Tally objects. Once again, we create two variables of the same type, initialize one, and assign it to the other.

  Tally t1 = new Tally();
  Tally t2 = t1;

In this case, Java assigns the address of the object, not the actual object. So, we now have two variables that reference the same object.

Since both of them refer to the same object, they get the same value.

  pen.println(t1.get());        // Prints 0
  pen.println(t2.get());        // Prints 0

What happens when we change t1? In this case, we can change t1 by calling the tally method.

  t1.tally();

As you might expect, this changes the underlying object, not t1. So, we have almost exactly the same memory state as before.

Since the two variables share the same object, both of them have effectively been incremented.

  pen.println(t1.get());        // Prints 1
  pen.println(t2.get());        // Prints 1

Of course, that's not the only way we can change t1. We might also assign a new object to it.

  t1 = new Tally();

In this case, we've created a new object on the heap and assigned its address to t1. But t2 still holds the address of the first tally.

So, if we print the values of the tallies out, we'll get different results.

  pen.println(t1.get());        // Prints 0
  pen.println(t2.get());        // Prints 1
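The steps above can be collected into one runnable program. The sketch below assumes that pen is a java.io.PrintWriter wrapped around System.out, and it nests a condensed copy of Tally so that everything fits in one file. The sameObject helper is our own addition: it uses ==, which, for object variables, compares the references rather than the contents of the objects.

```java
import java.io.PrintWriter;

public class AliasDemo
{
  /** Condensed copy of the Tally class from the reading. */
  static class Tally
  {
    int val = 0;
    public int get() { return this.val; }
    public void tally() { ++this.val; }
  } // class Tally

  /** True only when both parameters hold the same reference. */
  static boolean sameObject(Object a, Object b)
  {
    return a == b;
  } // sameObject(Object, Object)

  public static void main(String[] args)
  {
    PrintWriter pen = new PrintWriter(System.out, true);
    Tally t1 = new Tally();
    Tally t2 = t1;                      // copy the reference, not the object
    pen.println(sameObject(t1, t2));    // prints true
    t1.tally();                         // mutate the one shared object
    pen.println(t2.get());              // prints 1
    t1 = new Tally();                   // t1 now refers to a second object
    pen.println(sameObject(t1, t2));    // prints false
    pen.println(t1.get());              // prints 0
    pen.println(t2.get());              // prints 1
  } // main(String[])
} // class AliasDemo
```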

Now, let's turn our attention to strings. Although strings “feel” like primitive values, they are, in fact, objects, and behave just like objects.

  String s1 = "hello";
  String s2 = s1;

What happens if we assign a different string to s1?

  s1 = s1.concat(" world");

Strings are immutable, so the concat method creates a new string, which gets a different address. Hence, s1 now refers to a different location in memory.

If we print the two values out, we'll see different output.

  pen.println(s1);      // prints hello world
  pen.println(s2);      // prints hello

In contrast to strings, StringBuffers are mutable. Hence, a change made through one variable can show up through another.

  StringBuffer sb1 = new StringBuffer("hello");
  StringBuffer sb2 = sb1;

As you might expect, we currently have the following arrangement.

Now, let's modify sb1 by appending.

  sb1.append(" world");

Since the append procedure mutates the original buffer, we see a change to that area of memory.

And so the change to sb1 affects sb2.

  pen.println(sb1);     // prints hello world
  pen.println(sb2);     // prints hello world
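To see the two behaviors side by side, here is a small sketch that repeats the String and StringBuffer experiments in one program. The class name and the two helper methods are hypothetical names of our own; each helper returns what the second variable shows after we "change" the first.

```java
public class StringAliasDemo
{
  /** What s2 shows after s1 is reassigned to the result of concat. */
  static String afterConcat()
  {
    String s1 = "hello";
    String s2 = s1;
    s1 = s1.concat(" world");   // builds a new String; s2 is untouched
    return s2;
  } // afterConcat()

  /** What sb2 shows after we append through sb1. */
  static String afterAppend()
  {
    StringBuffer sb1 = new StringBuffer("hello");
    StringBuffer sb2 = sb1;
    sb1.append(" world");       // mutates the one shared buffer in place
    return sb2.toString();
  } // afterAppend()

  public static void main(String[] args)
  {
    System.out.println(afterConcat());  // prints hello
    System.out.println(afterAppend());  // prints hello world
  } // main(String[])
} // class StringAliasDemo
```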

Things to Remember

What should you take from this reading? We've stated at least three important lessons more or less explicitly:

Variables (and fields and parameters) that seem to hold objects don't really hold objects; they refer to objects.
Since such variables hold references, more than one variable can refer to the same object.
When you change an object through one variable, you may affect the value you see through other variables.
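These lessons apply to parameters, too. The sketch below is only an illustration: bump and replace are hypothetical methods of our own, and Tally is again a condensed copy nested in a wrapper class. When we pass an object to a method, the parameter gets a copy of the reference. Changing the object through the parameter affects what the caller sees, but assigning a new object to the parameter does not.

```java
public class ParameterDemo
{
  /** Condensed copy of the Tally class from the reading. */
  static class Tally
  {
    int val = 0;
    public int get() { return this.val; }
    public void tally() { ++this.val; }
  } // class Tally

  /** Changes the object that the caller's variable also refers to. */
  static void bump(Tally t)
  {
    t.tally();
  } // bump(Tally)

  /** Reassigns only the parameter; the caller's variable is unaffected. */
  static void replace(Tally t)
  {
    t = new Tally();    // t now refers to a different object
    t.tally();
    t.tally();
  } // replace(Tally)

  public static void main(String[] args)
  {
    Tally mine = new Tally();
    bump(mine);
    System.out.println(mine.get());     // prints 1
    replace(mine);
    System.out.println(mine.get());     // prints 1
  } // main(String[])
} // class ParameterDemo
```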

But there's one more lesson that's useful to remember: It can be very useful to sketch the layout of memory. (When you sketch, you may want to draw an arrow from a reference to the object it references.)