2. Selecting and Marking Fields and Methods

Selecting the fields to store inside the token and the methods to protect with secured calls is an important step in the protection of an application. This selection is currently left to the developer of the application[6]; some guidance is provided below.

2.1. Fields

Selecting a field for storage in the secure coprocessor causes the translator to perform the following operations:

  1. The translator replaces the field by a pointer and inserts code in the constructor to allocate space in the memory of the coprocessor to hold the value of the field and its associated tag.

  2. The translator transforms instructions located around the accesses to the field so that they are executed by the coprocessor.

  3. The translator inserts instructions to exchange values between the main processor and the coprocessor where needed.

The transformations have two consequences for the application. On the one hand, the application becomes potentially more difficult to crack, because more information about its state and the operations it performs is hidden from a potential cracker. On the other hand, the application takes a performance hit: the coprocessor is slower than the main processor, and the communication bus introduces latency whenever the main processor must retrieve a value from the coprocessor. A good selection must balance these two conflicting consequences. Below are some characteristics of fields that are good candidates for storage in the coprocessor.

A field that has a long lifetime relative to the runtime of the application and that is modified and used several times over this lifetime is a good candidate. The analysis of the values returned by the coprocessor is more difficult if set and get operations are dissociated and performed over a long period.

A field that contains a configuration value or a state variable of the application is a good candidate. Such a field is often assigned constant values that need not be transmitted at runtime. Instead the constants are encoded inside ciphered instructions at recompilation time and the instructions that set the value of the field are completely indistinguishable from other ciphered instructions. The change is not linked to the transmission of a given value to the coprocessor but to the execution of a given branch of the code. The value of such a field is also often used to drive the flow of control of the application. When a conditional branch uses the value of the field in a comparison, the translator relocates the comparison inside the coprocessor and its boolean result is the only information that is returned to the main processor.
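
As an illustration of such a candidate, the sketch below shows a state field that is only ever assigned constants and that drives a conditional branch. The Connection class and its members are hypothetical, and the SecureField annotation is declared locally as a stand-in so that the example compiles without the Validy annotation jar.

```java
// Local stand-in (assumption) for com.validy.technology.annotation.SecureField,
// declared here only so that the example compiles on its own.
@interface SecureField {}

// Hypothetical application class used only for illustration.
class Connection {
    static final int CLOSED = 0;
    static final int OPEN = 1;

    // Long-lived state variable: assigned constants, read in comparisons.
    @SecureField
    private int state = CLOSED;

    // Constant assignments: no runtime value has to be sent to the coprocessor.
    void open()  { state = OPEN; }
    void close() { state = CLOSED; }

    // The comparison below is the kind of code the text describes as being
    // relocated inside the coprocessor; only its boolean outcome is needed
    // by the main processor.
    String send(String message) {
        if (state == OPEN) {
            return "sent: " + message;
        }
        return "rejected";
    }
}
```

Seen from the main processor, the only observable information is which branch of send is taken, not the value of state itself.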

A field that triggers the execution of heavy numerical operations in the coprocessor is likely to cause performance problems. However if a computation takes some parameters that can be precomputed from configuration values, these parameters and the code that precomputes them can be relocated in the coprocessor. The precomputed values are then retrieved and used in the heavy computation that is performed by the main processor.

There is no tool yet to evaluate automatically the quality of protection afforded by a given choice of secured fields. One key point must be kept in mind when trying to estimate this quality: The result of all the information hiding performed by the translator is embodied in the streams of values flowing from the coprocessor to the main processor as the application is executed. These streams are the values returned by the calls to the in method of the runtime. There is one stream for each thread of execution making such calls. A given selection of fields can provide a weak protection in two different cases:

  • A cracker can guess the future values in the streams by analyzing the streams produced by the application execution up to a given point in time. This indicates that the hidden part of the application is not complex enough. The log of the virtual coprocessor can be used to extract the streams of in values. The log record of a call to in is prefixed with the < character. It contains the retrieved value and the id of the thread in square brackets. The code of the ciphered instruction executed right before the call to in can be used to further classify the streams by in call site.

  • Altering the stream does not make the application unusable. This indicates that the fields hidden inside the coprocessor are not significant enough to the behavior of the application.
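
To inspect these streams, the records of calls to in can be extracted from the log and grouped by thread. The sketch below is a minimal example of that idea. The exact record layout is an assumption: it supposes that a record looks like "< 17 [3]" (the < prefix, the retrieved value, then the thread id in square brackets), so the pattern must be adapted to the actual log. The InStreams class is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Groups the values returned by calls to "in" into one stream per thread.
// ASSUMPTION: a record looks like "< 17 [3]" (value, then thread id in
// square brackets); the real log layout may differ.
class InStreams {
    private static final Pattern IN_RECORD =
            Pattern.compile("^<\\s*(-?\\d+)\\s*\\[(\\d+)\\]");

    static Map<Integer, List<Integer>> extract(List<String> logLines) {
        Map<Integer, List<Integer>> streams = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = IN_RECORD.matcher(line);
            if (!m.find()) {
                continue; // not a record of a call to "in"
            }
            int value = Integer.parseInt(m.group(1));
            int threadId = Integer.parseInt(m.group(2));
            // One stream per thread, values kept in log order.
            streams.computeIfAbsent(threadId, id -> new ArrayList<>()).add(value);
        }
        return streams;
    }
}
```

A stream that turns out to be constant, periodic, or otherwise easy to extrapolate is a sign that the first weakness listed above applies.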

In the evaluation version of the Validy SoftNaos translator and virtual coprocessor package, only scalar fields of type byte, short, or int and Java 5 enumerated types can be stored inside the coprocessor. We plan to add support for the following types in the next versions of the tool.

  • Boolean, and possibly char types.

  • Long integers.

  • Arrays of the preceding types.

When a field of an enumerated type is stored inside the coprocessor, the value that is actually stored is the ordinal of the enum constant. The ordinal is the integer rank of the constant among all the constants defined for the enum type. If the field is null, a negative integer is stored in place of the ordinal. When an enum field is stored inside the coprocessor, the following operations are performed inside the token:

  • assignment of a constant or another enum local or field to the secured field

  • comparison of the secured field with a constant or another enum local or field

If the enum is used as the key in a switch statement, the ordinal is retrieved from the coprocessor and used in place of the constant. For operations that require access to the enum constant object itself, such as displaying the string representation of the constant, the ordinal is retrieved and the corresponding enum constant instance is used by the translator.
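
The mapping between an enum field and the integer stored for it can be sketched in plain Java as follows. The Mode enum and the EnumOrdinalCodec class are hypothetical, and the use of -1 for null is an assumption: the text only states that a negative integer is stored.

```java
// Hypothetical enum used only for illustration.
enum Mode { IDLE, RUNNING, STOPPED }

class EnumOrdinalCodec {
    // ASSUMPTION: -1 stands for null; the manual only says "a negative integer".
    static final int NULL_ORDINAL = -1;

    // Value that would be stored in place of the enum reference.
    static int encode(Mode m) {
        return (m == null) ? NULL_ORDINAL : m.ordinal();
    }

    // Recovers the enum constant instance from a stored ordinal.
    static Mode decode(int stored) {
        return (stored < 0) ? null : Mode.values()[stored];
    }

    // A comparison with a constant only needs the two ordinals.
    static boolean sameConstant(int stored, Mode other) {
        return stored == encode(other);
    }
}
```

Because decode depends on the declaration order of the constants, any reordering of the enum changes the meaning of previously stored ordinals.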

Warning

To simplify the association between storage locations and tags inside the coprocessor, the minimal unit of memory allocation is the 32 bit integer. A field of type byte or short stored inside the coprocessor occupies the same space as a field of type int.

Warning

One provision of the specification for the Java language in the binary compatibility chapter raises several difficulties for the translator. §13.4.26 Evolution of Enums states that “Adding or reordering constants from an enum type will not break compatibility with pre-existing binaries.” However, because the value actually stored inside the coprocessor is the ordinal of the enum constant, this guarantee cannot be maintained: a class that contains a secured enumerated field must be recompiled and run through the translator again whenever enum constants are added, reordered or removed.

Another consequence of the same provision is that when an enum type is compiled, the Java compiler must synthesize special code to handle the mapping between constants and their ordinal. This code is not part of the specification of the Java language or the Java virtual machine and is thus compiler dependent. At this time, the Validy Technology translator supports and has been tested with bytecodes generated by Sun's javac compiler (version 1.5.x and 1.6.x) and by the Eclipse internal compiler (version 3.2.x). Please contact Validy if you need support for another compiler.

2.2. Methods

The current implementation of the translator does not automatically select methods for call protection. The developer must mark methods with the SecureMethod annotation or list them in a separate text file (see option markers).

Note

There is no need to mark the call sites. They are identified automatically by the translator.

Because methods are secured in groups (see Section 2.4, “Call protection”), the methods that can be selected must satisfy a few constraints.

  • All the call sites must be transformed by the translator; this rules out interfaces or virtual methods defined outside the application because all the call sites may not be accessible to the translator[7].

  • Methods that are called directly by the virtual machine cannot be secured; this includes class constructors, finalize methods, writeObject/readObject methods used for serialization.

  • Methods called using reflection cannot be secured.

A slightly different constraint addresses how (rather than which) method groups should be marked so that all members of the group are identified by the translator. A method should always be marked in the highest[8] class or interface in which it is defined. Consider the following counterexample:

abstract class A {

	public abstract void f();

}

class B extends A {

	@SecureMethod
	public void f() {
	}

}

class C extends A {

	public void f() {
	}

}

If C is processed before or independently from B, C.f will not be treated as secured and the application will fail if C.f is called through a reference to A. The same problem occurs if A is defined as an interface and the extends keywords are replaced by implements keywords.
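
A version of the counterexample that follows the rule places the mark on A, the highest class in which f is defined. This is a sketch only: the SecureMethod annotation is declared locally as a stand-in so that the example compiles without the Validy annotation jar, and f returns a String (instead of void) purely to make the dispatch observable.

```java
// Local stand-in (assumption) for com.validy.technology.annotation.SecureMethod.
@interface SecureMethod {}

abstract class A {

    // Marked once, in the highest class that defines f.
    @SecureMethod
    public abstract String f();

}

class B extends A {

    public String f() {
        return "B.f";
    }

}

class C extends A {

    public String f() {
        return "C.f";
    }

}
```

With the mark on A.f, the group no longer depends on whether B or C is processed first, since both override the marked method.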

Warning

Some method groups require several marks in order to be properly identified by the translator. In the example below, A.f and D.f belong to the same group because:

  • Method A.f and B.f can be called from the same site (invokeinterface I.f) since B.f is the implementation of I.f for class C.

  • Method B.f and D.f can be called from the same site (invokevirtual B.f).

Both SecureMethod annotations below are required.

interface I {

	@SecureMethod
	void f();

}


class A implements I {

	public void f() {
	}

}

class B {

	@SecureMethod
	public void f() {
	}

}

class C extends B implements I {

}

class D extends B {

	public void f() {
	}

}
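
The grouping rule used in this warning (two methods belong to the same group as soon as a chain of shared call sites links them) can be sketched with a small union-find structure. This is an illustration of the rule, not the translator's actual algorithm; the MethodGroups class and the string naming of methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Union-find over method names: every call site merges its possible targets.
class MethodGroups {
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of the group containing the method.
    private String find(String method) {
        parent.putIfAbsent(method, method);
        String p = parent.get(method);
        if (!p.equals(method)) {
            p = find(p);
            parent.put(method, p); // path compression
        }
        return p;
    }

    // All the implementations reachable from one call site share a group.
    void addCallSite(String... targets) {
        for (int i = 1; i < targets.length; i++) {
            parent.put(find(targets[i]), find(targets[0]));
        }
    }

    boolean sameGroup(String a, String b) {
        return find(a).equals(find(b));
    }
}
```

Feeding it the two call sites listed in the text, addCallSite("A.f", "B.f") for invokeinterface I.f and addCallSite("B.f", "D.f") for invokevirtual B.f, puts A.f and D.f in the same group even though no single call site reaches both.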

Within the constraints described above, the only limit to the number of secured method calls is performance degradation. However, secured calls have less impact on performance than secured fields for the following reasons:

  • Secured calls do not allocate extra memory in the coprocessor.

  • Secured calls do not retrieve any value produced by the coprocessor. Their only potential effect is a failure of the application if it is tampered with.

2.3. Call Graphs

The selection of methods one at a time for call protection can be tedious. A call graph builder was developed to simplify the process. This tool computes call graphs starting from entry point methods, checks that the call graph will be linked to the original secured code, and provides the list of methods that have to be secured. This is a standalone tool but its output can be used directly as an input for the translator and the provided Ant tasks allow easy integration.

Using the secure call graph builder has the following advantages over manual selection of methods to secure:

multiplication

by selecting only one entry point, many methods are secured

connectivity

at runtime, the selected methods automatically form a secured call stack rooted at the entry point

security

by default, the tool outputs only the methods of the call graphs that are actually linked to some of the original secured code. If they were not, all the code added to freeze the graph could be removed in one block without affecting the application. For a graph to be secure, the code of at least one of its methods must access a secured field, allocate an object with a secured field, or make a call to a secured API. A call graph that shares methods with another secure call graph is also secure.

The call graphs computed by the builder are geared to their use for call protection. Compared to complete Java call graphs, they have the following limitations:

  • Only explicit method calls are explored. Implicit calls such as calls to class initializers are not considered because they cannot be secured. The calling context is not completely under control of the translator so that a proper secured call context cannot be created.

  • Only methods in application classes are explored. Building the call graph further is unnecessary since the bytecode outside of application classes cannot be modified, or is not allowed to be modified, to perform secured calls.

  • No type analysis is performed to prune the method implementations that cannot actually be called from a given site. This analysis could produce a more precise call graph but for call protection, all the methods that share at least one common call site must be secured in one group. Proving that a method implementation cannot be called from a given site is not enough. The correct verification would require a global analysis at the application level to show that the implementation is never called.
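
The exploration described above can be pictured as a plain graph traversal. The sketch below is a simplified illustration, not the builder's implementation: the CallGraphSketch class, the representation of calls as a map of method names, and the set of methods that touch secured code are all assumptions, and the rule about graphs sharing methods with another secure graph is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CallGraphSketch {

    // calls maps each application method to the methods it calls explicitly.
    // Methods that are not keys of the map are treated as external and skipped.
    static Set<String> reachable(String entryPoint, Map<String, List<String>> calls) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(entryPoint);
        while (!pending.isEmpty()) {
            String method = pending.remove();
            if (!calls.containsKey(method) || !visited.add(method)) {
                continue; // outside the application, or already explored
            }
            pending.addAll(calls.get(method));
        }
        return visited;
    }

    // A graph is worth securing only if at least one of its methods is
    // linked to secured code (secured field, secured object, secured API).
    static boolean isSecure(Set<String> graph, Set<String> touchesSecuredCode) {
        for (String method : graph) {
            if (touchesSecuredCode.contains(method)) {
                return true;
            }
        }
        return false;
    }
}
```

Every implementation listed for a call site is followed, which matches the absence of type-based pruning described in the last item.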

2.4. Annotations

The com.validy.technology.annotation package defines three annotation classes, SecureField, SecureMethod, and SecureCallGraph. To use these annotations, you must add the vldy-tech-annot.jar file to your class path. The last annotation is used only by the call graph builder; the translator simply ignores it and removes it from the bytecode it transforms.

The SecureField annotation takes one optional parameter called registerNumber. Instance fields have as many copies as there are objects of the class they belong to. They can only be allocated in the heap of the coprocessor and they are accessed using an offset from the k$this pointer. Static fields on the other hand are unique and can be allocated in one of two ways: in the heap or in a global register. The allocation of static fields in the heap is similar to that of an instance field. A chunk of coprocessor memory is allocated in the class constructor and stored in a static pointer called k$class. This is the default behavior. When a register number is specified, memory is not allocated and the value of the field is stored permanently in the given global register.

Storing a static field in a register has the following advantages:

  • Accesses to the field are slightly more efficient because stores and loads are not needed.

  • If all secured static fields of a class are put into registers, k$class is not defined. There is then no hint that the class has secured static fields.

The characteristics of the coprocessor limit the number of static fields that can be allocated in global registers. The virtual coprocessor has 16 global registers numbered from 32 to 47. With the annotation below, the counter field of class A is allocated to register 40.

class A {

	@SecureField(registerNumber=40)
	private static int counter = 0;

	...

}

The SecureField, SecureMethod, and SecureCallGraph annotations accept an optional boolean parameter called enabled whose default value is true. When this parameter is set to false, it instructs the translator not to secure the given field or method, or to output a list of methods not to secure in the case of the call graph.

The need to annotate an object to avoid protection seems paradoxical since objects that are not annotated at all are not secured. However, using markers (see Section 2.5, “Markers”), it is possible to use automated tools to produce lists of fields or methods selected for protection, and thus the same object may have several annotations. Adding a disabled annotation is useful when an automatically selected object must not be changed by the translator for a reason not taken into account by the tool. If an object has at least one disabled annotation, the translator will not change it.

For example, if the method g from class A is selected by a tool that automates call protection but must remain unchanged because it is called through reflection in some part of the application, the following annotation makes sure the translator does not apply call protection to it:

class A {

	@SecureMethod(enabled=false)
	public int g() {
		...
	}

	...

}

2.5. Markers

Markers are an alternative to annotations and the only available option for Java 1.4. They can also be used to accept the output of a tool that automatically selects fields or methods for protection.

To mark a field or a method, its containing class or interface name, its name and its signature must be listed on one line separated by spaces. The class name and the signature must follow the form defined in the Java Virtual Machine specification. Lines starting with a + character correspond to enabled annotations while lines starting with a - character correspond to disabled ones. If the + or - character is omitted, + is assumed by default.

Lines starting with a # character are ignored and can be used to add comments.

For example, to produce the same result as the following annotations:

package com.validy.sample;

import com.validy.technology.annotation.*;

public class Test {

	@SecureField
	private int value;

	@SecureMethod
	public final void f(String name) {
		...
	}

	@SecureMethod(enabled=false)
	public int g() {
		...
	}

}

the following lines should be added to the markers file:

+ com/validy/sample/Test value I
+ com/validy/sample/Test f (Ljava/lang/String;)V
# g is called through reflection in ...
- com/validy/sample/Test g ()I
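
The rules of the markers format can be summarized by a small parser. The sketch below is an illustration of the format as described in this section, not the translator's own reader; the MarkersSketch class is hypothetical, and ignoring blank lines and leading whitespace is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MarkersSketch {

    // Returns "class name signature" -> true if the member stays selected.
    static Map<String, Boolean> parse(List<String> lines) {
        Map<String, Boolean> selected = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // comment; skipping blank lines is an assumption
            }
            boolean enabled = true; // "+" is assumed when the sign is omitted
            if (line.startsWith("+") || line.startsWith("-")) {
                enabled = line.charAt(0) == '+';
                line = line.substring(1).trim();
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected: class name signature, got: " + raw);
            }
            String key = parts[0] + " " + parts[1] + " " + parts[2];
            // A single disabled mark is enough to leave the member unchanged.
            selected.merge(key, enabled, Boolean::logicalAnd);
        }
        return selected;
    }
}
```

Applied to the four lines above, the result selects value and f and leaves g disabled, even if another line (for example one produced by an automated tool) also lists g as enabled.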

2.6. Object Serialization

2.6.1. Introduction

Serialization is a way of flattening the state of objects to produce a data stream. This stream can then be stored on disk or transmitted over the network and later parsed to reconstruct a copy of the original objects. The Java language's built-in support for serialization is described in the Java Object Serialization Specification. Even though it can be customized by implementing special methods (writeObject and readObject), the support for serialization is largely declarative in nature:

  • classes whose instances need to be serialized are simply marked by having them implement the java.io.Serializable interface,

  • fields in these classes that should not be serialized are marked with the transient modifier.

The Java runtime is responsible for the actual process of serializing and deserializing object instances. It works by inspecting the definition of classes at runtime using reflection to decide what should be put in the output stream and what should be done with the content of the input stream. The declarative nature of Java serialization makes it easy to use even if its interaction with class evolution requires careful planning. The Validy SoftNaos translator strives to retain this simplicity. No extra work is required to support the serialization of classes with secured fields or methods. However in the absence of runtime support, the translator must use the customization hooks provided by the Java language to implement the secure serialization and deserialization of fields stored inside the token. The rest of this section presents this implementation and its limitations. It assumes some familiarity with Java object serialization.

2.6.2. Default Behavior

When one or more fields are marked for storage inside the secure token, the translator removes their definition from the class and they are replaced by a pointer to token memory named k$this[9]. The Java runtime does not have enough information to be able to serialize or deserialize the object. Therefore, the translator must provide special support for serialization by:

  • overriding the default definition of persistent fields through the declaration of a custom serialPersistentFields (or the modification of an existing declaration),

  • implementing custom writeObject and readObject methods (or modifying existing implementations).

When a secured field is serialized, its value is first loaded from the token memory to a token register, concatenated with a random nonce, checksummed, ciphered, and the resulting block is retrieved from the token to be put in the output stream. By default, when a field is deserialized, the ciphered block is sent to the token, deciphered, the checksum is verified, the nonce discarded, and if the value has not been tampered with, it is stored in the token memory at the proper location. The operations performed on the value of a field before serialization (nonce, checksum, cipher) are added to:

  • hide the actual value of the field when it is retrieved and stored outside of the token,

  • make it impossible to know whether a field has changed value or not between two serializations of the same object,

  • make it difficult for an attacker to tamper with the serialized value to try to alter the behavior of the secured application.

These operations use instructions from the token virtual machine and benefit from the same protection as other instructions generated by the translator (linked using tags and ciphered). The key used to cipher and decipher serialized values is stored inside the token at customization time and is independent from the one used to decipher instructions. Because of the nonce and checksum, a byte, short or int value that occupies one 32 bit word in the memory of the token is serialized as an 8 byte array.
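
The arithmetic behind the 8 byte block can be illustrated outside the token. The sketch below is only an analogy under stated assumptions: the field layout (4 byte value, 2 byte nonce, 2 byte checksum), the truncated CRC-32 checksum, and the DESede cipher with its 64 bit block are all stand-ins chosen for the example, not the token's actual format or algorithms. The SealedValueSketch class is hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

class SealedValueSketch {
    // ASSUMPTION: any cipher with a 64-bit block fits the 8 byte size; DESede is a stand-in.
    private static final String TRANSFORMATION = "DESede/ECB/NoPadding";

    // ASSUMPTION: checksum = low 16 bits of a CRC-32 over value and nonce.
    private static short checksum(int value, short nonce) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(6).putInt(value).putShort(nonce).array());
        return (short) crc.getValue();
    }

    private static byte[] run(int mode, byte[] key24, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, new SecretKeySpec(key24, "DESede"));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // value (4 bytes) + nonce (2 bytes) + checksum (2 bytes) = one 8 byte block.
    static byte[] seal(int value, short nonce, byte[] key24) {
        byte[] clear = ByteBuffer.allocate(8)
                .putInt(value).putShort(nonce).putShort(checksum(value, nonce)).array();
        return run(Cipher.ENCRYPT_MODE, key24, clear);
    }

    // Deciphers, verifies the checksum, discards the nonce, returns the value.
    static int open(byte[] block, byte[] key24) {
        ByteBuffer clear = ByteBuffer.wrap(run(Cipher.DECRYPT_MODE, key24, block));
        int value = clear.getInt();
        short nonce = clear.getShort();
        if (clear.getShort() != checksum(value, nonce)) {
            throw new IllegalArgumentException("checksum mismatch: block was altered");
        }
        return value;
    }
}
```

The nonce is passed as a parameter here to keep the example repeatable; in practice it would be drawn at random for each serialization, which is what makes two blocks holding the same value differ.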

2.6.3. Backward Compatibility

By default, streams of objects that were produced before the application was transformed and contain the value of secured fields in clear can be read but the secured fields are not initialized and trying to access them later on triggers an error in the secure token. There are two ways to handle this problem:

  1. a custom readObject method can be defined that initializes the secured fields with a default value before calling defaultReadObject.

  2. an option can be passed on the command line or as an attribute of the Ant task to change the default behavior. When this option is specified, the implementation of readObject is altered to accept either ciphered blocks or clear values for secured fields in the input stream. While ciphered blocks are handled as described above, clear values are transmitted to the token as is and stored directly in memory. This makes it possible to deserialize objects that were serialized before the application was transformed by the translator at the cost of extra storage[10].

Warning

When this option is used, the protection against tampering becomes ineffective since an attacker can always remove the ciphered block and set the clear value in the serialized stream before it is deserialized.
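
The first approach listed above, a custom readObject that gives secured fields a default value before calling defaultReadObject, could look like the following sketch. The Account class and the roundTrip helper are hypothetical, and the SecureField annotation is declared locally as a stand-in so that the example compiles and runs as plain Java without the translator.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Local stand-in (assumption) for com.validy.technology.annotation.SecureField.
@interface SecureField {}

// Hypothetical application class used only for illustration.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    private String owner;

    @SecureField
    private int balance;

    Account(String owner, int balance) {
        this.owner = owner;
        this.balance = balance;
    }

    String owner() { return owner; }
    int balance() { return balance; }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        balance = 0;            // default value assigned before the stream is read
        in.defaultReadObject(); // then the regular deserialization takes place
    }

    // Helper for the example: serializes and deserializes in memory.
    static Account roundTrip(Account original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Run as plain Java, the round trip simply restores both fields; once the class has been transformed, the initial assignment is what ensures that the secured field holds a value when an older stream is read.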

2.6.4. Caveats

When a field declared in a non serializable class is secured, the translator removes the declaration of this field from the bytecode of the application and all accesses to the field are performed by ciphered instructions of the virtual machine. The only remaining information is the presence of a k$this field that indicates the presence of at least one secured field in the class. However if the class is serializable, the name and type of the field are declared in the serialPersistentFields field and appear in the class constructor's bytecode.

When custom readObject and writeObject methods are defined, their special handling by the translator depends on locating the call to defaultReadObject or readFields and defaultWriteObject or putFields respectively. If one of these methods is not called, a warning is issued, the methods are handled normally, and the fields will be serialized in clear.

The current version of the translator has no support for externalization or secure custom serialization. If an application class implements java.io.Externalizable and its implementation of writeExternal reads the value of a secured field to stream it, the translator generates code that retrieves the value of the field in clear from the token. A future version of the translator may add support for an assembly-level interface to the virtual machine that would let the developer insert checksumming and ciphering instructions “by hand” but still have the translator do register allocation, set the right tag checks, and cipher the instructions.

The following missing functionalities may be added in future versions of the translator depending on demand:

  • possibility to remove the secure annotation on a field and still be able to read streams where its value was stored as a ciphered block,

  • control of code generation for backward compatibility at the class or field level using annotations.



[6] it is expected that future improved versions of the tools will be able to suggest or automatically select good candidates by analyzing the application.

[7] unless you are willing and authorized to distribute modified copies of third party libraries

[8] in the extends or implements relationship

[9] because it references an external resource (the secure token heap), k$this is marked with the transient modifier. The translator generates the code that recreates it upon deserialization.

[10] both the clear and the ciphered value of each secured field must be declared in serialPersistentFields for the application to be able to retrieve either the clear or the ciphered value of the field. Because of the way the Java runtime handles this declaration, both values will be stored in the output stream when an object is serialized by the transformed application. The clear value of the field will always be zero and the ciphered block will contain the correct value.