This question must be addressed in the context of the issues that determine the worthiness of any functional evaluation procedure. These issues have been endorsed or recognized as crucial by the major professional associations and government bodies that have addressed functional evaluation, including the American Psychological Association (American Psychological Association, American Educational Research Association, & National Council on Measurement in Education, 1999), the National Institute for Occupational Safety and Health (Chaffin, Herrin, & Keyserling, 1978), the American Physical Therapy Association (American Physical Therapy Association, 1997), the American Congress of Rehabilitation Medicine (Johnston, Keith, & Hinderer, 1992), and the American College of Sports Medicine (American College of Sports Medicine, 2000). Presented in hierarchical order, these issues are:

1. Safety – Given the known characteristics of the evaluee, the procedure should not be expected to lead to injury.
2. Reliability – The test score should be dependable across evaluators, evaluees, and the date or time of test administration.
3. Validity – The interpretation of the test score should be able to predict or reflect the evaluee’s performance in a target setting. The formal definition of validity is “the degree to which all of the accumulated evidence supports the intended interpretation of test scores for the intended purpose.” [3]
4. Practicality – The cost of the test procedure should be reasonable and customary. Cost is measured in terms of the direct expense of the test procedure plus the amount of time required of the evaluee.
5. Utility – The usefulness of the procedure is the degree to which it meets the needs of the evaluee and referrer.

This hierarchy requires that each earlier factor be maintained as subsequent factors are addressed. For example, it is not permissible to sacrifice safety for the sake of practicality. In addition, the first four factors must be adequately addressed for the purpose of the evaluation to be achieved.

When applied to work disability, functional capacity evaluation has three primary purposes. The first purpose is to determine whether the evaluee is able to return to work at his or her usual and customary job and, if not, to identify what the evaluee needs to improve or the employer needs to modify before a return to work is reasonable. If there is no job to which the evaluee can return, the second purpose is to identify functional abilities that could be used in alternate occupations. The third purpose is to quantify functional limitations in terms that are useful in the disability determination process.

By what standard can each of the individual factors be measured? In the past, the answer to this question was readily available, with simple statistical standards used for each factor. However, since the 1999 publication of the APA/AERA/NCME Standards and the adoption of the new definition of “validity” in the Standards by the United States Equal Employment Opportunity Commission (EEOC, 2008), a simple answer is no longer acceptable. Best practices in evaluation now require that the whole hierarchy of test factors be considered, with particular attention to validity, which must take into account a wide range of issues. The overriding goal is increased fairness in the evaluation process.

The reader is encouraged to obtain the APA/AERA/NCME Standards for specific information. In the meantime, one way to judge whether the standards were adequately met is to ask whether utility was achieved. Utility is the ultimate factor and requires that all of the other factors be adequately addressed. It follows that if there is utility, the factors in the hierarchy must have been handled appropriately. If there is not utility, one or more of the factors was not handled appropriately.
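The reasoning above can be sketched in a few lines of Python. This is only an illustration of the logic, not a scoring method taken from the Standards or from any published FCE protocol. The names `Factor`, `first_unmet_factor`, and `has_utility` are hypothetical, and the sketch assumes that each factor can be reduced to a yes/no judgment of adequacy, which in practice is a matter of professional judgment.

```python
from enum import IntEnum


class Factor(IntEnum):
    """The five factors, numbered in hierarchical order (assumed encoding)."""
    SAFETY = 1
    RELIABILITY = 2
    VALIDITY = 3
    PRACTICALITY = 4
    UTILITY = 5


def first_unmet_factor(adequate):
    """Return the earliest factor in the hierarchy judged inadequate, or None.

    `adequate` maps each Factor to True or False. A factor that is missing
    from the mapping is treated as not adequately addressed.
    """
    for factor in Factor:  # iterates in definition order: SAFETY first
        if not adequate.get(factor, False):
            return factor
    return None


def has_utility(adequate):
    """Utility is achieved only if every factor, utility included, is adequate."""
    return first_unmet_factor(adequate) is None
```

For example, a procedure judged adequate on every factor except safety and practicality would be reported as failing on safety first, because an earlier factor cannot be traded away for a later one:

```python
ratings = {factor: True for factor in Factor}
ratings[Factor.SAFETY] = False
ratings[Factor.PRACTICALITY] = False
print(first_unmet_factor(ratings).name)  # prints "SAFETY"
print(has_utility(ratings))              # prints "False"
```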

In practice, this means that we must always start by asking whether there was a useful outcome. For example, in the 1986 workers’ compensation study (Matheson, 1986), functional capacity evaluation results helped to resolve several cases after each injured worker had received a permanent and stationary rating with work restrictions. Although individual indicators for each factor in the hierarchy were available and presented in the original research, the ultimate determination of test worthiness is that the FCE process led to a useful outcome. Vert Mooney, M.D., is fond of saying that “the test result has got to tell you what to do next.” Many tests that are reliable and valid in and of themselves do not guide professionals on what to do next, and so they lack utility because they do not contribute to a useful outcome. Adopting this broader view of test properties, beyond psychometric properties alone, is likely to lead to a new generation of functional capacity evaluation tests and test batteries that better meet the needs of evaluees and referral sources.

American College of Sports Medicine. (2000). Guidelines for exercise testing and prescription (6th ed.). New York: Lippincott Williams & Wilkins.
American Physical Therapy Association. (1997). Occupational health guidelines: Evaluating functional capacity. Alexandria, VA: American Physical Therapy Association.
American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.
Chaffin, D., Herrin, G., & Keyserling, W. (1978). Preemployment strength testing: An updated position. Journal of Occupational Medicine, 20(6), 403-408.
U.S. Equal Employment Opportunity Commission [EEOC]. (2008). Employment tests and selection procedures. Retrieved September 20, 2009
Johnston, M., Keith, R., & Hinderer, S. (1992). Measurement standards for interdisciplinary medical rehabilitation. Archives of Physical Medicine and Rehabilitation, 73, S3-S23.
Matheson, L. (1986). Evaluation of lifting and lowering capacity. Vocational Evaluation and Work Adjustment Bulletin, 19(4), 107-111.