STUDY DESIGN. An observational prospective cohort study.

OBJECTIVES. To determine the reliability of nonorganic sign testing in patients with chronic low back pain (CLBP), and to identify determinants of diagnostic disagreement.

SUMMARY OF BACKGROUND DATA. To assess behavioral responses to examination, Waddell et al published the "Waddell score" in 1980. The Waddell score consists of 8 nonorganic signs divided into 5 categories. The overall score is positive if at least 3 of the 5 categories are scored positive. Although the Waddell score is widely used, little is known about its reliability.

METHODS. Two observers examined 126 consecutive patients with CLBP referred for rehabilitation. Cohen's κ was used to compute the interobserver and intraobserver reliability of the sign maneuvers, the categories, and the overall Waddell score. Cronbach's α was calculated for the 5 categories and for the 8 signs to determine internal consistency. χ² tests were applied to assess whether clinical characteristics influenced interobserver reliability.

RESULTS. Interobserver reliability ranged from 0.33 to 0.74 for the sign maneuvers and categories, and was 0.48 and 0.49 for the overall Waddell score. Intraobserver reliability ranged from 0.43 to 0.84 for the sign maneuvers and categories, and was 0.65 and 0.68 for the overall Waddell score. Internal consistency ranged from 0.65 to 0.72 for the categories and from 0.71 to 0.78 for the signs. None of the potential determinants of diagnostic disagreement reached statistical significance (P < 0.05).

CONCLUSION. For trained observers examining patients with CLBP in a rehabilitation setting, the interobserver reliability of the Waddell score was moderate and the intraobserver reliability was good. No influence of clinical characteristics on interobserver reliability was found. To optimize the homogeneity and variability of the Waddell score, we recommend summing the individual signs rather than the categories.
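The scoring rule and the two reliability statistics named above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the procedure used in the study. The sign names, the `CATEGORIES` grouping, and the rule that a category counts as positive when any of its signs is positive are assumptions for illustration. All function names are hypothetical. Cohen's κ follows the usual definition κ = (p_o − p_e)/(1 − p_e), and Cronbach's α uses α = k/(k − 1) · (1 − Σσ²_item / σ²_total).

```python
from collections import Counter
from statistics import variance

# Hypothetical grouping of the 8 nonorganic signs into 5 categories
# (assumed for illustration; verify against Waddell et al, 1980).
CATEGORIES = {
    "tenderness": ("superficial", "nonanatomic"),
    "simulation": ("axial_loading", "rotation"),
    "distraction": ("straight_leg_raise",),
    "regional": ("weakness", "sensory"),
    "overreaction": ("overreaction",),
}


def category_results(signs: dict[str, bool]) -> dict[str, bool]:
    """Assumed rule: a category is positive if any of its signs is positive."""
    return {cat: any(signs[s] for s in members) for cat, members in CATEGORIES.items()}


def waddell_positive(signs: dict[str, bool]) -> bool:
    """Overall score is positive if at least 3 of the 5 categories are positive."""
    return sum(category_results(signs).values()) >= 3


def sign_sum(signs: dict[str, bool]) -> int:
    """Alternative score: count the positive individual signs (0 to 8)."""
    return sum(signs.values())


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters scoring the same patients."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed agreement: share of patients given the same rating by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items holds k score lists, each over the same n patients."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per patient across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

As a usage example, a patient positive on both tenderness signs and both simulation signs has a sign sum of 4 but only 2 positive categories, so `waddell_positive` returns `False`. This shows how summing the signs retains information that the category threshold discards.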
© 2008 Lippincott Williams & Wilkins, Inc.