Background. Deviant shoulder girdle movement has been suggested to be an important factor in the etiology of shoulder pain. Reliable measurements of shoulder girdle kinematics are a prerequisite for optimizing clinical management strategies. Purpose. The purpose of this study was to evaluate the reliability, measurement error, and internal consistency of measurements obtained with performance-based clinical tests for shoulder girdle kinematics and positioning in patients with shoulder pain. Data Sources. The MEDLINE, Embase, CINAHL, and SPORTDiscus databases were systematically searched from inception to August 2015. Study Selection. Articles published in Dutch, English, or German were included if they evaluated at least one of the measurement properties of interest. Data Extraction. Two reviewers independently evaluated the methodological quality per studied measurement property with the 4-point rating scale of the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) checklist, extracted data, and assessed the adequacy of the measurement properties. Data Synthesis. Forty studies comprising more than 30 clinical tests were included. The measurements actually reported for the tests were categorized as (1) positional measurement methods, (2) measurement methods to determine dynamic characteristics, and (3) tests to diagnose impairments of shoulder girdle function. A best evidence synthesis was performed per measurement for each measurement property. Limitations. All studies had significant limitations, including incongruence between test descriptions and the measurements actually reported, and a lack of reporting on minimal important change. In general, the methodological quality of the included studies was fair to poor. Conclusions. High-quality evidence indicates that measurements obtained with the Modified Scapular Assistance Test are not reliable for clinical use.
Sound recommendations for the use of the other tests could not be made because the evidence was inadequate. Across studies, similar tests differed in description, performance, and interpretation, and different criteria were used to establish similar diagnoses, mostly without considering a clinically meaningful context. Consequently, these tests lack face validity, which hampers their clinical use. Further research on validity, and on how to integrate a clinically meaningful context of movement into clinical tests, is warranted.