Graph processing is one of the most important and ubiquitous classes of analytical workloads. To process large graph datasets with diverse algorithms, tens of distributed graph processing frameworks have emerged. Their users increasingly expect high performance for ever-more-diverse workloads. Meeting this expectation depends on understanding the performance of each framework. However, performance analysis and characterization of a distributed graph processing framework is challenging: contributing factors include the irregular nature of graph computation across datasets and algorithms, the semantic gap between workload-level and system-level monitoring, and the lack of lightweight mechanisms for collecting fine-grained performance data. To address this challenge, we present Grade10, an experimental framework for fine-grained performance characterization of distributed graph processing workloads. Grade10 captures the execution of a graph workload as a performance graph constructed from logs and application traces, and builds a fine-grained, unified workload-level and system-level view of performance. Grade10 samples sparsely to keep monitoring lightweight, and preserves accuracy through a novel approach for resource attribution. Last, it automatically identifies resource bottlenecks and common classes of performance issues. Our real-world experimental evaluation with Giraph and PowerGraph, two state-of-the-art distributed graph processing systems, shows that Grade10 reveals large differences in the nature and severity of bottlenecks across systems and workloads. We also show that Grade10 supports debugging: we exemplify how we used it to find a synchronization bug in PowerGraph that slows down affected phases by 1.10-2.50×. Grade10 is available as an open-source project.