In the context of large and ever-growing archives, automatically generating annotation suggestions from textual resources related to the documents to be archived is, in theory, an attractive option. It could save much of the work involved in the time-consuming and expensive task of manual annotation, and it could help cataloguers attain higher inter-annotator agreement. In practice, however, some questions arise: what is the quality of the automatically produced annotations? How do they compare with manual annotations and with the annotation requirements defined by the archive? If they differ from the manual annotations, are the automatic annotations wrong? In the CHOICE project, partially hosted at the Netherlands Institute for Sound and Vision, the Dutch public archive for audiovisual broadcasts, we automatically generate annotation suggestions for cataloguers. In this paper, we define three types of evaluation of these annotation suggestions: (1) a classic, strict evaluation measure expressing the overlap between the automatically generated keywords and the manual annotations, (2) a loosened evaluation measure in which semantically very similar annotations are also counted as relevant matches, and (3) an in-use evaluation of the usefulness of manual versus automatic annotations in the context of serendipitous browsing, where the annotations (manual or automatic) are used to retrieve and visualize semantically related documents. © Institute of Materials, Minerals and Mining 2009.
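The difference between the strict and the loosened evaluation, (1) and (2) above, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: overlap is expressed here as precision and recall over keyword sets, and "semantically very similar" is modelled by a hypothetical `related` lookup table (for instance, terms directly linked in a thesaurus). The function names and the toy keywords are illustrative only and are not taken from the CHOICE project.

```python
from typing import Dict, Set, Tuple


def strict_scores(suggested: Set[str], manual: Set[str]) -> Tuple[float, float]:
    """Precision and recall where only exact keyword matches count."""
    hits = suggested & manual
    precision = len(hits) / len(suggested) if suggested else 0.0
    recall = len(hits) / len(manual) if manual else 0.0
    return precision, recall


def loosened_scores(
    suggested: Set[str],
    manual: Set[str],
    related: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Precision and recall where closely related keywords also count.

    `related` is a hypothetical lookup (an assumption of this sketch):
    it maps a keyword to the keywords treated as semantically very similar.
    """

    def matches(a: str, b: str) -> bool:
        # Exact match, or a relation recorded in either direction.
        return a == b or b in related.get(a, set()) or a in related.get(b, set())

    # Suggestions that match at least one manual annotation.
    good = {s for s in suggested if any(matches(s, m) for m in manual)}
    # Manual annotations matched by at least one suggestion.
    covered = {m for m in manual if any(matches(s, m) for s in suggested)}

    precision = len(good) / len(suggested) if suggested else 0.0
    recall = len(covered) / len(manual) if manual else 0.0
    return precision, recall


# Toy data, invented for illustration.
manual = {"elections", "politics"}
suggested = {"election campaigns", "politics", "weather"}
related = {"elections": {"election campaigns"}}

strict = strict_scores(suggested, manual)
loose = loosened_scores(suggested, manual, related)
```

With these toy sets, the strict measure credits only the exact match on "politics", whereas the loosened measure also credits "election campaigns" as a near match for "elections", so both precision and recall rise.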