Lung diseases include some of the most widespread and deadly conditions affecting people in the US today. One of the main challenges in treating lung disease is the difficulty of diagnosis. Clinical diagnosis remains largely symptom-based, so many cases can be misdiagnosed or remain undiagnosed until the disease has progressed to a more severe stage. Most studies seeking molecular diagnostics have focused on one or two diseases at a time, with limited success. Instead, we searched for biomarkers reflective of the global health state of the lung by studying data from a broad range of lung diseases. We used gene expression microarray data from five lung diseases (lung adenocarcinoma, lung squamous cell carcinoma, malignant pleural mesothelioma, chronic obstructive pulmonary disease, and asthma), as well as a non-diseased phenotype, to train a classification tree scheme based on the Top Scoring Pair (TSP) algorithm (Geman et al., Stat Appl Genet Mol Biol. 2004; 3: Article 19). The algorithm identified 27 gene pair classifiers that classify the three cancers explicitly, and several of the markers have previously been cited in the literature as linked to these cancers. Ten-fold cross-validation yielded a classification accuracy of approximately 88%. Thus, a TSP-based classification tree scheme accurately identifies lung diseases from the relative expression of a small number of diagnostic gene pairs.
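
To illustrate the idea behind a single TSP decision, the following is a minimal sketch of the two-class pair-scoring rule as it is commonly described for the TSP algorithm: for each gene pair (i, j), compare how often gene i is expressed below gene j in each class, and keep the pair with the largest difference. It is not the classification tree scheme or the pipeline used in this study. The function names (`top_scoring_pair`, `predict`) and the toy expression values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def top_scoring_pair(samples, labels):
    """Find the gene pair whose relative ordering best separates two classes.

    samples: list of expression vectors, one per sample, same gene order.
    labels:  list of 0/1 class labels, one per sample.
    Returns (i, j, score, class_if_less), where class_if_less is the class
    in which "gene i < gene j" occurs more often.
    """
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    n_genes = len(samples[0])

    best = (None, None, -1.0, None)
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p0 = sum(s[i] < s[j] for s in class0) / len(class0)
        p1 = sum(s[i] < s[j] for s in class1) / len(class1)
        score = abs(p0 - p1)
        if score > best[2]:  # ties keep the first pair encountered
            best = (i, j, score, 0 if p0 > p1 else 1)
    return best


def predict(sample, i, j, class_if_less):
    """Classify one sample from the ordering of a single gene pair."""
    return class_if_less if sample[i] < sample[j] else 1 - class_if_less


# Toy data (invented for illustration): 3 genes, 3 samples per class.
toy_samples = [
    [1, 5, 3], [2, 6, 1], [1, 4, 2],    # class 0
    [5, 1, 0.5], [6, 2, 3], [4, 1, 2],  # class 1
]
toy_labels = [0, 0, 0, 1, 1, 1]

i, j, score, class_if_less = top_scoring_pair(toy_samples, toy_labels)
```

Because the decision depends only on the ordering of two expression values within a sample, it is unaffected by any monotone rescaling of that sample's measurements. A tree scheme such as the one described above would combine several such pairwise decisions; how the pairs are arranged into a tree is not shown here.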