Reliable results require reliable data, and reliable data come from rigorous curation. In our recent publication in Nature Chemical Biology, we outline a workflow that uses cheminformatics principles to curate data and flag potentially erroneous entries, so that models are built on the highest-quality data available.
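To make the idea of flagging potentially erroneous entries concrete, here is a minimal sketch of one check a curation workflow might include: grouping duplicate records for the same compound and flagging those whose reported activities disagree. It is an illustration under stated assumptions, not the procedure from the publication. The function name `flag_discordant_duplicates`, the `max_spread` tolerance, and the sample records are all hypothetical, and the structure keys are assumed to be standardized already (for example, canonical SMILES produced by a cheminformatics toolkit).

```python
from collections import defaultdict
from statistics import mean


def flag_discordant_duplicates(records, max_spread=1.0):
    """Split duplicate-compound records into curated and flagged groups.

    records: iterable of (structure_key, activity) pairs. structure_key is
        assumed to be an already-standardized identifier such as a
        canonical SMILES string.
    max_spread: hypothetical tolerance, in the activity's own units
        (e.g. pIC50 log units), above which duplicates count as discordant.
    """
    # Collect every reported activity for each unique structure.
    groups = defaultdict(list)
    for key, activity in records:
        groups[key].append(activity)

    curated, flagged = {}, {}
    for key, values in groups.items():
        if max(values) - min(values) > max_spread:
            # Measurements disagree: keep them all for manual review.
            flagged[key] = values
        else:
            # Measurements agree (or there is only one): keep the mean.
            curated[key] = mean(values)
    return curated, flagged


# Made-up example records, for illustration only.
records = [
    ("CCO", 5.1),
    ("CCO", 5.3),
    ("c1ccccc1", 4.0),
    ("c1ccccc1", 7.2),
    ("CC(=O)O", 6.0),
]
curated, flagged = flag_discordant_duplicates(records)
```

Flagged groups are returned with all of their original values rather than being dropped, so a curator can decide which measurement, if any, to trust.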
Read more about our model here.
Fourches, D.; Muratov, E.; Tropsha, A. Curation of chemogenomics data. Nat. Chem. Biol. 2015, 11, 535.