There's more...

The performance estimate depends on the data used. Therefore, a single random division of the data into a training set and a testing set does not guarantee that the results are statistically significant. Repeating the evaluation on several different random divisions, and reporting the average and standard deviation of the individual scores, gives a more reliable estimate.
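The repeated evaluation described above can be sketched as follows. This is only a minimal illustration, not a reference implementation: the toy dataset, the nearest-mean classifier, and the names `make_data` and `holdout_accuracy` are all assumptions made for the example, and the choice of 20 repetitions with a 30% test fraction is arbitrary.

```python
import random
import statistics

def make_data(n, rng):
    """Hypothetical toy data: one feature, two overlapping classes."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0)  # class 0 centred at 0, class 1 at 2
        data.append((x, label))
    return data

def holdout_accuracy(data, test_fraction, rng):
    """One random train/test division, scored with a nearest-mean classifier."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Class means are estimated from the training part only.
    means = {}
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        means[c] = sum(xs) / len(xs)

    # Predict the class whose mean is closest to the sample.
    correct = sum(
        1 for x, y in test
        if min(means, key=lambda c: abs(x - means[c])) == y
    )
    return correct / len(test)

rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_data(200, rng)

# Repeat the evaluation on 20 different random divisions.
scores = [holdout_accuracy(data, 0.3, rng) for _ in range(20)]
mean_acc = statistics.mean(scores)
std_acc = statistics.stdev(scores)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f} over {len(scores)} splits")
```

The individual values in `scores` differ from one division to the next, which is why the average is reported together with the standard deviation rather than a single number.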

However, even repeating the evaluation on different random divisions does not guarantee that the most complex data is ever classified in the testing phase (or ever used in the training phase), because random divisions can leave the same samples out every time.
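A quick way to see this effect is to count how often each sample lands in the test set over several random divisions. The sketch below uses hypothetical sizes (100 samples, a 10% test set, 10 repetitions) chosen only for illustration. Under these assumptions, each sample is left out of a given test set with probability 0.9, so the chance that it is never tested in 10 independent divisions is 0.9^10, roughly 0.35.

```python
import random

n_samples = 100   # hypothetical dataset size
n_test = 10       # 10% of the samples held out per division
n_repeats = 10    # number of random divisions

rng = random.Random(0)  # fixed seed so the run is repeatable
test_counts = [0] * n_samples

for _ in range(n_repeats):
    # Draw the indices of the test set for this division.
    for i in rng.sample(range(n_samples), n_test):
        test_counts[i] += 1

never_tested = sum(1 for c in test_counts if c == 0)
print(f"{never_tested} of {n_samples} samples never appeared in a test set")
```

If the samples that are never tested happen to be the difficult ones, the averaged score looks better than the classifier really is.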