The energy sector is undergoing a profound transition. Among all renewable energy sources, wind energy is the most developed technology worldwide. To ensure the profitability of wind turbines, it is essential to develop predictive maintenance strategies that optimize energy production while preventing unexpected downtime. Given the large amount of data collected every day, machine learning is seen as a key enabling approach for predictive maintenance of wind turbines. However, most effort goes into optimizing model architectures and their parameters, whereas data-related aspects are often neglected. The goal of this paper is to contribute to a better understanding of wind turbines through a data-centric machine learning methodology. In particular, we focus on optimizing the data preprocessing and feature selection steps of the machine learning pipeline. The proposed methodology is used to detect failures affecting five components on a wind farm composed of five turbines. Despite the simplicity of the machine learning model used (a decision tree), the methodology outperformed a model-centric approach by improving the prediction of the remaining useful life of the wind farm, thereby making the wind farm more reliable and contributing to global efforts to tackle climate change.