Artificial Intelligence vs. Physical models for PV plants

Today there are two main approaches to modeling the performance of a photovoltaic (PV) plant:

  • domain-driven (mathematical-physical models);
  • data-driven (standard statistical or artificial intelligence models).


Physical models

The first kind of model, domain-driven, relies on the expertise of the model creator to describe mathematically the physical components of the plant, from the cell to the inverter. Manufacturer information can be used to set the model parameters. The more specific the domain knowledge, the more effective the plant model will be.
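To make the idea concrete, here is a minimal sketch of a physical model for a single module and its inverter. It assumes the simple NOCT cell-temperature formula and a linear irradiance scaling with a temperature coefficient; the function names and the default values (a 300 W module, NOCT of 45 °C, a coefficient of -0.4 %/°C, a 96 % inverter efficiency) are illustrative assumptions of the kind you would read from a datasheet, not values taken from a real plant.

```python
def cell_temperature(ambient_c, irradiance_w_m2, noct_c=45.0):
    """Estimate cell temperature (deg C) with the simple NOCT model.

    noct_c is an assumed datasheet value (nominal operating cell temperature).
    """
    return ambient_c + (noct_c - 20.0) / 800.0 * irradiance_w_m2


def dc_power(irradiance_w_m2, cell_temp_c, p_stc_w=300.0, gamma_per_c=-0.004):
    """DC power (W) of one module.

    Scales the assumed rated power at standard test conditions (1000 W/m^2,
    25 deg C) linearly with irradiance and corrects it for cell temperature.
    """
    return p_stc_w * (irradiance_w_m2 / 1000.0) * (1.0 + gamma_per_c * (cell_temp_c - 25.0))


def ac_power(dc_w, inverter_efficiency=0.96):
    """AC power (W), assuming a constant inverter efficiency."""
    return dc_w * inverter_efficiency


# Example: 20 deg C ambient temperature and 800 W/m^2 on the module plane.
t_cell = cell_temperature(20.0, 800.0)
p_dc = dc_power(800.0, t_cell)
p_ac = ac_power(p_dc)
print(t_cell, p_dc, p_ac)
```

With these assumed parameters the module runs at 45 °C and delivers roughly 221 W DC. A real plant model would add many more components (wiring losses, shading, inverter curves), which is where the domain expertise comes in.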

The PROs of the physical models approach are the human interpretability of the results, the ability to improve the models as expertise increases, and the possibility of creating a model without archive data from the plant, for example when the PV plant is not yet built or is at an early stage of its operating life.

The CONs of the physical models approach show up especially for large PV plants: it is not always possible to describe all components accurately, so the raw model outputs often need post-processing to fine-tune the results. Examples of this kind of approach are SAM, PVsyst and PVLib.


Artificial Intelligence models

Data-driven approaches are typically based on standard statistical procedures such as polynomial regression, or more recent machine learning algorithms such as neural networks or decision trees.
These models simulate the behavior of the plant based on an archive of data used to estimate the model parameters (in machine learning, this step is called training the model on the training dataset).
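As an illustration of the simplest data-driven case, the sketch below fits a first-degree polynomial (a straight line) of plant power against irradiance by ordinary least squares. The five data points are a made-up archive used only for the example, and `fit_line` and `predict` are hypothetical helper names; a real model would use far more data and more input variables.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up archive: irradiance (W/m^2) and measured plant power (kW).
irradiance = [200.0, 400.0, 600.0, 800.0, 1000.0]
power_kw = [19.0, 41.0, 59.0, 81.0, 100.0]

# "Training": estimate the model parameters from the archive.
slope, intercept = fit_line(irradiance, power_kw)


def predict(irradiance_w_m2):
    """Predict plant power (kW) from irradiance with the fitted line."""
    return slope * irradiance_w_m2 + intercept


print(slope, intercept, predict(500.0))
```

Note that the model knows nothing about cells or inverters: its two parameters come entirely from the data, which is why the quality of the archive matters so much.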

The PROs of the Artificial Intelligence models are that you can obtain good results using standard techniques, even when your staff have no expertise in the specific physical domain, and that they can learn complex correlations across large amounts of data with little human effort.

The CONs of the Artificial Intelligence models are that reliable outputs require a suitable data archive to compute the models’ parameters correctly. The archive must be large enough to be representative of all the working conditions (e.g. one year of data is the minimum needed to cover the full yearly cycle of sun conditions) and must not contain anomalous data. For example, if periods of plant malfunctioning are in the training dataset of a machine learning algorithm, the system will “learn” these conditions as nominal, and the outputs will not be reliable.
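The two requirements on the archive can be checked before training. The sketch below is one possible way to do it, assuming each sample is a dict with the hypothetical keys "time", "power_kw" and "fault" (a flag set when the plant was known to be malfunctioning). It drops the flagged samples and refuses an archive that spans less than one year.

```python
from datetime import datetime, timedelta


def prepare_training_set(records, min_span=timedelta(days=365)):
    """Drop samples flagged as anomalous and check the archive time span.

    records: list of dicts with the assumed keys "time" (datetime),
    "power_kw" (float) and "fault" (bool).
    Raises ValueError if the nominal samples span less than min_span.
    """
    # Keep only nominal samples, so faults are not learned as normal behavior.
    clean = [r for r in records if not r["fault"]]
    if not clean:
        raise ValueError("no nominal samples left after filtering")

    # Crude coverage check: time between the first and last nominal sample.
    times = [r["time"] for r in clean]
    span = max(times) - min(times)
    if span < min_span:
        raise ValueError(
            f"archive spans {span.days} days, need at least {min_span.days}"
        )
    return clean
```

The span check is only a rough proxy: it does not detect long gaps inside the archive, and it relies on the fault flags being trustworthy in the first place.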


At i-EM

We make the best use of both strategies, depending on the goal of the service.

For example, for the monitoring, diagnostics and performance analysis of PV plants down to the subcomponent level, we prefer a domain-driven approach, because it lets us compare a physical model with the current behavior of the plant and makes discrepancies easy to understand.
For Forecast or Nowcast activities, or when a detailed description of the plant is not available, we exploit our Machine Learning capabilities.
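The monitoring use case can be pictured with a generic sketch (not a description of i-EM's actual tools): compare the measured power with the power expected from a physical model and flag the samples where the relative deviation exceeds a tolerance. The function name and the 10 % threshold are assumptions chosen for the example.

```python
def flag_discrepancies(measured_kw, expected_kw, tolerance=0.10):
    """Return the indices where measured power deviates from the expected
    power by more than `tolerance` (relative to the expected value)."""
    flagged = []
    for i, (measured, expected) in enumerate(zip(measured_kw, expected_kw)):
        if expected <= 0:
            continue  # skip samples where no production is expected (e.g. night)
        if abs(measured - expected) / expected > tolerance:
            flagged.append(i)
    return flagged


# Made-up samples: the second one is 20 % below the model expectation.
measured = [95.0, 80.0, 0.0, 50.0]
expected = [100.0, 100.0, 0.0, 52.0]
print(flag_discrepancies(measured, expected))
```

Because the expected values come from a physical description of the plant, a flagged sample can be traced back to a specific component rather than to an opaque model output.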




Fabrizio Ruffini, PhD

Senior Data Scientist at i-EM