Macromolecular self-assembly is at the basis of many phenomena in material and life sciences that find diverse applications in technology. One example is the formation of virus-like particles (VLPs) that act as stable empty capsids used for drug delivery or vaccine fabrication. Similarly to the capsid of a virus, VLPs are protein assemblies, but their structural formation, stability, and properties are not fully understood, especially as a function of the protein modifications. In this work, we present a data-driven modeling approach for capturing macromolecular self-assembly on scales beyond traditional molecular dynamics (MD), while preserving the chemical specificity. Each macromolecule is abstracted as an anisotropic object and high-dimensional models are formulated to describe interactions between molecules and with the solvent. For this, data-driven protein–protein interaction potentials are derived using a Kriging-based strategy, built on high-throughput MD simulations. Semi-automatic supervised learning is employed in a high performance computing environment and the resulting specialized force-fields enable a significant speed-up to the micrometer and millisecond scale, while maintaining high intermolecular detail. The reported generic framework is applied for the first time to capture the formation of hepatitis B VLPs from the smallest building unit, i.e., the dimer of the core protein HBcAg. Assembly pathways and kinetics are analyzed and compared to the available experimental observations. We demonstrate that VLP self-assembly phenomena and dependencies are now possible to be simulated. The method developed can be used for the parameterization of other macromolecules, enabling a molecular understanding of processes impossible to be attained with other theoretical models.