AlphaFold3 at CASP16
The CASP16 experiment provided the first opportunity to benchmark AlphaFold3. In contrast to AlphaFold2, AlphaFold3 can also predict the structure of non-protein molecules, and according to the benchmark presented by its developers, it is expected to perform slightly better than AlphaFold2 for proteins. In this study, we assess the performance of AlphaFold3 using both automatic server submissions (AF3-server) and manual predictions from the Elofsson group (Elofsson). All predictions were generated via the AlphaFold3 web server, with manual interventions applied to large targets and ligands. We found that AlphaFold3 performs slightly better than AlphaFold2-based methods for protein complexes; however, the difference disappears when massive sampling is applied to AlphaFold2. According to the official CASP ranking, AF3-server performs better than AlphaFold2 for easier targets, but not for harder targets. Furthermore, the performance of AF3-server is comparable to that of the best methods when considering the top-ranked predictions, but slightly behind when examining the best of the five submitted models. For some targets, the top-ranked AF3-server model is worse than lower-ranked models, indicating that one avenue for progress is to develop better strategies for identifying the best of the generated models. When AF3-server is used to predict the stoichiometry of larger protein complexes, the accuracy is limited, especially for heteromeric targets. For predictions that include nucleic acids, the accuracy is in general relatively low, although the performance of AF3-server was not far behind that of the top-ranked method. In summary, AF3-server is a user-friendly tool that provides predictions comparable to state-of-the-art methods in all CASP categories.

