As artificial intelligence video synthesis and deepfake technology dominate headlines, new peer-reviewed research from 2024-2026 reveals a stark disconnect between marketing promises and technical reality. From text-to-video generation to deepfake detection systems, the latest studies expose fundamental limitations that practitioners and policymakers must understand.
With AI video generation tools like Sora, Runway, and Stable Video attracting billions in investment, understanding the actual capabilities and critical vulnerabilities has never been more important for businesses, educators, and digital security professionals.
Generative AI models show promise but face temporal consistency challenges
Recent advances in AI video synthesis have centred on diffusion models, which demonstrate superior performance over GANs for maintaining long-range temporal consistency. Research published in the International Journal of Interactive Multimedia and Artificial Intelligence shows these models can create realistic short-form content, with text-to-video capabilities improving significantly.
However, the technology faces persistent duration constraints. Most current systems remain limited to 5-10 second clips, with quality degradation over extended periods. Studies indicate that approximately 20% of generated videos require manual correction for physics violations and motion artifacts.
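To make the temporal consistency problem concrete, the sketch below scores consecutive frames of a generated clip and flags abrupt changes for manual review. It is a minimal illustration, assuming frames arrive as NumPy arrays; the pixel-level cosine similarity and the 0.95 threshold are crude stand-ins for the learned perceptual metrics used in the literature, not measures taken from the cited research.

```python
# Minimal sketch of a frame-to-frame consistency check. Assumes frames are a
# list of HxWx3 uint8 NumPy arrays (e.g. decoded with imageio or OpenCV).
# The similarity measure and threshold are illustrative, not from any paper.
import numpy as np

def temporal_consistency_scores(frames):
    """Return per-transition similarity scores in [0, 1]; lower values
    suggest flicker or abrupt content changes between consecutive frames."""
    scores = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        a = prev.astype(np.float32).ravel()
        b = curr.astype(np.float32).ravel()
        # Cosine similarity of raw pixels: a crude proxy for the learned
        # perceptual metrics (e.g. CLIP-based scores) used in research.
        sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        scores.append(sim)
    return scores

def flag_inconsistent_clip(frames, threshold=0.95):
    """Flag a clip for manual review if any transition drops below threshold."""
    scores = temporal_consistency_scores(frames)
    return any(s < threshold for s in scores), scores
```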
The integration of Neural Radiance Fields (NeRFs) is a promising development, offering improved 3D consistency for video synthesis applications. Yet computational demands remain prohibitive, with high-quality generation requiring substantial infrastructure investment that limits widespread adoption.
Detection systems suffer catastrophic real-world performance drops
Perhaps the most alarming findings concern deepfake detection reliability. The Deepfake-Eval-2024 benchmark reveals that detection accuracy plummets by approximately 50% when moving from laboratory datasets to real social media content.
Human detection capabilities prove even worse. Research from iProov demonstrates that only 0.1% of people can correctly identify all deepfakes when specifically looking for them, with video deepfakes proving 36% harder to detect than manipulated images.
Current detection models face a fundamental generalisation crisis. Systems trained on specific deepfake generators fail catastrophically against new manipulation techniques, creating an ongoing arms race where generation capabilities consistently outpace detection methods.
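One way to surface this gap before deployment is to score a detector on curated laboratory data and on in-the-wild data side by side. The sketch below is a minimal robustness check under assumptions not taken from the benchmark paper: `detector` is any callable returning a fake-probability for a video path, samples are (path, label) pairs, and 0.5 is an arbitrary decision threshold.

```python
# Minimal sketch of a cross-dataset robustness check. Dataset structure,
# the detector interface, and the 0.5 threshold are placeholder assumptions.
import numpy as np

def accuracy(detector, samples, threshold=0.5):
    """samples: list of (video_path, is_fake) pairs; returns plain accuracy."""
    preds = [detector(path) >= threshold for path, _ in samples]
    labels = [bool(is_fake) for _, is_fake in samples]
    return float(np.mean([p == y for p, y in zip(preds, labels)]))

def generalisation_gap(detector, lab_set, wild_set):
    """Report how far accuracy falls from curated lab data to in-the-wild
    data: the kind of drop Deepfake-Eval-2024 documents for current tools."""
    lab_acc = accuracy(detector, lab_set)
    wild_acc = accuracy(detector, wild_set)
    return {"lab": lab_acc, "wild": wild_acc, "drop": lab_acc - wild_acc}
```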
Academic evaluation metrics mislead practitioners about real capabilities
A comprehensive survey of AI-generated video evaluation reveals significant problems with current benchmarking approaches. Academic datasets fail to reflect real-world conditions, creating false confidence in system performance metrics.
The research shows that models achieving over 90% accuracy on laboratory benchmarks often perform poorly on diverse, real-world content. This benchmark inflation problem means practitioners cannot rely on published performance figures when making implementation decisions.
Multiple evaluation metrics are required for comprehensive assessment, yet no unified framework exists for measuring practical utility. This fragmentation makes it nearly impossible for organisations to compare systems or predict real-world performance.
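In the absence of a unified framework, a pragmatic interim step is to report several metrics side by side rather than collapsing them into a single headline number. The sketch below shows that pattern under hypothetical assumptions: each metric is a callable from a generated clip to a float, and the axes an organisation plugs in (visual quality, temporal consistency, prompt fidelity, and so on) are its own choices, not a standard.

```python
# Minimal sketch of multi-metric reporting. Metric names, signatures, and any
# weighting are hypothetical; no standard framework exists as of this writing.
from typing import Callable, Dict, List

def evaluate_clip(clip, metrics: Dict[str, Callable]) -> Dict[str, float]:
    """Run every metric on one clip and return the per-metric scores."""
    return {name: float(fn(clip)) for name, fn in metrics.items()}

def summarise(per_clip_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric across clips and report them side by side rather
    than collapsing them into one number, since the axes are not comparable."""
    names = per_clip_scores[0].keys()
    return {
        name: sum(s[name] for s in per_clip_scores) / len(per_clip_scores)
        for name in names
    }
```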
Commercial deployment outpaces technical readiness
Industry analysis reveals a troubling shift from capability development to premature monetisation. Companies promote AI video generation tools as “production-ready” despite fundamental technical limitations remaining unresolved.
The Coca-Cola holiday advertisement controversy exemplifies these issues—the AI-generated content was widely criticised as “soulless,” highlighting persistent problems with authentic human representation. Professional applications require significant human intervention, contradicting marketing claims about automation capabilities.
Resource inequality creates additional barriers. High-quality video generation demands computational resources unavailable to most practitioners, limiting accessibility despite commercial availability.
Regulatory frameworks lag behind technological capabilities
Legal and ethical research highlights significant gaps in current governance approaches. Regulatory frameworks remain far behind technological capabilities, creating vulnerabilities that bad actors can exploit.
Multi-dimensional safety assessments reveal that no single model excels across all risk categories, including violence prevention, misinformation reduction, and discrimination avoidance. Current safety measures prove inadequate for deployment at scale.
The research emphasises urgent needs for robust detection policies that don't wait for perfect technology, digital literacy education about the existence of deepfakes, and realistic guidelines based on actual capabilities rather than marketing claims.
Key takeaways for practitioners and organisations
For content creators and marketers: Budget for substantial post-production time when using AI video tools. Current technology works best for rough drafts, prototypes, and non-critical applications rather than finished professional content.
For security professionals: Implement multi-layered detection approaches combining multiple tools, as no single system provides reliable protection. Focus on user education and human oversight rather than purely technical solutions; a minimal triage sketch follows this list.
For researchers and students: Prioritise robustness over benchmark performance. Real-world effectiveness matters more than laboratory metrics, and the field desperately needs better generalisation research.
For policymakers: Develop detection policies immediately using available tools with human oversight, rather than waiting for perfect technology. Invest in digital literacy education and create realistic guidelines based on actual capabilities.
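As referenced in the security takeaway above, a layered workflow can combine several detectors and keep humans in the loop for uncertain cases. The sketch below is illustrative only: each detector is assumed to be a callable returning a fake-probability, and the averaging rule and review thresholds are placeholder policy choices, not recommendations from the cited studies.

```python
# Minimal sketch of a layered detection workflow. The fusion rule (simple
# averaging) and the review band are illustrative assumptions, not guidance
# from the research discussed in this article.
from statistics import mean
from typing import Callable, List

def triage(video_path: str, detectors: List[Callable[[str], float]],
           block_above: float = 0.8, review_above: float = 0.4) -> str:
    """Combine several detector scores and route the result: 'block' for
    high-confidence fakes, 'human_review' for the uncertain middle band,
    and 'allow' otherwise, keeping human oversight in the loop."""
    scores = [d(video_path) for d in detectors]
    fused = mean(scores)
    if fused >= block_above:
        return "block"
    if fused >= review_above:
        return "human_review"
    return "allow"
```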
The path forward requires realistic expectations
The research reveals fundamental disconnects between academic progress claims and practical utility. While papers report incremental improvements, core challenges of temporal consistency, authentic human representation, and reliable detection remain largely unsolved.
Success requires acknowledging current limitations whilst investing in fundamental research rather than incremental optimisations. The field would benefit from focusing on robustness, safety, and real-world performance rather than pursuing benchmark improvements that don’t translate to practical applications.
As AI video generation technology continues evolving rapidly, understanding these research-backed realities becomes crucial for making informed decisions about implementation, security, and policy development.
References
Bougueffa, H., Keita, M., Hamidouche, W., Taleb-Ahmed, A., Liz-López, H., Martín, A., Camacho, D., & Hadid, A. (2024). Advances in AI-Generated Images and Videos. International Journal of Interactive Multimedia and Artificial Intelligence, 9(1), 173–208. https://doi.org/10.9781/ijimai.2024.11.003
Chandra, N. A., Murtfeldt, R., Qiu, L., Karmakar, A., Lee, H., Tanumihardja, E., Farhat, K., Caffee, B., Paik, S., Lee, C., Choi, J., Kim, A., & Etzioni, O. (2025). Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024. arXiv preprint arXiv:2503.02857. https://doi.org/10.48550/arXiv.2503.02857