Long-term Video Object Segmentation
LVOS is the first densely annotated long-term video object segmentation dataset. LVOS aims to provide a benchmark for the development and assessment of long-term VOS models.
Each sequence lasts 1.14 minutes on average.
Pixel-wise annotations are available at 6 fps.
All videos are 720P.
44 categories with 12 unseen categories to simulate real application scenarios closely.
Each sequence is provided with an additional lingual label.
720 videos. 407,945 annotations.
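The figures listed above can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes every sequence has the average length and is annotated at 6 fps throughout, and the variable and function names are illustrative, not part of any LVOS toolkit. The source does not state how annotations relate to frames, so the ratio computed here is only a rough indicator (for example, a frame may carry masks for more than one object).

```python
# Figures quoted in the dataset description above.
NUM_VIDEOS = 720
NUM_ANNOTATIONS = 407_945
AVG_MINUTES = 1.14       # average sequence length
ANNOTATION_FPS = 6       # rate at which pixel-wise annotations are provided


def annotated_frames_per_sequence(avg_minutes: float, fps: float) -> float:
    """Estimate annotated frames in an average-length sequence.

    Assumption: the whole sequence is annotated at a constant rate.
    """
    return avg_minutes * 60 * fps


frames_per_seq = annotated_frames_per_sequence(AVG_MINUTES, ANNOTATION_FPS)
annotations_per_seq = NUM_ANNOTATIONS / NUM_VIDEOS
total_hours = NUM_VIDEOS * AVG_MINUTES / 60

print(f"Estimated annotated frames per sequence: {frames_per_seq:.1f}")
print(f"Reported annotations per sequence:       {annotations_per_seq:.1f}")
print(f"Estimated total footage (hours):         {total_hours:.2f}")
```

Because these numbers are derived from averages, they are useful for sizing storage or estimating evaluation time, not as exact per-video counts.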
Please consider citing LVOS if you use LVOS in your research.
# for LVOS V2
@article{hong2024lvos,
author = {Hong, Lingyi and Liu, Zhongying and Chen, Wenchao and Tan, Chenzhi and Feng, Yuang and Zhou, Xinyu and Guo, Pinxue and Li, Jinglun and Chen, Zhaoyu and Gao, Shuyong and others},
title = {LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation},
journal = {arXiv preprint arXiv:2404.19326},
year = {2024},
}
# for LVOS V1
@InProceedings{Hong_2023_ICCV,
author = {Hong, Lingyi and Chen, Wenchao and Liu, Zhongying and Zhang, Wei and Guo, Pinxue and Chen, Zhaoyu and Zhang, Wenqiang},
title = {LVOS: A Benchmark for Long-term Video Object Segmentation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {13480-13492}
}
The annotations of LVOS are licensed under a Creative Commons Attribution 4.0 License.
The evaluation toolkits of LVOS are licensed under a BSD-3-Clause license.
The data of LVOS is released for non-commercial research purposes only.
All videos and images are from VOT-LT 2019, LaSOT, and some other datasets, which are not the property of Fudan. Fudan is not responsible for the content or the meaning of these videos and images.
Questions, suggestions, and feedback are welcome. Please contact honglyhly@gmail.com.