ACASVA Actions Dataset (public)

2012-09-24 19:37 by teo | Version 18

HOG3D vectors extracted from the bounding boxes of players in videos of tennis and badminton. Time stamps are available for each vector. Created by T. deCampos, University of Surrey.

action-recognition computer-vision HOG3D Label-Sequence-Learning Structured-Output-Prediction temporal-series time-series transfer-learning
Attribute Types
# Instances: 11003 / # Attributes: 960
tgz (61.1 MB)

Original Data Format
Version mldata
Data (first 10 data points)
    (Zipped) TAR archive ACASVA_actions; named entries: ACASVA_actions/TWDU06_960.tgz, ACASVA_actions/TWDU06_300.tgz, ACASVA_actions/ACASVA_actions.html

Following [deCampos et al, WACV2011], we used HOG3D descriptors extracted on player bounding boxes.

Two different sets of feature extraction parameters were used: the 960D parameters (4x4x3x20) optimised for the KTH dataset and the 300D parameters (2x2x5x5x3) optimised for the Hollywood dataset (see Alexander Klaser's page for details). In our preliminary experiments, we found that the KTH parameters (960D) give better results for the tennis dataset.

  • labels.txt: contains the action labels: Non-Hit (0), Hit (1) and Serve (2);
  • frames.txt: for each sample, gives the time stamp in the original video at which the features were extracted; note that multiple players are visible in each frame, so consecutive lines can have the same frame number;
  • teams.txt: identifies each player as Far player (0) or Near player (1), decided by the position of the player's feet relative to the court's mid-line;
  • features.txt: contains the HOG3D feature vectors, which are either 300- or 960-dimensional; each line is the feature vector of one action sample, and the first element of each line gives the dimensionality.
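
The four files can be read side by side, one line per sample. The following is a minimal sketch in Python, not the authors' loader. It assumes that values are whitespace-separated, that all four files of a match sit in one directory, and that they are aligned line by line; the function names and the directory argument are hypothetical.

```python
from pathlib import Path


def parse_feature_line(line):
    """Parse one line of features.txt: the dimensionality, then the values.

    Assumption: values are separated by whitespace.
    """
    tokens = line.split()
    dim = int(float(tokens[0]))
    values = [float(tok) for tok in tokens[1:]]
    if len(values) != dim:
        raise ValueError(f"line declares {dim} values but holds {len(values)}")
    return values


def read_int_lines(path):
    """Read a file with one integer per line, skipping blank lines."""
    with open(path) as handle:
        return [int(float(row)) for row in handle if row.strip()]


def load_match(directory):
    """Load labels, frames, teams and features for one match.

    Assumption: the four files are aligned line by line.
    """
    directory = Path(directory)
    data = {
        "labels": read_int_lines(directory / "labels.txt"),
        "frames": read_int_lines(directory / "frames.txt"),
        "teams": read_int_lines(directory / "teams.txt"),
    }
    with open(directory / "features.txt") as handle:
        data["features"] = [parse_feature_line(row) for row in handle if row.strip()]
    lengths = {name: len(column) for name, column in data.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"files disagree on the number of samples: {lengths}")
    return data
```

Checking the declared dimensionality against the number of values on each line catches a wrong delimiter assumption early, before any classifier is trained.
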
  • T. deCampos, M. Barnard, K. Mikolajczyk, J. Kittler, F. Yan, W. Christmas and D. Windridge. "An evaluation of bags-of-words and spatio-temporal shapes for action recognition". In IEEE Workshop on Applications of Computer Vision (WACV), 2011.

    Bags-of-visual-Words (BoW) and Spatio-Temporal Shapes (STS) are two very popular approaches for action recognition from video. The former (BoW) is an un-structured global representation of videos which is built using a large set of local features. The latter (STS) uses a single feature located on a region of interest (where the actor is) in the video. Despite the popularity of these methods, no comparison between them has been done. Also, given that BoW and STS differ intrinsically in terms of context inclusion and globality/locality of operation, an appropriate evaluation framework has to be designed carefully. This paper compares these two approaches using four different datasets with varied degree of space-time specificity of the actions and varied relevance of the contextual background. We use the same local feature extraction method and the same classifier for both approaches. Further to BoW and STS, we also evaluated novel variations of BoW constrained in time or space. We observe that the STS approach leads to better results in all datasets whose background is of little relevance to action classification.

  • N. FarajiDavar, T. deCampos, D. Windridge, J. Kittler and W. Christmas. "Domain Adaptation in the Context of Sport Video Action Recognition". In Domain Adaptation Workshop, in conjunction with NIPS, Sierra Nevada, Spain 2011.

    We apply domain adaptation to the problem of recognizing common actions between differing court-game sport videos (in particular tennis and badminton games). Actions are characterized in terms of HOG3D features extracted at the bounding box of each detected player, and thus have large intrinsic dimensionality. The techniques evaluated here for domain adaptation are based on estimating linear transformations to adapt the source domain features in order to maximize the similarity between posterior PDFs for each class in the source domain and the expected posterior PDF for each class in the target domain. As such, the problem scales linearly with feature dimensionality, making the video-environment domain adaptation problem tractable on reasonable time scales and resilient to over-fitting. We thus demonstrate that significant performance improvement can be achieved by applying domain adaptation in this context.

  • N. FarajiDavar, T. deCampos, J. Kittler and F. Yan. "Transductive Transfer Learning for Action Recognition in Tennis Games". In 3rd International Workshop on Video Event Categorization, Tagging and Retrieval for Real-World Applications (VECTaR), in conjunction with 13th International Conference on Computer Vision (ICCV), Barcelona, Spain 2011.

    This paper investigates the application of transductive transfer learning methods for action classification. The application scenario is that of off-line video annotation for retrieval. We show that if a classification system can analyze the unlabeled test data in order to adapt its models, a significant performance improvement can be achieved. We applied it for action classification in tennis games for train and test videos of different nature. Actions are described using HOG3D features and for transfer we used a method based on feature re-weighting and a novel method based on feature translation and scaling.

Data Source
HOG3D feature extraction method [Klaser et al, BMVC2008] applied to the space-time bounding box of players in videos of tennis and badminton.
Measurement Details

Performance is evaluated by using data from one sports video for training and another for testing; that is, each file is used whole for training, validation or testing. We discourage the use of N-fold cross-validation. We encourage users to report results in terms of average accuracy, but it may also be relevant to report True Positive, True Negative and False Positive rates for each of the classes. Area under the ROC curve has also been used.
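
The per-class rates mentioned above can be computed one class against the rest. The sketch below is illustrative only: the function names are hypothetical, and "average accuracy" is interpreted here as the mean of the per-class True Positive rates, which is an assumption rather than a definition given in this description.

```python
import math


def per_class_rates(y_true, y_pred, classes=(0, 1, 2)):
    """One-vs-rest True Positive, True Negative and False Positive rates."""
    rates = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        pos, neg = tp + fn, fp + tn
        rates[c] = {
            "tpr": tp / pos if pos else float("nan"),
            "tnr": tn / neg if neg else float("nan"),
            "fpr": fp / neg if neg else float("nan"),
        }
    return rates


def average_accuracy(y_true, y_pred, classes=(0, 1, 2)):
    """Mean per-class True Positive rate (assumed meaning of average accuracy).

    Classes absent from y_true are left out of the mean.
    """
    rates = per_class_rates(y_true, y_pred, classes)
    recalls = [rates[c]["tpr"] for c in classes if not math.isnan(rates[c]["tpr"])]
    return sum(recalls) / len(recalls)
```

Averaging over classes rather than over samples keeps a dominant class from hiding poor results on a rarer one.
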

Usage Scenario

Transductive transfer learning.

Each file contains HOG3D data extracted from one match of tennis or badminton. Because the conditions differ between matches (illumination, athletes, clothes, etc.), this dataset poses an interesting application scenario for transductive transfer learning methods. For example, a game of badminton can be used for training, a first game of tennis can be used for transfer (or domain adaptation) and a second game of tennis can be used as the test set.
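
The example split above can be enumerated programmatically so that every match plays exactly one role. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the caller supplies a dictionary mapping each match name to its sport, since the match names are not listed in this description.

```python
from itertools import permutations


def transfer_protocols(matches, source_sport, target_sport):
    """Yield train/transfer/test assignments that use whole matches.

    matches: dict mapping a match name to its sport (names are hypothetical).
    The training match comes from the source sport; the transfer and test
    matches are two different matches of the target sport.
    """
    sources = sorted(name for name, sport in matches.items() if sport == source_sport)
    targets = sorted(name for name, sport in matches.items() if sport == target_sport)
    for train in sources:
        for transfer, test in permutations(targets, 2):
            yield {"train": train, "transfer": transfer, "test": test}
```

For instance, with one badminton match and two tennis matches this yields two assignments, one for each choice of which tennis match is held out for testing.
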

An alternative usage scenario is to evaluate methods that are able to learn the structure of a temporal pattern. This link shows a typical sequence of actions in a tennis game.
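
For the temporal scenario, one way to expose the structure is to order each player's samples by frame and collapse repeated labels into runs. The sketch below assumes that labels.txt, frames.txt and teams.txt are aligned line by line and have already been read into lists; the function name is hypothetical.

```python
from itertools import groupby


def action_runs(labels, frames, teams, team):
    """Return (label, run_length) pairs for one player, ordered by frame.

    team: 0 for the Far player, 1 for the Near player.
    """
    samples = [(f, l) for l, f, t in zip(labels, frames, teams) if t == team]
    samples.sort(key=lambda sample: sample[0])  # stable sort by frame number
    return [
        (label, sum(1 for _ in run))
        for label, run in groupby(samples, key=lambda sample: sample[1])
    ]
```

The resulting run-length sequence can serve as input to a label-sequence model, or simply as a check that the data follow the expected order of play.
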

Please refer to our publications for examples of experiments.

revision 1 by teo on 2012-03-07 12:07
revision 2 by teo on 2012-03-16 15:33
revision 3 by teo on 2012-03-16 15:35
revision 4 by teo on 2012-03-16 15:38
revision 5 by teo on 2012-03-16 15:42
revision 6 by teo on 2012-09-24 17:04
revision 7 by teo on 2012-09-24 17:18
revision 8 by teo on 2012-09-24 17:19
revision 9 by teo on 2012-09-24 17:20
revision 10 by teo on 2012-09-24 17:22
revision 11 by teo on 2012-09-24 17:23
revision 12 by teo on 2012-09-24 17:24
revision 13 by teo on 2012-09-24 17:32
revision 14 by teo on 2012-09-24 17:34
revision 15 by teo on 2012-09-24 17:36
revision 16 by teo on 2012-09-24 17:37
revision 17 by teo on 2012-09-24 17:50
revision 18 by teo on 2012-09-24 19:37


This item was downloaded 2636 times and viewed 20922 times.


We are acting in good faith to make datasets submitted for the use of the scientific community available to everybody, but if you are a copyright holder and would like us to remove a dataset please inform us and we will do it as soon as possible.



This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)