|
Detecting human-object interaction (HOI) is fundamental to deeper visual understanding. Recent work has focused on using human pose as input and on designing graph neural networks, and has improved performance. However, these methods cannot adapt to difficult pose inputs, which limits their further development. To tackle this problem, this paper proposes a Hard Pose-aware Graph Embedding (HPGE) pipeline, which first encodes pose features with a Pose Aggregation Network (PAN) and then models difficult pose inputs with the proposed Feature Gate (FG) and Pose Component Augmentation (PCAug). FG is a switch gate that controls the connection between the human feature and the pose feature, closing the connection to the pose feature when the pose input cannot be relied upon; PCAug further exploits the potential of pose component features based on the variance of pose component boxes under a Gaussian distribution. The paper evaluates the proposed method on two recent datasets, V-COCO and HICO-DET. The experimental results show that FG and PCAug improve performance over the vanilla baseline, and that with them HPGE reaches the level of mainstream human-object interaction detection methods. The paper also presents an ablation study, parameter analysis, and model visualization of the HPGE pipeline. |
|
Keywords: artificial intelligence, human-object interaction, graph network, pose estimation |
|