Junfan Chen (陈俊帆)
Paper
Self-Paced Pairwise Representation Learning for Semi-Supervised Text Classification
Posted: 2025-10-22
Published in:
Proceedings of the ACM Web Conference 2024 (WWW '24), CCF-A
Abstract:
Text classification is a vital tool for web content mining. Semi-supervised text classification (SSTC) reduces annotation costs by training on a few labeled texts alongside many unlabeled texts. Two challenges in SSTC remain unsolved: overfitting caused by the limited labeled data, and mislabeling of unlabeled texts. To address these issues, this paper proposes a Self-Paced Pair-Wise representation learning (SPPW) model. Concretely, SPPW alleviates overfitting by replacing the overfitting-prone learning of a parameterized classifier with pair-wise representation learning. In addition, we propose a novel self-paced text filtering method that effectively integrates both label confidence and text hardness to reduce mislabeled texts synergistically. Extensive experiments on three benchmark SSTC datasets show that SPPW outperforms baselines and is effective in mitigating the overfitting and mislabeling problems.
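The abstract does not give SPPW's formulas, so the following is only a hypothetical sketch of the two ideas it names, not the authors' implementation. It assumes: (1) pair-wise supervision where a pair of labeled texts gets target 1 if they share a label and 0 otherwise; (2) a classifier-free prediction that scores each class by mean cosine similarity to that class's labeled embeddings; (3) label confidence as the top softmax probability over those scores (temperature 0.1); (4) text hardness as 1 minus the similarity to the nearest labeled text of the predicted class; and (5) a self-paced threshold on hardness that grows linearly with the epoch. All function names, thresholds, and definitions here are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pair_targets(labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Hypothetical pair-wise supervision: (i, j, 1) if texts i and j share a label, else (i, j, 0)."""
    return [(i, j, int(labels[i] == labels[j]))
            for i in range(len(labels)) for j in range(i + 1, len(labels))]


def class_scores(x: Vector, labeled: Dict[int, List[Vector]]) -> Dict[int, float]:
    """Assumed classifier-free scoring: mean similarity of x to each class's labeled embeddings."""
    return {c: sum(cosine(x, e) for e in embs) / len(embs) for c, embs in labeled.items()}


def softmax_max(scores: Dict[int, float], temperature: float = 0.1) -> Tuple[int, float]:
    """Return the predicted class and its softmax probability (assumed label confidence)."""
    m = max(scores.values())
    exps = {c: math.exp((s - m) / temperature) for c, s in scores.items()}
    z = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / z


def self_paced_filter(unlabeled: List[Vector], labeled: Dict[int, List[Vector]],
                      epoch: int, total_epochs: int,
                      tau: float = 0.9, lam0: float = 0.3) -> List[Tuple[int, int]]:
    """Keep (index, pseudo_label) for unlabeled texts that are both confident and easy enough.

    Assumed schedule: the hardness threshold grows linearly from lam0 to 1.0,
    so easy texts are admitted first and harder ones in later epochs.
    """
    pace = lam0 + (1.0 - lam0) * epoch / total_epochs
    kept = []
    for idx, x in enumerate(unlabeled):
        label, conf = softmax_max(class_scores(x, labeled))
        # Assumed hardness: distance to the nearest labeled text of the predicted class.
        hardness = 1.0 - max(cosine(x, e) for e in labeled[label])
        if conf >= tau and hardness <= pace:
            kept.append((idx, label))
    return kept
```

With two labeled texts `[1, 0, 0]` (class 0) and `[0, 1, 0]` (class 1), an ambiguous text such as `[0.7, 0.7, 0]` is rejected for low confidence at every epoch, while a confident but distant text such as `[0.6, 0.1, 0.8]` is rejected at epoch 0 and admitted once the pace threshold has grown. Combining both criteria is the point of the sketch: confidence alone would admit that distant text immediately.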
Co-authors:
Junfan Chen (陈俊帆), Richong Zhang (张日崇), Jiarui Wang, Chunming Hu (胡春明), Yongyi Mao
Paper type:
International academic conference
Pages:
4352-4361
Translation:
No
Publication date:
2024-01-01