Proceedings of the Nineteenth International Conference on Computational Intelligence and Security (CIS 2023)
December 1 – 4, 2023, Haikou, China

MADV: A Framework for Copyright Protection of NLP Models Based on Multi-level Adversarial Samples

Suyu An1,a, Yangming Zhang1,b, Moxuan Zeng1,c, Yangzhong Wang1,d, Jun Niu1,e and Yuqing Zhang1,2,f

1College of Cyberspace Security, Hainan University, China.

2National Computer Network Intrusion Prevention Center, University of Chinese Academy of Sciences; School of Cyber Engineering, Xidian University, China

ABSTRACT

Current natural language processing (NLP) models exposed through publicly available application programming interfaces (APIs) can easily be stolen through query-based extraction. However, most existing model protection methods are based on watermarking, which is mainly applicable to images and significantly impairs model performance. In this paper, we propose MADV, a copyright authentication framework for NLP models based on multi-level adversarial samples. MADV quantitatively tests the similarity between two NLP models: the victim model and the suspicious model. First, a set of test cases consisting of character-level, word-level, and sentence-level adversarial samples generated from the victim model is applied to the suspicious model, and the hit rate of these adversarial samples is counted; the hit rate is then compared against a similarity threshold to determine whether the suspicious model is a stolen model. We evaluate MADV on four text classification datasets with four different model architectures. Experimental results show that MADV achieves more stable copyright authentication results with fewer samples than DRW, the current state-of-the-art NLP model watermarking approach, without impairing model performance.
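To make the verification step concrete, the following is a minimal sketch of the hit-rate test described above, assuming a hypothetical query interface (suspect_predict) and an illustrative threshold value; it is not the authors' implementation.

from typing import Callable, List, Tuple

def madv_verify(
    adv_cases: List[Tuple[str, int]],       # (adversarial text, victim model's prediction)
    suspect_predict: Callable[[str], int],  # query API of the suspicious model (hypothetical)
    threshold: float = 0.5,                 # similarity threshold (illustrative value)
) -> Tuple[float, bool]:
    # Count "hits": adversarial samples on which the suspicious model
    # reproduces the victim model's prediction.
    hits = sum(1 for text, victim_label in adv_cases
               if suspect_predict(text) == victim_label)
    hit_rate = hits / len(adv_cases)
    # Flag the suspicious model as a stolen copy if the hit rate
    # reaches the similarity threshold.
    return hit_rate, hit_rate >= threshold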

Keywords: Model copyright, Model extraction, Natural language, Adversarial sample, Model watermarking.
