Proceedings of the
Nineteenth International Conference on Computational Intelligence and Security (CIS 2023)
December 1 – 4, 2023, Haikou, China
MADV: A Framework for Copyright Protection of NLP Models Based on Multi-level Adversarial Samples
1College of Cyberspace Security, Hainan University, China.
2National Computer Network Intrusion Prevention Center, University of Chinese Academy of Sciences; School of Cyber Engineering, Xidian University, China
ABSTRACT
Current natural language processing (NLP) models can easily be stolen by querying their publicly available application programming interfaces (APIs) and extracting their functionality. However, most existing model protection methods are based on watermarking, which is mostly applicable to images and significantly impairs model performance. In this paper, we propose MADV, a copyright authentication framework for NLP models based on multi-level adversarial samples. MADV quantitatively tests the similarity between two NLP models: the victim model and the suspicious model. First, a set of test cases consisting of character-level, word-level, and sentence-level adversarial samples generated from the victim model is applied to the suspicious model to count the hit rate of the adversarial samples; the hit rate is then compared against a similarity threshold to determine whether the suspicious model is a stolen copy. We evaluate MADV on four text categorization datasets with four different model architectures. Experimental results show that MADV uses a smaller number of samples to achieve more stable copyright authentication results than DRW, the current state-of-the-art NLP model protection approach, with no impairment of model performance.
Keywords: Model copyright, Model extraction, Natural language, Adversarial sample, Model watermarking.
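The verification step described in the abstract reduces to measuring how often a suspicious model reproduces the victim model's behaviour on the multi-level adversarial test cases and comparing that hit rate to a similarity threshold. The sketch below illustrates this decision only; the function names, data layout, and threshold value are our own assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of MADV-style copyright verification (illustrative only).
# A "test case" pairs an adversarial input generated from the victim model
# with the victim model's prediction on that input.
from typing import Callable, List, Tuple

def hit_rate(suspicious_model: Callable[[str], int],
             test_cases: List[Tuple[str, int]]) -> float:
    """Fraction of adversarial test cases on which the suspicious model
    reproduces the victim model's prediction."""
    hits = sum(1 for text, victim_label in test_cases
               if suspicious_model(text) == victim_label)
    return hits / len(test_cases)

def is_stolen(suspicious_model: Callable[[str], int],
              test_cases: List[Tuple[str, int]],
              threshold: float = 0.5) -> bool:
    """Flag the suspicious model as a likely stolen copy when its hit rate
    on the multi-level adversarial test set exceeds the threshold.
    The threshold value here is an assumed placeholder."""
    return hit_rate(suspicious_model, test_cases) > threshold
```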
