doi:10.3850/978-981-08-7300-4_0817
Framework for Preserving the Privacy of Data using Classification on Perturbed Data
P. Kamakshi1 and A. Vinaya Babu2
1Department of Information Technology, K.I.T.S, Warangal, India.
2Department of C.S.E, J.N.T.U., Kukatpally, Hyderabad, India.
ABSTRACT
Data mining is a powerful means of automatically extracting previously unknown patterns from large amounts of data. The knowledge extracted by the data mining process supports a variety of domains such as marketing, weather forecasting, and medical diagnosis. Data mining requires large amounts of data to be collected from various sites. With the rapid growth of the Internet, networking, and hardware and software technology, the amount of data collected and shared has grown tremendously. Huge volumes of detailed data are regularly collected from organizations, and such datasets also contain personal and sensitive data about individuals. Traditionally, a data warehousing approach is used to collect and store the data from all participating sites. Although data mining extracts useful knowledge for a variety of domains, access to personal data poses a threat to individual privacy. There is increasing concern about how sensitive and private information can be protected while data mining is performed. Privacy-preserving data mining algorithms offer a solution to this problem. The aim of these algorithms is to extract relevant knowledge from large amounts of data while protecting sensitive information. In this paper we analyze the threats to privacy that can arise from the data mining process. We propose a framework that systematically transforms the original data using a randomized data perturbation technique; the modified data is then released to the querying parties as the result of a query, using a decision tree approach. This approach gives valid results for analysis purposes, but the actual data is not revealed and privacy is preserved. We also suggest a future model that may give better performance, in which query results are released according to different levels of privacy assigned to the data.
Keywords: Data perturbation, Data mining, Decision tree, Privacy preservation, Sensitive data.
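
The idea in the abstract of perturbing data and then classifying on the perturbed values can be illustrated with a minimal sketch. The abstract does not state the noise distribution, the tree-building algorithm, or the dataset, so everything below is an assumption made for illustration: additive Gaussian noise stands in for the randomized perturbation, a one-level decision tree (a decision stump) stands in for the full decision tree, the class labels are left unperturbed, and the data is synthetic. The function names `perturb`, `fit_stump`, and `accuracy` are hypothetical.

```python
import random
import statistics


def perturb(values, sigma, rng):
    """Additive randomization (assumed): release x + r with r ~ N(0, sigma^2)."""
    return [x + rng.gauss(0.0, sigma) for x in values]


def fit_stump(xs, labels):
    """Fit a one-level decision tree for the rule 'x > t -> class 1'.

    Tries the midpoint between each pair of adjacent sorted values and
    keeps the threshold with the fewest training errors.
    """
    pairs = sorted(zip(xs, labels))
    best_t, best_err = None, len(pairs) + 1
    for i in range(len(pairs) - 1):
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        err = sum((x > t) != bool(y) for x, y in pairs)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def accuracy(t, xs, labels):
    """Fraction of records for which 'x > t' matches the label."""
    return sum((x > t) == bool(y) for x, y in zip(xs, labels)) / len(xs)


rng = random.Random(0)

# Synthetic sensitive attribute (made-up values, not from the paper).
original = [rng.uniform(20, 100) for _ in range(400)]
labels = [int(x > 60) for x in original]

# Only the perturbed values are released to the querying party.
released = perturb(original, sigma=10.0, rng=rng)

# The party trains on released data; accuracy is checked against the
# true values to see how much analytical value survives the noise.
threshold = fit_stump(released, labels)
acc = accuracy(threshold, original, labels)

print("mean of original:", round(statistics.mean(original), 2))
print("mean of released:", round(statistics.mean(released), 2))
print("learned threshold:", round(threshold, 2))
print("accuracy on true values:", round(acc, 3))
```

In this sketch no individual released value equals its true value, yet the aggregate statistics and the learned split stay close to those of the original data. A larger `sigma` would hide individual values more strongly at the cost of a less accurate classifier.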