Processing and Storing Sparse Data in SAS® Using SAS® Text Miner Procedures
| Content Provider | Semantic Scholar |
|---|---|
| Author | Zhao, Zheng; Albright, Russell; Cox, James Allen |
| Copyright Year | 2014 |
| Abstract | Sparse data sets are common in applications of text and data mining, social network analysis, and recommendation systems. In SAS software, sparse data sets are usually stored in the coordinate list (COO) transactional format. Two major drawbacks are associated with this sparse data representation: First, most SAS procedures are designed to handle dense data and cannot consume data that are stored transactionally. In that case, the options for analysis are significantly limited. Second, a sparse data set in transactional format is hard to store and process in distributed systems. Most techniques require that all transactions for a particular object be kept together; this assumption is violated when the transactions of that object are distributed to different nodes of the grid. This paper presents some different ideas about how to package all transactions of an object into a single row. Approaches include storing the sparse matrix densely, doing variable selection, doing variable extraction, and compressing the transactions into a few text variables by using Base64 encoding. These simple but effective techniques enable you to store and process your sparse data in better ways. This paper demonstrates how to use SAS Text Miner procedures to process sparse data sets and generate output data sets that are easy to store and can be readily processed by traditional SAS modeling procedures. The output of the system can be safely stored and distributed in any grid environment. |
| File Format | PDF; HTM / HTML |
| Alternate Webpage(s) | http://support.sas.com/resources/papers/proceedings14/SAS195-2014.pdf |
| Alternate Webpage(s) | https://support.sas.com/content/dam/SAS/support/en/technical-papers/data-text-mining/SAS195-2014.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
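
The abstract describes packaging all coordinate-list (COO) transactions of an object into a single row, either by storing the row densely or by compressing the transactions into a few Base64-encoded text variables. The following Python code is a minimal sketch of that idea under stated assumptions; it is not the SAS Text Miner implementation. The binary layout (a little-endian unsigned 32-bit term index followed by a 64-bit float value), the function names, and the default chunk length are all hypothetical. The 32,767-character chunk limit is chosen to mirror the maximum length of a SAS character variable.

```python
import base64
import struct
from collections import defaultdict

# Hypothetical pair layout (an assumption, not SAS's format):
# little-endian uint32 term index followed by a float64 value, no padding.
PAIR_FORMAT = "<Id"
PAIR_SIZE = struct.calcsize(PAIR_FORMAT)  # 12 bytes


def group_transactions(rows):
    """Group COO rows (doc_id, term_id, value) so that every transaction
    for a document is kept together, sorted by term index."""
    docs = defaultdict(list)
    for doc_id, term_id, value in rows:
        docs[doc_id].append((term_id, value))
    return {doc_id: sorted(pairs) for doc_id, pairs in docs.items()}


def to_dense(pairs, n_terms):
    """Store one document densely: one column per term, zeros elsewhere."""
    row = [0.0] * n_terms
    for term_id, value in pairs:
        row[term_id] = value
    return row


def encode_pairs(pairs, max_len=32767):
    """Pack (term_id, value) pairs into bytes, Base64-encode them, and split
    the text into chunks of at most max_len characters (one per text variable)."""
    raw = b"".join(struct.pack(PAIR_FORMAT, t, v) for t, v in pairs)
    text = base64.b64encode(raw).decode("ascii")
    return [text[i:i + max_len] for i in range(0, len(text), max_len)] or [""]


def decode_pairs(chunks):
    """Reverse encode_pairs: rejoin the chunks, Base64-decode, unpack the pairs."""
    raw = base64.b64decode("".join(chunks))
    return [struct.unpack_from(PAIR_FORMAT, raw, off)
            for off in range(0, len(raw), PAIR_SIZE)]


if __name__ == "__main__":
    transactions = [(1, 2, 3.0), (1, 0, 1.0), (2, 4, 2.0)]
    docs = group_transactions(transactions)
    print(to_dense(docs[1], n_terms=5))      # [1.0, 0.0, 3.0, 0.0, 0.0]
    chunks = encode_pairs(docs[1])
    print(decode_pairs(chunks) == docs[1])   # True
```

The dense form suits a small vocabulary. The Base64 form keeps the storage proportional to the number of nonzero entries and still places each object in one self-contained row. Because the chunks are rejoined before decoding, they can be split at any character position.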