How to Process Big Data Efficiently with SAS

2019-04-16 张华, 赵一鸣 | 临床流行病学和循证医学 (Clinical Epidemiology and Evidence-Based Medicine)

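I recently took part in processing and analyzing a large dataset. I am not an expert SAS user, so I looked things up, learned, and applied them as I went. Working through the difficulties taught me a few lessons, which I share here in the hope that they are useful to others.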

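1. SAS is a good choice for general statistical analysts working with big data

SPSS is of little use for big data because it processes it very inefficiently. R reads the data into memory before working on it, so once the data size approaches or exceeds the available memory, R cannot be used either. SAS works from disk and runs efficiently. It is also a mature product: searching Baidu for a problem you run into usually turns up an answer, and SAS comes with a powerful help system.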

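2. Use a test set to check that your program is correct and avoid wasted work

Analyzing big data is very time-consuming: computing a single frequency table can take more than half an hour. Extracting some cases as a test set is like running a pilot study before a clinical trial. It lets you verify that the program is correct before the full run, which makes the overall work more efficient. There are many ways to extract the data; common ones are the OBS= system option and stratified sampling with proc surveyselect. The OBS= option is the simpler of the two, for example: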

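options obs=1000;   /* process only the first 1000 observations */
proc freq data=test;
    tables var;
run;
options obs=max;    /* restore the default so later steps read all observations */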
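The stratified sampling mentioned above could look roughly like the following. This is only a sketch: the stratification variable `group`, the output dataset names, the 1% sampling rate, and the seed are assumptions chosen for illustration, not values from the original analysis.

```sas
/* Assumption: dataset test has a stratification variable named group. */
/* The STRATA statement expects the input to be sorted by that variable. */
proc sort data=test out=test_sorted;
    by group;
run;

/* Draw a 1% simple random sample within each stratum. */
/* A fixed seed makes the sample reproducible. */
proc surveyselect data=test_sorted out=test_sample
        method=srs samprate=0.01 seed=20190416;
    strata group;
run;

/* Develop and debug against the small sample first. */
proc freq data=test_sample;
    tables var;
run;
```

Because the STRATA statement needs sorted input, this approach costs one sort of the full data. If a plain random sample is enough for testing, dropping the STRATA statement avoids the sort.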

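3. Keep only the variables you need and shorten variable lengths

Each dataset should contain only the variables you actually want. Too many variables hurt efficiency, so either drop the irrelevant ones or keep just the ones you need. In a data step it is more efficient to apply keep= or drop= as dataset options on the dataset named in the set statement, because the unwanted variables are then never read in.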

data a;
    set b(keep=var1 var2);   /* var1, var2 are placeholders: list only the variables you need */
run;

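Variable length also affects the size of the data. Use proc contents to check the lengths of the variables in a dataset, and change them with a length statement in a data step (or with the informats on an input statement when reading raw data) or with proc sql.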
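As a rough illustration of checking and shortening lengths, the sketch below assumes a dataset `b` with a numeric variable `id` and a character variable `name` whose values never exceed 20 characters. All of these names and the length 20 are hypothetical.

```sas
/* Inspect the current type and length of every variable. */
proc contents data=b;
run;

/* Shorten a character variable while copying the data. */
/* The LENGTH statement must come before SET to take effect. */
data a;
    length name $20;
    set b(keep=id name);
run;

/* Alternative: change the length of a character column with PROC SQL. */
proc sql;
    alter table a modify name char(20);
quit;
```

When the new length is shorter than the stored one, SAS writes a warning about multiple lengths to the log and longer values are truncated, so check the maximum observed length before shortening.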

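4. Filter first, then analyze, and sort as little as possible

When only the records that meet a known condition are of interest, filter them first and then run the analysis. For the filtering itself, where is more efficient than if: where is evaluated as the data are being read, so non-matching observations are never brought into the data step, whereas a subsetting if is evaluated only after each observation has been read in. For grouping variables, prefer class to by, because by requires the data to be sorted by the grouping variable first.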
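The points in this section can be sketched as follows. The dataset `big` and the variables `age`, `sex`, and `weight` are made-up names used only to show the pattern.

```sas
/* WHERE= on the input dataset: only matching observations are read in. */
data adults;
    set big(where=(age >= 18));
run;

/* Less efficient equivalent: every observation is read, then discarded. */
data adults_if;
    set big;
    if age >= 18;
run;

/* Filter and group inside the procedure, with no separate sort step. */
/* CLASS does not require the data to be sorted by sex; BY would. */
proc means data=big n mean;
    where age >= 18;
    class sex;
    var weight;
run;
```

The last step also shows that filtering does not always need an intermediate dataset: a where statement inside the procedure restricts the analysis directly.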

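5. Change labels and formats with proc datasets

If you need to add or change the labels and formats of variables in a dataset, proc datasets is the faster way to do it. It does not pass the observations through the program data vector (PDV), which makes it more efficient than a data step.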
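A minimal sketch of this, assuming a dataset `work.a` with variables `age` and `visit_date` (hypothetical names, labels, and formats):

```sas
/* Update labels and formats in place; the observations are not rewritten. */
proc datasets library=work nolist;
    modify a;
        label age = 'Age at enrollment (years)';
        format visit_date yymmdd10.;
    run;
quit;
```

The nolist option only suppresses the directory listing in the log. Doing the same thing in a data step would read and rewrite every observation of the dataset.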

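6. Use options compress=yes to save disk space

With large datasets, the temporary files produced during a run can generally grow to 10 times the size of the original data or even more, which may exhaust the available disk space. Datasets therefore usually need to be compressed to save storage. In SAS this can be done with options compress=yes.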
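Compression can be switched on for the whole session or for a single dataset. The sketch below uses hypothetical dataset names `work.a`, `work.a_small`, and `work.a_small2`.

```sas
/* Session-wide: every dataset created from here on is compressed. */
options compress=yes;

data work.a_small;
    set work.a;
run;

/* Per dataset: the COMPRESS= dataset option affects only this output. */
data work.a_small2(compress=yes);
    set work.a;
run;

/* Turn session-wide compression off again when it is no longer wanted. */
options compress=no;
```

When a compressed dataset is created, SAS writes a note to the log saying by how much the size changed. Compression trades some CPU time for disk space, and for small or narrow datasets it can even increase the file size, so it is worth reading that note.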

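7. The sasfile statement with load can speed up processing, but it uses a lot of memory

The sasfile statement with the load option holds a dataset in memory. This reduces the number of reads from disk, which speeds up processing and saves time. If the machine does not have enough memory, however, this statement is not recommended.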
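A short sketch of the pattern, assuming a dataset `work.big` that fits in memory and is used by several steps. The dataset and variable names are hypothetical.

```sas
/* Load the dataset into memory once. */
sasfile work.big load;

/* These steps now read from memory instead of going back to disk. */
proc freq data=work.big;
    tables var1;
run;

proc means data=work.big;
    var var2;
run;

/* Release the memory when finished. */
sasfile work.big close;
```

The benefit comes from reading the same dataset repeatedly; for a dataset that is read only once, loading it first gains little.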

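Together, these steps can save a good deal of time and disk space when processing big data. I hope they help colleagues with similar needs. If you have tips of your own, you are welcome to share them in the comments.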
