The source code has been uploaded, so everyone can learn from it together!

```python
import requests
import json
import os
import time
import random
import jieba
from wordcloud import WordCloud
from imageio import imread

comment_file_path = 'jd_comments.txt'

def get_spider_comments(page=0):
    # Scrape one page of JD product comments.
    url = ('https://sclub.jd.com/comment/productPageComments.action'
           '?callback=fetchJSON_comment98vv7990&productId=1070129528'
           '&score=0&sortType=5&page=%s&pageSize=10'
           '&isShadowSku=0&rid=0&fold=1') % page
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'referer': 'https://item.jd.com/1070129528.html'
    }
    try:
        response = requests.get(url, headers=headers)
    except requests.RequestException:
        print("something wrong!")
        return
    # Strip the JSONP wrapper ("fetchJSON_comment98vv7990(" ... ");")
    # to get the raw JSON payload.
    comments_json = response.text[26:-2]
    # Parse the JSON string into a Python object.
    comments_json_obj = json.loads(comments_json)
    # Get everything under the 'comments' key.
    comments_all = comments_json_obj['comments']
    # Append each comment's text content to the output file.
    for comment in comments_all:
        with open(comment_file_path, 'a+', encoding='utf-8') as fout:
            fout.write(comment['content'] + '\n')
        print(comment['content'])

def batch_spider_comments():
    # Clear the output file before writing new data.
    if os.path.exists(comment_file_path):
        os.remove(comment_file_path)
    for i in range(100):
        print('Scraping page ' + str(i + 1) + '...')
        get_spider_comments(i)
        # Random delay between requests to avoid hammering the server.
        time.sleep(random.random() * 5)

def cut_word():
    # Segment the collected comments into words with jieba.
    with open(comment_file_path, encoding='utf-8') as file:
        comment_text = file.read()
    wordlist = jieba.lcut_for_search(comment_text)
    new_wordlist = ' '.join(wordlist)
    return new_wordlist

def create_word_cloud():
    # Use ball.jpg as the word-cloud mask shape.
    mask = imread('ball.jpg')
    wordcloud = WordCloud(font_path='msyh.ttc', mask=mask).generate(cut_word())
    wordcloud.to_file('picture.png')

if __name__ == '__main__':
    # Run batch_spider_comments() first to collect the comments,
    # then generate the word cloud from them.
    create_word_cloud()
```
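One fragile spot in the scraper is `response.text[26:-2]`: it assumes the JSONP callback name is always exactly 25 characters long, so the slice breaks the moment JD changes the `callback` parameter. Below is a minimal sketch of a more robust way to unwrap a JSONP response with a regular expression; the function name `jsonp_to_json` and the sample string are my own illustration, not part of the original code.

```python
import json
import re

def jsonp_to_json(text):
    # Match "callbackName( ... );" regardless of the callback's length,
    # and parse whatever sits between the outermost parentheses.
    match = re.search(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', text, re.S)
    if not match:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))

# Hypothetical sample mimicking the shape of JD's response:
sample = 'fetchJSON_comment98vv7990({"comments": [{"content": "good"}]});'
data = jsonp_to_json(sample)
print(data['comments'][0]['content'])  # good
```

Dropping this in place of the fixed slice keeps the parsing step working even if the callback name in the URL is changed.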
Thank you all for your support, I will keep working hard ✊!!
This article is a reader submission and does not represent the position of 程序员编程网. If you reprint it, please credit the source: http://www.cxybcw.com/202455.html