
Scraping the a-tag content of the Sohu site with Python

The code is as follows:

    # Note: this was run with Python 3.7
    # Import the required modules
    import requests                # HTTP request library
    from bs4 import BeautifulSoup  # HTML parsing library
    import pandas as pd

    # Request the page
    url = "http://q.stock.sohu.com/"  # the Sohu page to request
    response = requests.get(url)      # send a GET request to Sohu and keep the result in response
    response.encoding = 'utf-8'       # decode the response body as UTF-8
    html = response.text              # the page's HTML source

    # Parse the page
    soup = BeautifulSoup(html, 'lxml')  # parse the HTML with the lxml parser
    content = soup.find_all('a')        # find every a tag
    hrefs = []
    for aa in content:                  # loop over the a tags that were found
        href = aa.get('href')           # the address in the tag's href attribute
        print(href)
        hrefs.append(href)

    # Save the data
    # Build the table from the href strings, not from the tag objects themselves
    df = pd.DataFrame(hrefs, columns=["网址"])  # one column, "网址" (URL), holding the hrefs
    df.to_excel("搜索a标签内容.xlsx")           # write df to an .xlsx file (needs openpyxl installed)

The output is as follows:

    /
    //s.m.sohu.com/t/index.html
    //q.stock.sohu.com/feedback.html
    //q.stock.sohu.com/cn/mystock.shtml
    //q.stock.sohu.com/cn/bk.shtml
    //q.stock.sohu.com/cn/ph.shtml
    //q.stock.sohu.com/cn/zs.shtml
    //q.stock.sohu.com/fundflow/
    /sdk/rank
    //stock.sohu.com/ipo/
    //q.stock.sohu.com/app2/bigdeal2.jsp
    //q.stock.sohu.com/app2/rpsholder.up
    //q.stock.sohu.com/app2/mpssTrade.up
    //stock.sohu.com/s2011/jlp/
    //q.fund.sohu.com/jzph/zxjz_date_up.shtml
    //q.stock.sohu.com/us/zgg.html
    javascript:void(0);
    /sdk/transfer?page=callin
    /sdk/transfer?page=callin
    /sdk/transfer?page=callout
    /sdk/transfer?page=cancel
    /sdk/transfer?page=record
    //mp.sohu.com
    javascript:void(0);
    javascript:void(0);
    javascript:void(0);
    //q.stock.sohu.com/cn/ph_m.shtml?type=sh_as&field=changerate&sort=up
    //q.stock.sohu.com/cn/ph_m.shtml?type=sz_as&field=changerate&sort=up
    //q.stock.sohu.com/cn/bk.shtml
    //q.stock.sohu.com/cn/bk.shtml
    //q.stock.sohu.com/cn/bk.shtml
    //q.stock.sohu.com/cn/bk.shtml
    javascript:void(0);
    javascript:void(0);
    /sdk/rank
    //q.stock.sohu.com/cn/mystock.shtml
    javascript:void(0);
    //q.stock.sohu.com/fundflow/stock_inflow.html?name=NetVal&io=In
    //q.stock.sohu.com/fundflow/stock_inflow.html?name=NetVal&io=Out
    //q.stock.sohu.com/app2/mpssTrade.up
    //q.stock.sohu.com/app2/mpssTrade.up
    //q.stock.sohu.com/app2/bigdeal2.jsp

A screenshot example:

[Image: Scraping the a-tag content of the Sohu site with Python, http://p3.pstatp.com/large/pgc-image/09c326cdc1f848e4b2d73a04dc277f72]
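
The href values printed by the script are a mix of protocol-relative links (starting with //), site-relative paths such as /sdk/rank, javascript:void(0); placeholders, and repeats. If the goal is a clean list of addresses to save, one possible cleanup step is sketched below. It is not part of the original script: normalize_links and BASE_URL are hypothetical names introduced here, and the sample list is a handful of values copied from the output so the sketch runs without a network request.

```python
from urllib.parse import urljoin, urlparse

# Assumption: links are resolved against the page the article requested.
BASE_URL = "http://q.stock.sohu.com/"


def normalize_links(hrefs, base_url=BASE_URL):
    """Return absolute http(s) URLs in first-seen order, without duplicates.

    Hypothetical helper for illustration; not part of the original script.
    """
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            # a tags without an href attribute yield None from Tag.get()
            continue
        # urljoin resolves "//host/path" and "/path" against the base URL
        absolute = urljoin(base_url, href.strip())
        if urlparse(absolute).scheme not in ("http", "https"):
            # skips javascript:void(0); and other non-web schemes
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# A few values taken from the script's printed output, plus a missing href
sample = [
    "/",
    "//s.m.sohu.com/t/index.html",
    "/sdk/rank",
    "javascript:void(0);",
    "/sdk/rank",
    None,
]
links = normalize_links(sample)
for link in links:
    print(link)
```

In the original script, the list of hrefs collected from the a tags could be passed through such a helper before building the DataFrame, so the saved column holds only unique, absolute addresses.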

Originally published as: python爬取搜索网址的a标签内容

Theme test article, for testing purposes only. Posted by: ~那﹑男人是我的命﹪. When reposting, please credit the source: http://www.cxybcw.com/13052.html
