Help needed: a question about using Scrapy

xiaoyuvps

def parse(self, response):
    # print(response.body)
    html = response.body
    print(html)
    print("---------- processing data ---------- 1")
    # conn = request.urlopen(response.url)
    # doc = etree.HTML(html.read())
    listExtra = GsExtractor()
    listExtra.setXsltFromFile("/home/list.xml")
    result = listExtra.extract(html)
    # print(str(result).encode('gbk', 'ignore').decode('gbk'))
    print("---------- processing data ---------- 2")
    print(result)
    request = scrapy.Request(result, callback=self.parse_item)
    yield request

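My own small standalone crawler works fine. Now I want to move it into Scrapy, since storing results in a database, downloading, and so on would all be simpler there.

But I've run into a problem.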
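Here, html = response.body gives me the raw HTML text directly, whereas my standalone crawler did this:

    #conn = request.urlopen(response.url)
    #doc = etree.HTML(html.read())

and what came out of that looked like this:
<Element html at 0x7fb650f62f88>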

What is the difference between the two, and how should I handle it?

xiaoyuvps · posted 2017-5-20 15:47:28

If this really doesn't work out, I'll give up on the GooSeeker extractor and go back to plain XPath.
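Falling back to XPath does not require leaving Scrapy: its HTML responses expose response.xpath() directly. The sketch below shows the same idea with lxml on raw page bytes, so it runs without a Scrapy project; the markup and the XPath expressions are hypothetical stand-ins for the real page.

```python
from lxml import etree

# Hypothetical page bytes standing in for Scrapy's response.body.
body = b"""<html><body>
<ul class="list">
  <li><a href="/item/1">First</a></li>
  <li><a href="/item/2">Second</a></li>
</ul>
</body></html>"""

# XPath runs on the parsed element tree, not on the raw bytes.
doc = etree.HTML(body)

# Hypothetical expressions; adjust them to the real page layout.
links = doc.xpath('//ul[@class="list"]/li/a/@href')
titles = doc.xpath('//ul[@class="list"]/li/a/text()')
print(list(zip(titles, links)))  # [('First', '/item/1'), ('Second', '/item/2')]
```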

Fuller · posted 2017-5-20 16:36:43

In the code above, html is plain text, whereas after parsing with etree it becomes a structure, similar to the DOM a browser holds in memory.
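A minimal illustration of that difference, assuming lxml is installed; the sample bytes are made up and stand in for response.body:

```python
from lxml import etree

# Stand-in for response.body: Scrapy stores the downloaded page as raw bytes.
body = b"<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

print(type(body))   # <class 'bytes'>: plain text, no structure
print(body[:6])     # b'<html>'

# etree.HTML parses the text into an element tree (the DOM-like structure).
doc = etree.HTML(body)
print(doc)          # <Element html at 0x...> (the address varies)
print(doc.tag)      # html

# Only the parsed tree can be navigated node by node.
print(doc.find("body/p").text)  # Hello
```

This is why the old crawler printed <Element html at 0x...>: it printed the parsed root element, not the page text.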

xiaoyuvps · posted 2017-5-20 19:02:39

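Fuller posted on 2017-5-20 16:36
In the code above, html is plain text, whereas after parsing with etree it becomes a structure, similar to the DOM a browser holds in memory.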
...

How do I turn it into a DOM structure in Scrapy?

xiaoyuvps · posted 2017-5-20 19:09:00

Got it working. Thanks!
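One way to do it, following Fuller's explanation, is to parse response.body with etree.HTML before extracting, i.e. listExtra.extract(etree.HTML(response.body)). That assumes GsExtractor.extract() applies an XSLT stylesheet with lxml and therefore needs a parsed tree rather than bytes. The sketch below calls lxml's etree.XSLT directly as a stand-in; the stylesheet replacing /home/list.xml and the helper extract_links() are hypothetical.

```python
from lxml import etree

# Hypothetical stand-in for /home/list.xml: collects every link's href.
XSLT_SOURCE = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <list>
      <xsl:for-each select="//a">
        <item><xsl:value-of select="@href"/></item>
      </xsl:for-each>
    </list>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.XML(XSLT_SOURCE))


def extract_links(body: bytes) -> list:
    """Parse raw page bytes (e.g. Scrapy's response.body), then apply the XSLT."""
    doc = etree.HTML(body)   # text -> element tree: the step that was missing
    result = transform(doc)  # an XSLT result tree, not a URL string
    return [item.text for item in result.getroot().iter("item")]


body = b'<html><body><a href="/a.html">A</a><a href="/b.html">B</a></body></html>'
print(extract_links(body))  # ['/a.html', '/b.html']
```

Inside parse, each returned link could then be followed with scrapy.Request(response.urljoin(url), callback=self.parse_item). scrapy.Request expects a URL string, so passing it the whole result tree, as the original code does with result, will not work.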


4 replies to this thread; last reply on 2017-5-20 19:09
