Scrapy: stripping a comma


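My selectors are fine; they pick up the correct fields.

When I run the spider with the strip lines commented out, I get " Model #, RA30 ".

When I run it with the strip commands uncommented, I get ,RA30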

Here is the spider:

    import scrapy
    import pandas as pd
    from ..items import HomedepotpricespiderItem
    from scrapy.http import Request


    class HomedepotspiderSpider(scrapy.Spider):
        name = 'homeDepotSpider'
        allowed_domains = ['homedepot.com']
        start_urls = ['https://www.homedepot.com/pep/304660691']#.format(omsID = omsID)
                        #for omsID in omsList]

        def parse(self, response):
            # call home depot function
            for item in self.parseHomeDepot(response):
                yield item
            pass

        def parseHomeDepot(self, response):
            # get top level item
            items = response.css('#zone-a-product')
            for product in items:
                item = HomedepotpricespiderItem()

                # get SKU
                productSKU = product.css('.product-info-bar__detail:nth-child(2)::text').getall()

                # get rid of all the stuff i dont need
                #productSKU = [x.strip(' ') for x in productSKU] #whiteSpace
                #productSKU = [x.strip(',') for x in productSKU]
                #productSKU = [x.strip('\n') for x in productSKU]
                #productSKU = [x.strip('\t') for x in productSKU]
                #productSKU = [x.strip(' Model# ') for x in productSKU] #gets rid of the model name

I am running the program in the terminal with the following command:

    scrapy crawl homeDepotSpider -t csv -o - > "/Users/userName/Desktop/homeDepotv2Helpers/homeDepotTest.csv"

My output above is copied directly from the CSV.

Edit*

I have also tried

    productSKU = [x.replace(',','') for x in productSKU]

That did not work either. The same output also appears directly in the terminal.

Solution

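The strip function only removes characters from the beginning or end of a string (its argument is treated as a set of characters, not as a substring). To remove a character wherever it occurs in the string, use the replace function. If you only want to remove commas at the beginning or end of the string, repeat the line

    productSKU = [x.strip(',') for x in productSKU]

again after the line

    productSKU = [x.strip(' Model# ') for x in productSKU]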
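To see the difference concretely, here is a minimal sketch in plain Python. The sample strings are made up for illustration and are not taken from the scraped page.

```python
# Sample values are illustrative assumptions, not real scraped data.

# strip() trims the listed characters from both ends only.
trimmed = ",RA30,".strip(",")

# A comma in the middle of the string is untouched by strip() ...
middle_kept = "Model #,RA30".strip(",")

# ... while replace() removes it wherever it occurs.
middle_removed = "Model #,RA30".replace(",", "")

# The argument to strip() is a set of characters, not a substring, so it can
# eat more than intended: every leading character found in " Model# " goes.
over_stripped = "Model# dell-5".strip(" Model# ")

print(repr(trimmed))
print(repr(middle_kept))
print(repr(middle_removed))
print(repr(over_stripped))
```

The last case is the reason to prefer an index, a slice, or a regular expression over strip() when a fixed prefix such as the model label has to go.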

Your selector gives you a list of two elements: ['Model #', 'RA30'].

To get only the SKU, just use an index:

productSKU = product.css('.product-info-bar__detail:nth-child(2)::text').getall()[1]

If a product might not have a SKU, make sure you handle the resulting exception properly.
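A minimal sketch of that defensive indexing, using plain lists in place of the `.getall()` result. The helper name `pick_sku` is hypothetical, and the sample lists are assumptions about what the selector returns. The sketch also hints at where the comma comes from: as far as I know, Scrapy's CSV exporter joins a list-valued field with commas by default, so the comma was probably never part of either string.

```python
from typing import List, Optional


def pick_sku(texts: List[str]) -> Optional[str]:
    """Return the second text node (the model number), or None if it is missing."""
    try:
        return texts[1].strip()
    except IndexError:
        return None


# Assumed shape of the selector result when the model number is present.
with_sku = ["Model #", "RA30"]
# Assumed shape when the model number is absent.
without_sku = ["Model #"]

# Joining the list with "," reproduces the kind of string seen in the CSV.
joined = ",".join(with_sku)

print(joined)
print(pick_sku(with_sku))
print(pick_sku(without_sku))
```

Storing a single string in the item instead of the whole list means there is nothing left for the exporter to join.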

Why not use XPath plus a regular expression?

product_model = response.xpath('//h2[@class="product-info-bar__detail"][contains(.,"Model #")]/text()').re_first(r'#(.+)')
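The same pattern can be tried outside Scrapy with the standard `re` module. This is only a sketch: the sample text is an assumption about what the `h2` contains, and `extract_model` is a hypothetical helper. With a single capture group, `.re_first()` returns that group from the first match, which `re.search(...).group(1)` mirrors here.

```python
import re
from typing import Optional


def extract_model(text: str) -> Optional[str]:
    """Return whatever follows '#', with surrounding whitespace trimmed."""
    match = re.search(r"#(.+)", text)
    if match is None:
        return None
    return match.group(1).strip()


# Assumed sample text; the real page may format the label differently.
print(extract_model("Model # RA30"))
print(extract_model("no model here"))
```

The capture group keeps any whitespace that follows the `#`, so the trailing `.strip()` (or a pattern such as `#\s*(.+)`) is worth keeping in the spider as well.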
