Stripping a list returns blank values

How to fix: stripping a list returns blank values

Hey, I'm trying to strip the newlines from the items in a list, but the output I get is empty. What am I doing wrong? I'm running this in Jupyter.

from bs4 import BeautifulSoup
import requests
import time
import re

url = 'https://en.wikipedia.org/wiki/Microsoft_3D_Viewer'
paragraphs = []
titles = []
scraped_content = []
scraped_titles = []
scraped_list = []

response = requests.get(url,time.sleep(2)) 
soup2 = BeautifulSoup(response.content,"html.parser")
paragraphs = soup2.find_all('p')
lists = soup2.find_all('ul')
titles = soup2.find_all(re.compile('^h[1-4]$'))
               
for paragraph in paragraphs:
    paragraphs = [paragraph.text]
    paragraphs = paragraph.get_text()
    scraped_content.append(paragraphs)
        
for title in titles:
    titles = [title.text]
    titles = title.get_text()
    scraped_titles.append(titles)
                       
scraped_content = list(map(str.strip,scraped_content))
scraped_content

Solution

Apart from the time.sleep(2) argument you pass to requests.get, your code appears to work fine.
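To see what that argument actually does, here is a network-free sketch. It relies only on the documented signature requests.get(url, params=None, **kwargs); fake_get is a hypothetical stand-in that records its arguments instead of sending a request, and the sleep is shortened to 0.2 seconds so the demo runs quickly. The sleep happens before the call is made, and its return value (None) lands in params, so it never behaves like a timeout.

```python
import time


def fake_get(url, params=None, **kwargs):
    """Hypothetical stand-in mirroring requests.get(url, params=None, **kwargs).

    It only records the arguments it receives; no request is sent.
    """
    return {"url": url, "params": params, **kwargs}


url = "https://en.wikipedia.org/wiki/Microsoft_3D_Viewer"

# The question's pattern: time.sleep() is evaluated *before* the call.
# It blocks, returns None, and that None is passed positionally as `params`.
start = time.monotonic()
sleep_call = fake_get(url, time.sleep(0.2))
elapsed = time.monotonic() - start

# The suggested fix: a timeout is a keyword argument; nothing sleeps up front.
timeout_call = fake_get(url, timeout=2)

print(sleep_call["params"])     # None
print(timeout_call["timeout"])  # 2
```

With the real library, requests.get(url, timeout=2) does not wait beforehand; it raises requests.exceptions.Timeout if connecting to the server or waiting for data takes longer than 2 seconds.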

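- Remove the second argument to requests.get: it is unnecessary and could cause problems. time.sleep(2) is evaluated before the request is sent, so it just pauses for two seconds and then passes its return value, None, as the params argument. If you want a 2-second timeout, pass timeout=2 instead, as documented.
- Make sure you run all of the code in a single cell, so that no data is overwritten by cells run out of order.
- If the Jupyter server is running behind a proxy or firewall, that may affect the result of the request.
- There is no need to declare variables such as paragraphs and titles and then immediately reassign them. List comprehensions get you the result more directly.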
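The cell-order and reassignment points can be reproduced without fetching anything. In this sketch FakeTag is a hypothetical stand-in for a BeautifulSoup tag (only get_text() is modelled), and the "cells" are simply consecutive statements. Running the stripping step before the collecting step yields an empty list, which is one plausible way to end up with blank output in Jupyter; rebinding paragraphs inside the loop leaves that name holding a single string afterwards, while a list comprehension avoids the rebinding entirely.

```python
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup Tag: only get_text() is modelled."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


tags = [FakeTag("\nFirst paragraph.\n"), FakeTag("Second paragraph.\n")]

# --- Out-of-order cells ---
# "Cell 1": initialise an empty list.
scraped_content = []
# "Cell 3" executed before "cell 2" has filled the list: nothing to strip yet.
stripped_too_early = list(map(str.strip, scraped_content))
print(stripped_too_early)  # []

# --- Loop pattern from the question: `paragraphs` is rebound each iteration ---
paragraphs = tags
scraped_loop = []
for paragraph in paragraphs:  # iteration still walks the original list object
    paragraphs = paragraph.get_text()  # ...but the name now refers to a string
    scraped_loop.append(paragraphs)
print(type(paragraphs).__name__)  # str

# --- Same data with list comprehensions, no rebinding ---
scraped_content = [tag.get_text() for tag in tags]
trimmed_content = [content.strip() for content in scraped_content]
print(trimmed_content)  # ['First paragraph.', 'Second paragraph.']
```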

Revised code:

from bs4 import BeautifulSoup
import requests
import re

url = 'https://en.wikipedia.org/wiki/Microsoft_3D_Viewer'

# Fetch the page without the extra positional argument
# (pass timeout=2 if you want the request to time out)
response = requests.get(url)
soup2 = BeautifulSoup(response.content, "html.parser")

# Collect the elements of interest; no need to pre-declare empty lists
paragraphs = soup2.find_all('p')
lists = soup2.find_all('ul')
titles = soup2.find_all(re.compile('^h[1-4]$'))

# Extract the text directly with list comprehensions
scraped_content = [paragraph.get_text() for paragraph in paragraphs]
scraped_titles = [title.get_text() for title in titles]

# Strip leading/trailing whitespace (including newlines) from every paragraph
trimmed_content = [content.strip() for content in scraped_content]
trimmed_content  # Jupyter displays the last expression in the cell
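If what you see after stripping is not an empty list but individual empty strings, that is how str.strip() behaves: an element made only of whitespace (for example, a paragraph whose text is just a newline) strips down to ''. A minimal sketch, assuming you simply want to drop those entries; the sample list is made up for illustration and stands in for scraped_content.

```python
# Made-up sample standing in for scraped_content (real page text will differ).
scraped_content = ["\n", "First paragraph.\n", "   ", "Second paragraph.\n"]

# str.strip() removes leading/trailing whitespace, so whitespace-only items become ''.
stripped = [content.strip() for content in scraped_content]
print(stripped)  # ['', 'First paragraph.', '', 'Second paragraph.']

# Keep only the non-empty results (an empty string is falsy).
trimmed_content = [text for text in stripped if text]
print(trimmed_content)  # ['First paragraph.', 'Second paragraph.']
```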
