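How to fix stripping a list returning blank values
Hey, I'm trying to strip the newlines from a list, but the output I get is empty. What am I doing wrong? I'm running this in Jupyter.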
```python
import re
import time

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Microsoft_3D_Viewer'
paragraphs = []
titles = []
scraped_content = []
scraped_titles = []
scraped_list = []
response = requests.get(url, time.sleep(2))
soup2 = BeautifulSoup(response.content, "html.parser")
paragraphs = soup2.find_all('p')
lists = soup2.find_all('ul')
titles = soup2.find_all(re.compile('^h[1-4]$'))
for paragraph in paragraphs:
    paragraphs = [paragraph.text]
    paragraphs = paragraph.get_text()
    scraped_content.append(paragraphs)
for title in titles:
    titles = [title.text]
    titles = title.get_text()
    scraped_titles.append(titles)
scraped_content = list(map(str.strip, scraped_content))
scraped_content
```
Solution
Apart from the argument passed to `requests.get`, your code looks like it works.
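To see why the loops in the question still collect every paragraph even though they rebind `paragraphs` and `titles` inside the loop body, here is a minimal offline sketch. It assumes a plain list of strings stands in for the result of `soup2.find_all('p')`, so no network access or BeautifulSoup is needed.

```python
# Stand-in for soup2.find_all('p'): plain strings with surrounding newlines
paragraphs = ["\nFirst paragraph.\n", "Second paragraph.\n"]
scraped_content = []

for paragraph in paragraphs:
    # Rebinding the name `paragraphs` does not affect the running loop,
    # which already holds an iterator over the original list
    paragraphs = paragraph
    scraped_content.append(paragraphs)

# Same stripping step as in the question
scraped_content = list(map(str.strip, scraped_content))
print(scraped_content)  # ['First paragraph.', 'Second paragraph.']
print(repr(paragraphs))  # 'Second paragraph.\n'
```

The side effect is that after the loop `paragraphs` no longer refers to the list of elements but to the last item, which is one reason reusing the name is confusing.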
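- Remove the second argument to `requests.get`; it is unnecessary and could cause problems. `time.sleep(2)` is evaluated first, pauses for 2 seconds, and then returns `None`, which is passed as the `params` argument, so it does not set a timeout. If you want a 2-second timeout, use `timeout=2` instead, as documented.
- Make sure you run all the code in a single cell, so that no variable is left in a stale or overwritten state by cells that were run out of order.
- If the Jupyter server is running behind a proxy or firewall, that may affect the result of the request.
- There is no need to declare variables such as `paragraphs` and `titles` and then immediately reassign them. You can use list comprehensions to get the result more directly.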
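What happens to the second positional argument can be sketched without touching the network. The function `fake_get` below is hypothetical: it only copies the parameter layout of `requests.get(url, params=None, **kwargs)` and returns what it received. The sleep is shortened to 0.2 seconds to keep the demo quick.

```python
import time


def fake_get(url, params=None, **kwargs):
    """Hypothetical stand-in with the same parameter layout as requests.get."""
    return {"url": url, "params": params, "kwargs": kwargs}


url = "https://en.wikipedia.org/wiki/Microsoft_3D_Viewer"

# As in the question: time.sleep runs *before* the call and returns None
start = time.perf_counter()
result = fake_get(url, time.sleep(0.2))
elapsed = time.perf_counter() - start
print(result["params"])  # None: this is what lands in `params`
print(result["kwargs"])  # {}: no timeout was passed

# Suggested form: the timeout travels as a keyword argument
with_timeout = fake_get(url, timeout=2)
print(with_timeout["kwargs"])  # {'timeout': 2}
```

So the original call only delays the request; it never limits how long `requests.get` may wait for the server.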
Revised code:
```python
from bs4 import BeautifulSoup
import requests
import re

url = 'https://en.wikipedia.org/wiki/Microsoft_3D_Viewer'
response = requests.get(url)

soup2 = BeautifulSoup(response.content, "html.parser")
paragraphs = soup2.find_all('p')
lists = soup2.find_all('ul')
titles = soup2.find_all(re.compile('^h[1-4]$'))

# Extract the text of each element directly with list comprehensions
scraped_content = [paragraph.get_text() for paragraph in paragraphs]
scraped_titles = [title.get_text() for title in titles]

# Remove leading/trailing whitespace, including newlines
trimmed_content = [content.strip() for content in scraped_content]
trimmed_content
```
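If some entries are still blank after stripping (for example, if the page contains `<p>` elements with no text, they become `''`), you could filter them out in the same comprehension. A minimal sketch, assuming the input is a list of strings such as `scraped_content` above; `clean_texts` and `collapse_whitespace` are hypothetical helper names, not part of BeautifulSoup or requests.

```python
def clean_texts(texts):
    """Strip surrounding whitespace (newlines included) and drop entries that end up empty."""
    return [text.strip() for text in texts if text.strip()]


def collapse_whitespace(text):
    """Replace every run of whitespace, including inner newlines, with a single space."""
    return " ".join(text.split())


sample = ["\n", "  First paragraph.\n", " \n ", "Second paragraph.\n"]
print(clean_texts(sample))  # ['First paragraph.', 'Second paragraph.']
print(collapse_whitespace("Line one\nLine two\n"))  # Line one Line two
```

With the revised code, `clean_texts(scraped_content)` would replace the `trimmed_content` line. Note that `str.strip` only removes whitespace at the ends of a string; newlines inside a paragraph need something like `collapse_whitespace`.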