Text summarization for multiple rows in R

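I have a set of short text files that I combined into a single dataset, so that each file occupies one row.

I am trying to summarize the content with the LSAfun package, using the function genericSummary(text, k, split = c(".", "!", "?"), min = 5, breakdown = FALSE, ...).

This works well for a single text input, but not in my case. The package documentation states that the text input should be "a character vector of length(text) = 1 specifying the text to be summarized".

See this example: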

# Generate an example dataset (the texts were copied from Wikipedia):

dd = structure(list(text = structure(1:2, .Label = c("Forest gardening, a forest-based food production system, is the world's oldest form of gardening.[1] Forest gardens originated in prehistoric times along jungle-clad river banks and in the wet foothills of monsoon regions. In the gradual process of families improving their immediate environment, useful tree and vine species were identified, protected and improved while undesirable species were eliminated. Eventually foreign species were also selected and incorporated into the gardens.[2]\n\nAfter the emergence of the first civilizations, wealthy individuals began to create gardens for aesthetic purposes. Ancient Egyptian tomb paintings from the New Kingdom (around 1500 BC) provide some of the earliest physical evidence of ornamental horticulture and landscape design; they depict lotus ponds surrounded by symmetrical rows of acacias and palms. A notable example of ancient ornamental gardens were the Hanging Gardens of Babylon—one of the Seven Wonders of the Ancient World —while ancient Rome had dozens of gardens.\n\nWealthy ancient Egyptians used gardens for providing shade. Egyptians associated trees and gardens with gods, believing that their deities were pleased by gardens. Gardens in ancient Egypt were often surrounded by walls with trees planted in rows. Among the most popular species planted were date palms, sycamores, fir trees, nut trees, and willows. These gardens were a sign of higher socioeconomic status. In addition, wealthy ancient Egyptians grew vineyards, as wine was a sign of the higher social classes. Roses, poppies, daisies and irises could all also be found in the gardens of the Egyptians.\n\nAssyria was also renowned for its beautiful gardens. These tended to be wide and large, some of them used for hunting game—rather like a game reserve today—and others as leisure gardens. Cypresses and palms were some of the most frequently planted types of trees.\n\nGardens were also available in Kush. In Musawwarat es-Sufra, the Great Enclosure dated to the 3rd century BC included splendid gardens. [3]\n\nAncient Roman gardens were laid out with hedges and vines and contained a wide variety of flowers—acanthus, cornflowers, crocus, cyclamen, hyacinth, iris, ivy, lavender, lilies, myrtle, narcissus, poppy, rosemary and violets[4]—as well as statues and sculptures. Flower beds were popular in the courtyards of rich Romans.", "The Middle Ages represent a period of decline in gardens for aesthetic purposes. After the fall of Rome, gardening was done for the purpose of growing medicinal herbs and/or decorating church altars. Monasteries carried on a tradition of garden design and intense horticultural techniques during the medieval period in Europe. Generally, monastic garden types consisted of kitchen gardens, infirmary gardens, cemetery orchards, cloister garths and vineyards. Individual monasteries might also have had a \"green court\", a plot of grass and trees where horses could graze, as well as a cellarer's garden or private gardens for obedientiaries, monks who held specific posts within the monastery.\n\nIslamic gardens were built after the model of Persian gardens and they were usually enclosed by walls and divided in four by watercourses. Commonly, the centre of the garden would have a reflecting pool or pavilion. Specific to the Islamic gardens are the mosaics and glazed tiles used to decorate the rills and fountains that were built in these gardens.\n\nBy the late 13th century, rich Europeans began to grow gardens for leisure and for medicinal herbs and vegetables.[4] They surrounded the gardens by walls to protect them from animals and to provide seclusion. During the next two centuries, Europeans started planting lawns and raising flowerbeds and trellises of roses. Fruit trees were common in these gardens and also in some, there were turf seats. At the same time, the gardens in the monasteries were a place to grow flowers and medicinal herbs but they were also a space where the monks could enjoy nature and relax.\n\nThe gardens in the 16th and 17th century were symmetric, proportioned and balanced with a more classical appearance. Most of these gardens were built around a central axis and they were divided into different parts by hedges. Commonly, gardens had flowerbeds laid out in squares and separated by gravel paths.\n\nGardens in Renaissance were adorned with sculptures, topiary and fountains. In the 17th century, knot gardens became popular along with the hedge mazes. By this time, Europeans started planting new flowers such as tulips, marigolds and sunflowers."
), class = "factor")), class = "data.frame", row.names = c(NA, -2L))


# This code attempts to write the summary of each row into another column:

dd$sum = genericSummary(dd$text, k = 1)


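This results in the error: Error in strsplit(text, split = split, fixed = T) : non-character argument

I believe this happens because I am passing a whole column rather than a single text.

My expected output is the generated summary for each row, placed in a corresponding second column, dd$sum.

I tried using as.vector(dd$text), but that did not work. (I suspect it still merges the output into a single row.)

I tried reading about the map functions from purrr but could not apply them to this case, and I am wondering whether someone with R programming experience could help.

Also, if you know a way to do this with another text summarization package (such as lexRankr), that would work too. I tried the code from the question "Text summarization in R language", but still could not get it to work.

Thanks.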

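Solution

Check class(dd$text). It is a factor, not a character vector, which is why the strsplit() call inside genericSummary() reports a non-character argument.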
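The failure can be reproduced without LSAfun. The following is a minimal base R sketch, assuming a small made-up factor (`txt`) that stands in for dd$text; it only illustrates that strsplit() rejects factors and accepts the result of as.character().

```r
# Toy factor standing in for dd$text (hypothetical data, not the original texts).
txt <- factor(c("First text. Second sentence.", "Another text! More here."))
class(txt)

# strsplit() accepts only character input, so a factor raises an error.
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(
  strsplit(txt, split = ".", fixed = TRUE),
  error = function(e) conditionMessage(e)
)
print(err)

# Converting to character first makes the same call succeed.
chr <- as.character(txt)
pieces <- strsplit(chr, split = ".", fixed = TRUE)
print(lengths(pieces))
```

Since R 4.0.0, data.frame() and read.csv() no longer convert strings to factors by default, so this problem mostly shows up with data created by older code or with stringsAsFactors = TRUE.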

The following works:

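library(dplyr)
library(purrr)
library(LSAfun)  # provides genericSummary()

dd %>%
  # genericSummary() splits the text with strsplit(), which needs character input
  mutate(text = as.character(text)) %>%
  # map() calls genericSummary() once per row, so each call receives a single text
  mutate(sum = map(text, genericSummary, k = 1))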
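Note that map() returns a list, so sum becomes a list column. If a plain character column is preferred, the same row-wise pattern can be written in base R with vapply(). The sketch below is only an illustration: `first_sentence()` is a hypothetical stand-in for genericSummary() so that the example runs without LSAfun installed, and `toy` is made-up data.

```r
# Hypothetical stand-in for genericSummary(): it simply returns the first
# k sentences. It is NOT the LSAfun algorithm; it only mimics the interface
# (one text in, summary out) so the pattern can be run anywhere.
first_sentence <- function(text, k = 1) {
  stopifnot(is.character(text), length(text) == 1)
  sentences <- trimws(unlist(strsplit(text, split = "[.!?]")))
  sentences <- sentences[nzchar(sentences)]
  paste(head(sentences, k), collapse = " ")
}

# Made-up data with a factor column, mimicking the structure of dd.
toy <- data.frame(
  text = c("Gardens are old. They are pretty.",
           "Monks grew herbs! Herbs heal."),
  stringsAsFactors = TRUE
)

# Convert the factor to character, then summarize one row at a time.
toy$text <- as.character(toy$text)
toy$sum <- vapply(toy$text, first_sentence, character(1),
                  k = 1, USE.NAMES = FALSE)
print(toy$sum)
```

Swapping first_sentence for genericSummary should give a character column as long as each call returns exactly one string, which is an assumption that holds only for k = 1; for larger k, a list column as produced by map() or lapply() is the safer choice.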
