How to fix: I have a ggplot bar chart with 700 bars and want a Pareto line; the line works, but the y scale for the bars is so small that they don't show up in the chart
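Thanks to a lot of people, I have gotten started with charts in R.
I have three charts:
Random plot
Plot ordered by frequency
Plot with Pareto overlay
If you look closely at the last one, you can see the scaled, ordered frequency plot along the bottom.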
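```{r}
library(dplyr)
library(ggplot2)

# Drop rows whose end station is the literal string "NA"
df <- filter(df_clean_distances, end_station_name != "NA")

# Count rides per end station
d <- df %>%
  select(end_station_name) %>%
  group_by(end_station_name) %>%
  summarize(freq = n())

head(d$freq)
dput(head(d))

# Sort stations by descending frequency
d2 <- d[order(-d$freq), ]
d2
```
Random plot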
```{r}
ggplot(d2, aes(x = end_station_name, y = freq)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_blank()) +
  ylim(c(0, 40000))
```
Plot ordered by frequency
```{r}
ggplot(d2, aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_blank()) +
  ylim(c(0, 40000)) +
  labs(title = "end station by freq", x = "Station Name")
```
Plot with Pareto overlay
```{r}
ggplot(d2, aes(x = end_station_name, y = freq)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_blank()) +
  ggQC::stat_pareto(point.color = "red", point.size = 0.5) +
  labs(title = "end station by freq", x = "Station Name")
```
dput(head) output
```{r}
> dput(head(d,n=20))
structure(list(end_station_name = c("2112 W Peterson Ave", "63rd St Beach",
"900 W Harrison St", "Aberdeen St & Jackson Blvd", "Aberdeen St & Monroe St",
"Aberdeen St & Randolph St", "Ada St & 113th St", "Ada St & Washington Blvd",
"Adler Planetarium", "Albany Ave & 26th St", "Albany Ave & BloomingDale Ave",
"Albany Ave & Montrose Ave", "Archer (damen) Ave & 37th St",
"Artesian Ave & Hubbard St", "Ashland Ave & 13th St", "Ashland Ave & 50th St",
"Ashland Ave & 63rd St", "Ashland Ave & 66th St", "Ashland Ave & 69th St",
"Ashland Ave & 73rd St"), freq = c(1032L, 2524L, 3836L, 8383L, 6587L, 6136L,
18L, 6281L, 12050L, 397L, 2833L, 1875L, 710L, 1879L, 2659L, 151L, 112L, 102L,
78L, 8L)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
```
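As you can see, the Pareto line works against the right-hand scale, but the left-hand scale is way off. The data has 3 million rows, and the scaling on the left y axis has squashed the frequencies into a very small curve at the bottom of the chart that is hard to see.
How can I fix the left y axis to about 40,000 so that the frequency curve displays properly?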
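Solution
Here is a solution that does not use the ggQC package and relies on `sec_axis` instead.
The trick is to pre-compute `max(freq)` and then use it as a scale factor to align the two axes. The data preparation code is inspired by this rstudio-pubs blog post.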
```{r}
library(ggplot2)
library(dplyr)

# Scale factor: the height of the tallest bar
M <- max(d$freq)

d %>%
  arrange(desc(freq)) %>%
  # Cumulative share of all rides, from just above 0 up to 1
  mutate(cum_freq = cumsum(freq / sum(freq))) %>%
  ggplot(aes(x = reorder(end_station_name, -freq), y = freq)) +
  geom_bar(stat = "identity") +
  # Stretch the cumulative share to the height of the bars
  geom_line(mapping = aes(y = cum_freq * M, group = 1)) +
  geom_point(
    mapping = aes(y = cum_freq * M), color = "red", size = 0.5
  ) +
  # The secondary axis undoes the stretch so the right side reads in percent
  scale_y_continuous(
    sec.axis = sec_axis(~ . / M, labels = scales::percent, name = "Cumulative percentage")
  ) +
  labs(title = "end station by freq", x = "Station Name") +
  theme(axis.text.x = element_blank())
```
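To see why multiplying by `M` keeps the bars and the line on the same scale, it may help to run the two transforms on their own. The following is a minimal base R sketch that uses only the 20 sample frequencies from the `dput()` output in the question, not the full data set; the names `freq_sorted`, `line_y` and `right_axis` are made up for illustration.

```r
# First 20 station frequencies, copied from the dput() output in the question
freq <- c(1032L, 2524L, 3836L, 8383L, 6587L, 6136L, 18L, 6281L, 12050L, 397L,
          2833L, 1875L, 710L, 1879L, 2659L, 151L, 112L, 102L, 78L, 8L)

# Sort descending, mirroring arrange(desc(freq)) in the answer
freq_sorted <- sort(freq, decreasing = TRUE)

# Cumulative share of the total, increasing up to 1
cum_freq <- cumsum(freq_sorted / sum(freq_sorted))

# Scale factor: the tallest bar
M <- max(freq_sorted)

# Forward transform: the y values the line and points are drawn at (left axis units)
line_y <- cum_freq * M

# Inverse transform: the values sec_axis(~ . / M) labels on the right axis
right_axis <- line_y / M

# The line now spans the same vertical range as the bars
print(range(freq_sorted))
print(range(line_y))
```

Because the line tops out at the height of the tallest bar, neither series flattens the other. If a fixed left axis of about 40,000 is wanted, as in the question, the same pair of transforms should also work with `M` replaced by that constant, although the answer itself does not show this variant.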