How to fix PySpark executor logs not being written to local files on a standalone cluster
I am trying to capture all of the custom logs generated on the worker nodes of a standalone PySpark cluster. All driver logs are written correctly, but nothing is saved for the executors. Am I missing something in my code?
log4j.properties
# Appender for Spark's own logs (root logger and the spark.* loggers below)
log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=./log_files/Analytics.Service/spark.log
log4j.appender.RollingAppender.ImmediateFlush=true
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
# Appender for the custom "myLogger" logger
log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=./log_files/Analytics.Service/Analytics.Service.log
log4j.appender.RollingAppenderU.ImmediateFlush=true
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.rootLogger=ALL,RollingAppender
log4j.logger.myLogger=ALL,RollingAppenderU
log4j.logger.spark.storage=ALL,RollingAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=ALL,RollingAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=ALL,RollingAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=ALL,RollingAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTrackerActor=ALL,RollingAppender
log4j.additivity.spark.MapOutputTrackerActor=false
log4j.logger.spark.MapOutputTracker=ALL,RollingAppender
log4j.additivity.spark.MapOutputTracker=false
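One thing I am not sure about: the File paths above are relative, and as far as I understand a relative path is resolved against the working directory of whichever JVM loads the configuration. On a standalone cluster that would be each executor's work directory on the worker machine, not the directory I start the driver from. A variant of the two appenders with an absolute path (the directory name here is just an example, and it would have to exist and be writable on every worker) might be closer to what I need:

log4j.appender.RollingAppender.File=/var/log/analytics-service/spark.log
log4j.appender.RollingAppenderU.File=/var/log/analytics-service/Analytics.Service.log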
spark-defaults.conf
spark.master spark://master:7077
spark.eventLog.enabled true
spark.eventLog.dir ./log_files/Analytics.Service/
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true
spark.driver.extraJavaOptions -XX:+PrintGCDetails -Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true
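For reference, setting extraJavaOptions alone does not copy log4j.properties to the worker machines as far as I understand, so I expect the file also has to be shipped at submit time. A rough sketch of the spark-submit invocation I have in mind (paths are relative to where I submit from; test_logger.py is the script below):

spark-submit \
  --master spark://master:7077 \
  --files log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true" \
  test_logger.py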
test_logger.py
import findspark
findspark.init()
import pyspark
from pyspark import SparkContext, SparkConf

def driver(row, custom_logr):
    # intended to log from inside the map on the worker nodes
    item = row * row
    custom_logr.warn("In Driver")

def main():
    sc = SparkContext.getOrCreate(SparkConf().setMaster('local[*]'))
    # JVM-side log4j loggers obtained through py4j
    root_logr = sc._jvm.org.apache.log4j.LogManager.getRootLogger()
    custom_logr = sc._jvm.org.apache.log4j.LogManager.getLogger("myLogger")
    custom_logr.warn("In Main")
    lst = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    data = sc.parallelize(lst)
    data.map(lambda row: driver(row, custom_logr))
    custom_logr.warn("Process Finished")
    sc.stop()

if __name__ == "__main__":
    main()
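Side note on what I suspect might also matter: map on its own is lazy, so nothing runs on the executors until an action such as collect() is called, and I doubt the py4j logger object created on the driver can be pickled into the closure anyway. Below is a minimal sketch of the kind of executor-side logging I am trying to get, using Python's standard logging module inside the mapped function; that the output ends up in each executor's stderr file under the worker's work directory is my assumption, not something I have confirmed:

import logging
from pyspark import SparkConf, SparkContext

def square_and_log(row):
    # configured inside the function so it runs in the executor's Python worker,
    # not on the driver; basicConfig is a no-op after the first call in a process
    logging.basicConfig(level=logging.WARNING)
    logging.getLogger("myLogger").warning("processing row %s", row)
    return row * row

if __name__ == "__main__":
    sc = SparkContext.getOrCreate(SparkConf().setMaster("spark://master:7077"))
    # collect() is an action, so it actually forces the map to run on the executors
    squares = sc.parallelize(range(1, 10)).map(square_and_log).collect()
    print(squares)
    sc.stop()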