Unable to convert a text file to a TFRecord dataset

How to fix being unable to convert a text file to a TFRecord dataset

I am trying to convert my dataset to TFRecord format. I created a text file containing sentences, one sentence per line:

DATA_DIR = 'E:/'
sentences_file = os.path.join(DATA_DIR,'data.txt')

I created another text file containing the tokens, one token per line:

vocab_file = os.path.join(DATA_DIR,'tokens.txt')

I want to convert this data into a TFRecord dataset:

import tensorflow as tf

import os
from tensorflow.python.ops import lookup_ops
# Lookup table: converts a token to its integer id (its line index in `tokens.txt`).
# Unknown tokens map to default_value=0, i.e. the id of the token on the first line.
# In graph mode it must be initialized with tf.tables_initializer inside a session.
vocab_table = lookup_ops.index_table_from_file(vocab_file,default_value=0)

# Creates a dataset which returns a single sentence per element
dataset = tf.data.TextLineDataset(sentences_file)

#Converts each sentence to a list of tokens
dataset = dataset.map(lambda sentence: tf.string_split([sentence]).values)

#Converts list of tokens to list of token integers
dataset = dataset.map(lambda words: vocab_table.lookup(words))

#Adds length of sentence (number of tokens)
dataset = dataset.map(lambda words: (words,tf.size(words)))

#Convert to a batch of size 32. Padded batch appends 0 for shorter sentences.
dataset = dataset.padded_batch(batch_size=32,padded_shapes=(tf.TensorShape([None]),tf.TensorShape([])))


# Dataset iterator. Needs to be initialized
iterator = dataset.make_initializable_iterator()

However, I get the following error:

C:\ProgramData\Anaconda3\python.exe "E:/untitled1/dfgd.py"
2021-02-17 09:55:33.833498: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-17 09:55:33.833695: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-17 09:55:36.692747: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices,tf_xla_enable_xla_devices not set
2021-02-17 09:55:36.693923: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-02-17 09:55:36.708389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce 210 computeCapability: 1.2
coreClock: 1.402GHz coreCount: 1 deviceMemorySize: 1.00GiB deviceMemoryBandwidth: 7.45GiB/s
2021-02-17 09:55:36.709175: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-17 09:55:36.709739: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-02-17 09:55:36.710395: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-02-17 09:55:36.710948: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2021-02-17 09:55:36.711486: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2021-02-17 09:55:36.712111: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-02-17 09:55:36.712650: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-02-17 09:55:36.713184: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-02-17 09:55:36.713351: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-02-17 09:55:36.714741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-17 09:55:36.714937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
2021-02-17 09:55:36.715040: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices,tf_xla_enable_xla_devices not set
2021-02-17 09:56:01.485573: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at lookup_table_init_op.cc:144 : Failed precondition: HashTable has different value for same key. Key method has 0 and trying to add value 6
Traceback (most recent call last):
  File "E:/untitled1/dfgd.py",line 13,in <module>
    vocab_table = lookup_ops.index_table_from_file(vocab_file,default_value=0)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py",line 1452,in index_table_from_file
    table = StaticHashTableV1(init,default_value)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py",line 314,in __init__
    super(StaticHashTable,self).__init__(default_value,initializer)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py",line 185,in __init__
    self._init_op = self._initialize()
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py",line 188,in _initialize
    return self._initializer.initialize(self)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py",line 744,in initialize
    -1 if self._vocab_size is None else self._vocab_size,self._delimiter)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\gen_lookup_ops.py",line 362,in initialize_table_from_text_file_v2
    _ops.raise_from_not_ok_status(e,name)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\ops.py",line 6862,in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code,message),None)
  File "<string>",line 3,in raise_from
tensorflow.python.framework.errors_impl.FailedPreconditionError: HashTable has different value for same key. Key method has 0 and trying to add value 6 [Op:InitializeTableFromTextFileV2]

How can I fix this?

Solution

A working example snippet that writes text to a TFRecord file:

import tensorflow as tf

# Raw values wrapped in the protobuf list types that tf.train.Feature accepts
sentence_list = tf.train.BytesList(value=[b'sentence1',b'sentence2'])
token_list = tf.train.FloatList(value=[1.0,2.0])

sentences = tf.train.Feature(bytes_list=sentence_list)
tokens = tf.train.Feature(float_list=token_list)

# Map feature names to features
sentence_dict = {
  'sentence': sentences,'Token': tokens
}
feature_sentence = tf.train.Features(feature=sentence_dict)

# One Example holds one record
example = tf.train.Example(features=feature_sentence)

# Serialize the Example and write it to the TFRecord file
with tf.io.TFRecordWriter('sentences.tfrecord') as writer:
  writer.write(example.SerializeToString())
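
The error message in the question also hints at a likely cause: `Key method has 0 and trying to add value 6` suggests that the token `method` appears on more than one line of `tokens.txt` (line indices 0 and 6), while the file-backed lookup table expects every key to be unique. The following is a minimal sketch, assuming that duplicate lines are indeed the cause, for finding and removing them before calling `index_table_from_file`. The helper names and the output file name `tokens_unique.txt` are hypothetical and are not part of TensorFlow.

```python
def find_duplicate_tokens(tokens):
    """Return {token: [line indices]} for every token that occurs more than once."""
    positions = {}
    for index, token in enumerate(tokens):
        positions.setdefault(token, []).append(index)
    return {tok: idx for tok, idx in positions.items() if len(idx) > 1}


def dedupe_tokens(tokens):
    """Keep the first occurrence of each token and preserve the original order."""
    return list(dict.fromkeys(tokens))


def read_tokens(path, encoding="utf-8"):
    """Read one token per line, stripping only the trailing newline characters."""
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\r\n") for line in handle]


def write_tokens(path, tokens, encoding="utf-8"):
    """Write one token per line."""
    with open(path, "w", encoding=encoding) as handle:
        handle.writelines(token + "\n" for token in tokens)


if __name__ == "__main__":
    # Hypothetical in-memory vocabulary that reproduces the situation in the error.
    sample = ["method", "the", "a", "of", "to", "in", "method"]
    print(find_duplicate_tokens(sample))  # {'method': [0, 6]}
    print(dedupe_tokens(sample))

    # With the real files (paths taken from the question, output name assumed):
    # tokens = read_tokens("E:/tokens.txt")
    # write_tokens("E:/tokens_unique.txt", dedupe_tokens(tokens))
```

After writing the deduplicated file, `vocab_file` can point at it instead. Removing lines shifts the ids of the tokens that follow, so any data already encoded with the old ids would need to be regenerated.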
