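How to fix a failure when converting a text file to a TFRecord dataset
I am trying to convert my dataset to the TFRecord format. I created a text file containing sentences, one per line: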
DATA_DIR = 'E:/'
sentences_file = os.path.join(DATA_DIR,'data.txt')
I created another text file containing the tokens, one per line:
vocab_file = os.path.join(DATA_DIR,'tokens.txt')
I want to convert this data into a TFRecord dataset:
import tensorflow as tf
import os
from tensorflow.python.ops import lookup_ops
# Lookup table that converts a token to an integer id. Tokens missing from `tokens.txt` map to default_value=0, the id of the token on its first line.
# Must be initialized with tf.tables_initializer inside a session.
vocab_table = lookup_ops.index_table_from_file(vocab_file,default_value=0)
# Creates a dataset that returns one sentence per element
dataset = tf.data.TextLineDataset(sentences_file)
#Converts each sentence to a list of tokens
dataset = dataset.map(lambda sentence: tf.string_split([sentence]).values)
#Converts list of tokens to list of token integers
dataset = dataset.map(lambda words: vocab_table.lookup(words))
#Adds length of sentence (number of tokens)
dataset = dataset.map(lambda words: (words,tf.size(words)))
#Convert to a batch of size 32. Padded batch appends 0 for shorter sentences.
dataset = dataset.padded_batch(batch_size=32,padded_shapes=(tf.TensorShape([None]),tf.TensorShape([])))
# Dataset iterator. Needs to be initialized
iterator = dataset.make_initializable_iterator()
However, I get the following error:
C:\ProgramData\Anaconda3\python.exe "E:/untitled1/dfgd.py"
2021-02-17 09:55:33.833498: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-17 09:55:33.833695: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-17 09:55:36.692747: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices,tf_xla_enable_xla_devices not set
2021-02-17 09:55:36.693923: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-02-17 09:55:36.708389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 210 computeCapability: 1.2
coreClock: 1.402GHz coreCount: 1 deviceMemorySize: 1.00GiB deviceMemoryBandwidth: 7.45GiB/s
2021-02-17 09:55:36.709175: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-17 09:55:36.709739: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-02-17 09:55:36.710395: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-02-17 09:55:36.710948: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2021-02-17 09:55:36.711486: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2021-02-17 09:55:36.712111: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-02-17 09:55:36.712650: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-02-17 09:55:36.713184: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-02-17 09:55:36.713351: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-02-17 09:55:36.714741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-17 09:55:36.714937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2021-02-17 09:55:36.715040: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices,tf_xla_enable_xla_devices not set
2021-02-17 09:56:01.485573: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at lookup_table_init_op.cc:144 : Failed precondition: HashTable has different value for same key. Key method has 0 and trying to add value 6
Traceback (most recent call last):
  File "E:/untitled1/dfgd.py", line 13, in <module>
    vocab_table = lookup_ops.index_table_from_file(vocab_file, default_value=0)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 1452, in index_table_from_file
    table = StaticHashTableV1(init, default_value)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 314, in __init__
    super(StaticHashTable, self).__init__(default_value, initializer)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 185, in __init__
    self._init_op = self._initialize()
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 188, in _initialize
    return self._initializer.initialize(self)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\lookup_ops.py", line 744, in initialize
    -1 if self._vocab_size is None else self._vocab_size, self._delimiter)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\ops\gen_lookup_ops.py", line 362, in initialize_table_from_text_file_v2
    _ops.raise_from_not_ok_status(e, name)
  File "C:\Users\DSP\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\ops.py", line 6862, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.FailedPreconditionError: HashTable has different value for same key. Key method has 0 and trying to add value 6 [Op:InitializeTableFromTextFileV2]
How can I fix this?
Solution
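The `FailedPreconditionError` in the traceback appears to come from the vocabulary file rather than from the TFRecord conversion itself. `index_table_from_file` uses each line of `tokens.txt` as a key and its line index as the value, so "Key method has 0 and trying to add value 6" suggests that the token `method` occurs both on the first line (index 0) and on the seventh line (index 6). A minimal sketch of removing such duplicates before building the table is shown here; the helper names `unique_tokens` and `dedupe_vocab_file` and the output filename are hypothetical, not part of TensorFlow:

```python
def unique_tokens(lines):
    """Return tokens in first-seen order, dropping repeats and blank lines."""
    seen = set()
    tokens = []
    for line in lines:
        token = line.rstrip("\r\n")  # strip only the line ending
        if token and token not in seen:
            seen.add(token)
            tokens.append(token)
    return tokens


def dedupe_vocab_file(src_path, dst_path):
    """Write a copy of the vocabulary file in which each token appears once."""
    with open(src_path, encoding="utf-8") as src:
        tokens = unique_tokens(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(tokens) + "\n")
    return len(tokens)


# Hypothetical usage with the paths from the question:
# dedupe_vocab_file("E:/tokens.txt", "E:/tokens_unique.txt")
```

Pointing `vocab_file` at the deduplicated file should let the table initialize. Note that removing lines shifts the ids of all later tokens, so any data already encoded with the old file would need to be re-encoded.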
A working sample snippet for converting text to a TFRecord:
import tensorflow as tf

# Wrap the raw values in the protobuf list types used by TFRecord features.
sentence_list = tf.train.BytesList(value=[b'sentence1', b'sentence2'])
token_list = tf.train.FloatList(value=[1.0, 2.0])
sentences = tf.train.Feature(bytes_list=sentence_list)
tokens = tf.train.Feature(float_list=token_list)
# Map feature names to features and build a single Example.
sentence_dict = {
    'sentence': sentences,
    'Token': tokens,
}
feature_sentence = tf.train.Features(feature=sentence_dict)
example = tf.train.Example(features=feature_sentence)
# Serialize the Example and write it to a TFRecord file.
with tf.io.TFRecordWriter('sentences.tfrecord') as writer:
    writer.write(example.SerializeToString())
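The snippet above writes a single hand-built record. To connect it to the files in the question, one option is to write one `tf.train.Example` per line of `data.txt`, storing the token ids in an `Int64List`. The following is only a sketch under assumptions: the function names, the feature keys `sentence`, `tokens` and `length`, and the use of id 0 for unknown tokens (mirroring `default_value=0` in the question) are illustrative choices, not something the original answer specifies.

```python
def load_vocab(path):
    """Map each token (one per line) to its line index."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\r\n"): i for i, line in enumerate(f)}


def encode_sentence(sentence, vocab, unk_id=0):
    """Split on whitespace and map each token to its id (unk_id if unknown)."""
    return [vocab.get(token, unk_id) for token in sentence.split()]


def write_sentences(sentences, vocab, out_path):
    """Write one tf.train.Example per sentence to a TFRecord file."""
    # Imported here so the pure-Python helpers above work without TensorFlow.
    import tensorflow as tf

    with tf.io.TFRecordWriter(out_path) as writer:
        for sentence in sentences:
            ids = encode_sentence(sentence, vocab)
            feature = {
                "sentence": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[sentence.encode("utf-8")])),
                "tokens": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=ids)),
                "length": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[len(ids)])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


# Hypothetical usage with the files from the question:
# vocab = load_vocab("E:/tokens.txt")
# with open("E:/data.txt", encoding="utf-8") as f:
#     write_sentences((line.rstrip("\r\n") for line in f), vocab, "sentences.tfrecord")
```

Because the ids are looked up in plain Python, this path does not need a lookup table at all. When the file is read back with `tf.data.TFRecordDataset`, the variable-length `tokens` feature can be parsed with `tf.io.VarLenFeature(tf.int64)`.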