TY - JOUR
T1 - ROBB
T2 - Recurrent Proximal Policy Optimization Reinforcement Learning for Optimal Block Formation in Bitcoin Blockchain Network
AU - Dutta, Amit
AU - Rafin, Nafiz Imtiaz
AU - Dewan, M. Ali Akber
AU - Alam, Md Golam Rabiul
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - Blockchain is a ground-breaking technology that has changed how we manage and store protected data. It is a decentralized ledger that enables safe, open, and unchangeable record-keeping. It relies on a distributed network of nodes, rather than a single central authority, to check and verify transactions, guaranteeing that each entry is correct and unchangeable. Transactions in a blockchain network are grouped into blocks, which are then linked together in a chronological and immutable chain. Block size, the maximum size of each block in the chain, is a critical parameter that has not yet been benchmarked. However, the block size cannot simply be changed: doing so is challenging and creates security issues. Block size is crucial because it affects the number of transactions processed per second, the confirmation time, and overall network efficiency. Confirmation time should be short to ensure stable earnings for miners. Moreover, high transaction fees and long verification times hinder the use of blockchain in broader applications. We propose a reinforcement learning model named ROBB that efficiently creates a block by considering the current network state and previous transactions. First, we formulated the problem as a reinforcement learning environment so that it could be solved with multiple reinforcement learning algorithms. We developed a blockchain simulator to replicate the network environment and integrated it with OpenAI Gym to turn it into a reinforcement learning environment. Training was carried out on randomly generated transactions. Finally, we designed a reward function that enables the simulator to hold transactions and create a block from the pending transactions when it determines that the environment is favorable. In the final results, ROBB minimized the waiting time for transactions and utilized the blocks to their full potential. It also optimized the block space, building upon the findings of previous researchers. Our proposed models achieve 100% block utilization and an average waiting time of 1.8 s while creating the fewest blocks.
KW - Dynamic block size
KW - OpenAIGym
KW - blockchain
KW - proximal policy optimization
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85186759498&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3369896
DO - 10.1109/ACCESS.2024.3369896
M3 - Journal Article
AN - SCOPUS:85186759498
VL - 12
SP - 31287
EP - 31311
JO - IEEE Access
JF - IEEE Access
ER -