Kobold Sentinel GetUp v1
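Experiment goals:
1. Penalty-based (deduction) reward scheme
2. Reach the "standing moment" quickly
3. No constraint on gaze direction at the standing moment (gaze aiming will be handled by the standing experiment)
4. Enable Take Actions Between Decisions
(New) 5. Size is randomized between 1 and 2; Size affects mass and JointDrive
(New) 6. ML-Agents is Release 19; Unity is 2021.3.11f1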
Experiment design:
(Carried over from 道爾 GetUp 16)
1. Weak point touches the ground
AddReward(-0.0001f * koboldBodies[i].damageCoef);
life -= 0.005f * koboldBodies[i].damageCoef;
2.
//Set: judge.endEpisode = true
//Set: judge.episodeLength = 10f
if(life <= 0f)
{
    if(inferenceMode)
    {
    }
    else
    {
        //===Train Get Up===
        float survivedTime = Time.fixedTime - arrivedMoment;
        if(survivedTime < judge.episodeLength)
        {
            AddReward( (survivedTime - judge.episodeLength) * 0.1f );
        }
        judge.outLife++;
        judge.Reset();
        return;
    }
}
else if(koboldRoot.localPosition.y < -10f)
{
    if(inferenceMode)
    {
    }
    else
    {
        //===Train Get Up===
        float survivedTime = Time.fixedTime - arrivedMoment;
        if(survivedTime < judge.episodeLength)
        {
            AddReward( (survivedTime - judge.episodeLength) * 0.1f );
        }
        judge.outY++;
        //===All Required===
        judge.Reset();
        return;
    }
}

targetSmoothPosition = targetPositionBuffer.GetSmoothVal();
headDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldHeadRb.position);
spineDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldSpine.position);
rootDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldRootRb.position);

lookAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldHead.up, headDir));
upAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldHead.right * -1f, Vector3.up));
spineLookAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldSpine.up, spineDir));
spineUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldSpine.right * -1f, Vector3.up));
rootLookAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldRoot.forward, rootDir));
rootUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldRoot.up, Vector3.up));

leftThighAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(koboldLeftThigh.right, Vector3.up));
leftCalfAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(koboldLeftCalf.right, Vector3.up));
rightThighAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(koboldRightThigh.right, Vector3.up));
rightCalfAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(koboldRightCalf.right, Vector3.up));

avgVelocity = velocityBuffer.GetSmoothVal();
flatVelocity = avgVelocity;
flatVelocity.y = 0f;
velocityCoef = Mathf.InverseLerp(0f, 10f, flatVelocity.magnitude);

//Reward -1 + angles
lastReward = (upAngle + spineUpAngle + rootUpAngle) * 0.00033f
    + (lookAngle + spineLookAngle + rootLookAngle) * 0.000133f
    + (leftThighAngle + leftCalfAngle + rightThighAngle + rightCalfAngle) * 0.0001f
    + (1f - velocityCoef) * 0.00018f
    + (1f - exertionRatio) * 0.00002f
    - 0.002f;
totalReward += lastReward;
AddReward( lastReward );

if(hasLanding && !weaknessOnGround && velocityCoef < 0.2f
    && upAngle > 0.9f && spineUpAngle > 0.9f && rootUpAngle > 0.9f
    && leftThighAngle > 0.9f && leftCalfAngle > 0.9f
    && rightThighAngle > 0.9f && rightCalfAngle > 0.9f)
{
    //===Train Get Up===
    AddReward(1f);
    judge.survived++;
    judge.Reset();
    return;
}
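The reward terms above can be checked in isolation. The following is a minimal sketch in plain C# with no UnityEngine dependency, so it is not the project's actual agent code. The class name GetUpRewardSketch and its method names are hypothetical; the weights and the early-end formula are copied from the log, and InverseLerp is written to mirror the clamped behaviour of Unity's Mathf.InverseLerp.

```csharp
using System;

// Hypothetical helper class for illustration only; not part of the project.
public static class GetUpRewardSketch
{
    // Mirrors Mathf.InverseLerp: where value sits between a and b, clamped to [0, 1].
    public static float InverseLerp(float a, float b, float value)
    {
        if (a == b) return 0f;
        float t = (value - a) / (b - a);
        return Math.Max(0f, Math.Min(1f, t));
    }

    // Per-step shaped reward with the weights from the log.
    // upSum:   upAngle + spineUpAngle + rootUpAngle           (0..3)
    // lookSum: lookAngle + spineLookAngle + rootLookAngle     (0..3)
    // legSum:  the four thigh/calf angle terms                (0..4)
    public static float StepReward(float upSum, float lookSum, float legSum,
                                   float velocityCoef, float exertionRatio)
    {
        return upSum * 0.00033f
             + lookSum * 0.000133f
             + legSum * 0.0001f
             + (1f - velocityCoef) * 0.00018f
             + (1f - exertionRatio) * 0.00002f
             - 0.002f;
    }

    // Penalty added when the episode ends early (life depleted or y < -10).
    public static float EarlyEndPenalty(float survivedTime, float episodeLength)
    {
        return survivedTime < episodeLength
            ? (survivedTime - episodeLength) * 0.1f
            : 0f;
    }
}
```

With every term at its best value, StepReward(3f, 3f, 4f, 0f, 0f) comes to roughly -0.000011, and with every term at its worst it is -0.002, so the per-step reward is never positive. This matches the penalty-based goal: the only positive reward is the +1 for reaching the standing moment. With judge.episodeLength = 10f, EarlyEndPenalty ranges from -1 (ended immediately) up to 0.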
Experiment duration:
Step: 5e7
Time Elapsed: 47884s (13.3hr)
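Results:
The experiment succeeded: the Kobold Sentinel gets up efficiently and reaches the "standing moment", and does so quite effectively.
Regardless of scale, its motion and efficiency are nearly identical.
However, although the tail was set as a weak point this time, the agent shows a clear tendency to use the tail heavily.
Also, whereas 道爾 GetUp 16 finished in only 8.6 hr, the Kobold Sentinel took about 1.5 times as long (13.3 hr).
Based on observation, the hit force may be set too high: some hits launch the kobold out of the arena, and those episodes are wasted training time.
The next experiment will be Kobold Sentinel standing still; the design is expected to carry over from the 道爾 standing-still experiment:
1. Reward aiming direction
2. Reward suppressing whole-body velocity and angular velocity
3. Reward both feet touching the ground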
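Warning:
There is a bug in the scaling code: when the model is scaled up, only Spring and Damper in JointDrive are scaled, and Maximum Force is not.
This is fixed in the new version of the code. Because the character does not necessarily reach Maximum Force, the Kobold Sentinel model still gets up efficiently in current tests.
Depending on how things go, however, retraining GetUp may still have to be considered.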
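The scaling bug described above can be illustrated as follows. This is a sketch only, in plain C# without UnityEngine: DriveSketch is a hypothetical stand-in for Unity's JointDrive (whose fields are positionSpring, positionDamper, and maximumForce), and factor is an assumed single multiplier derived from Size. The log does not state the actual scaling law, so the real code may differ.

```csharp
using System;

// Hypothetical stand-in for UnityEngine.JointDrive, for illustration only.
public struct DriveSketch
{
    public float PositionSpring;
    public float PositionDamper;
    public float MaximumForce;
}

public static class DriveScalingSketch
{
    // Assumption: one multiplier, derived from Size, is applied to every drive field.
    public static DriveSketch Scale(DriveSketch baseDrive, float factor)
    {
        return new DriveSketch
        {
            PositionSpring = baseDrive.PositionSpring * factor,
            PositionDamper = baseDrive.PositionDamper * factor,
            // The reported bug corresponds to leaving this field unscaled.
            MaximumForce = baseDrive.MaximumForce * factor,
        };
    }
}
```

Scaling all three fields together keeps the force cap in proportion to the spring and damper, so a larger model is not limited by a force cap tuned for Size 1.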