Kobold Sentinel Get-Up 2

夏洛爾 | 2022-11-25 01:20:02

Kobold Sentinel GetUp v2

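Experiment goals:

1. Penalty-based scoring (rewards are deductions)

2. Reach the "standing moment" quickly

3. No restriction on gaze direction at the standing moment (gaze aiming will be handled by the standing behavior)

4. Enable Take Actions Between Decisions

5. Size is randomized between 1 and 2; Size affects mass and JointDrive

6. ML-Agents is Release 19; Unity is 2021.3.11f1

(New) 7. Training time (agents that fly out of the arena are now handled)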

Experiment design:

(Carried over from 道爾 Get-Up 16)

1. Weak point touches the ground

AddReward(-0.0001f * koboldBodies[i].damageCoef);
life -= 0.005f * koboldBodies[i].damageCoef;
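To make the weak-point penalty concrete, here is a minimal sketch in plain C# with no Unity dependency. The `WeakPointPenalty` class, its method names, and the starting `life` of 1 used in the usage note are assumptions for illustration; the post only gives the two per-step coefficients.

```csharp
using System;

// Hypothetical helper (not the author's class): mirrors the two per-step
// updates applied while a weak point is touching the ground.
public static class WeakPointPenalty
{
    public const float RewardPerStep = -0.0001f;  // coefficient from the post
    public const float LifeLossPerStep = 0.005f;  // coefficient from the post

    // One step of ground contact for a body part with the given damageCoef.
    // Returns the reward for this step and the remaining life.
    public static (float reward, float life) Step(float life, float damageCoef)
    {
        float reward = RewardPerStep * damageCoef;
        life -= LifeLossPerStep * damageCoef;
        return (reward, life);
    }

    // Counts how many contact steps it takes until life <= 0.
    public static int StepsUntilOut(float startLife, float damageCoef)
    {
        int steps = 0;
        float life = startLife;
        while (life > 0f)
        {
            life = Step(life, damageCoef).life;
            steps++;
        }
        return steps;
    }
}
```

Under the assumed starting life of 1 and a `damageCoef` of 1, `WeakPointPenalty.StepsUntilOut(1f, 1f)` comes out at roughly 200 contact steps, and doubling `damageCoef` roughly halves that.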

2.

//Set: judge.endEpisode = true
//Set: judge.episodeLength = 10f
if(life <= 0f)
{
    if(inferenceMode)
    {
    }
    else
    {
        // ===Train Get Up===
        float survivedTime = Time.fixedTime - arrivedMoment;
        if(survivedTime < judge.episodeLength)
        {
            AddReward( (survivedTime - judge.episodeLength) * 0.1f );
        }
        judge.outLife++;
        judge.Reset();
        return;
    }
}
else if(koboldRoot.localPosition.y < -10f)
{
    if(inferenceMode)
    {
    }
    else
    {
        //===Train Get Up===
        float survivedTime = Time.fixedTime - arrivedMoment;
        if(survivedTime < judge.episodeLength)
        {
            AddReward( (survivedTime - judge.episodeLength) * 0.1f );
        }
        judge.outY++;
        //===All Required===
        judge.Reset();
        return;
    }
}

targetSmoothPosition = targetPositionBuffer.GetSmoothVal();
headDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldHeadRb.position);
spineDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldSpine.position);
rootDir = targetSmoothPosition - stageBase.InverseTransformPoint(koboldRootRb.position);

lookAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldHead.up, headDir));
upAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(koboldHead.right * -1f, Vector3.up));
spineLookAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldSpine.up, spineDir));
spineUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldSpine.right * -1f, Vector3.up));
rootLookAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldRoot.forward, rootDir));
rootUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(koboldRoot.up, Vector3.up));
leftThighAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(koboldLeftThigh.right, Vector3.up));
leftCalfAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(koboldLeftCalf.right, Vector3.up));
rightThighAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(koboldRightThigh.right, Vector3.up));
rightCalfAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(koboldRightCalf.right, Vector3.up));

avgVelocity = velocityBuffer.GetSmoothVal();
flatVelocity = avgVelocity;
flatVelocity.y = 0f;
velocityCoef = Mathf.InverseLerp(0f, 10f, flatVelocity.magnitude);

//Reward -1 + angles
lastReward = (upAngle + spineUpAngle + rootUpAngle) * 0.00033f
    + (lookAngle + spineLookAngle + rootLookAngle) * 0.000133f
    + (leftThighAngle + leftCalfAngle + rightThighAngle + rightCalfAngle) * 0.0001f
    + (1f - velocityCoef) * 0.00018f
    + (1f - exertionRatio) * 0.00002f
    - 0.002f;
totalReward += lastReward;
AddReward( lastReward );

if(hasLanding && !weaknessOnGround && velocityCoef < 0.2f
    && upAngle > 0.9f && spineUpAngle > 0.9f && rootUpAngle > 0.9f
    && leftThighAngle > 0.9f && leftCalfAngle > 0.9f
    && rightThighAngle > 0.9f && rightCalfAngle > 0.9f)
{
    //===Train Get Up===
    AddReward(1f);
    judge.survived++;
    judge.Reset();
    return;
}
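The weights above can be checked outside Unity. The following is a minimal sketch in plain C#: `GetUpReward` and its method names are hypothetical, and `InverseLerp` is a stand-in written to behave like `Mathf.InverseLerp` (clamped to the 0..1 range). It groups the posture terms into three sums so the reward bounds are easy to read off.

```csharp
using System;

// Hypothetical stand-alone version of the per-step reward; not the author's class.
public static class GetUpReward
{
    // Stand-in for UnityEngine.Mathf.InverseLerp: where value sits between
    // a and b, clamped to [0, 1]. Returns 0 when a == b.
    public static float InverseLerp(float a, float b, float value)
    {
        if (a == b) return 0f;
        return Math.Clamp((value - a) / (b - a), 0f, 1f);
    }

    // Per-step reward with the weights from the post.
    // upSum:   upAngle + spineUpAngle + rootUpAngle        (0..3)
    // lookSum: lookAngle + spineLookAngle + rootLookAngle  (0..3)
    // legSum:  the four thigh/calf angle terms             (0..4)
    public static float StepReward(float upSum, float lookSum, float legSum,
                                   float velocityCoef, float exertionRatio)
    {
        return upSum * 0.00033f
             + lookSum * 0.000133f
             + legSum * 0.0001f
             + (1f - velocityCoef) * 0.00018f
             + (1f - exertionRatio) * 0.00002f
             - 0.002f;
    }

    // Penalty added when an episode ends early (life ran out or the agent fell).
    // With episodeLength = 10 this lies between -1 and 0.
    public static float EarlyOutPenalty(float survivedTime, float episodeLength)
    {
        return survivedTime < episodeLength
            ? (survivedTime - episodeLength) * 0.1f
            : 0f;
    }
}
```

With every posture term at its maximum and zero velocity and exertion, `StepReward(3f, 3f, 4f, 0f, 0f)` is about -0.000011, and in the worst case `StepReward(0f, 0f, 0f, 1f, 1f)` is -0.002, so every step costs something, which matches the penalty-based goal. The `> 0.9f` success thresholds also translate into angles: under 18 degrees for the head (180 to 0 mapping), under 45 degrees for the spine and root (180 to 30), and under 58.5 degrees for the legs (180 to 45).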

Experiment time:

Step: 5e7

Time Elapsed: 49066s (13.6hr)

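Experiment results:

The run was a success: the Kobold Sentinel recovers from a fall and reaches the "standing moment" very efficiently.

The motion and efficiency are also nearly identical regardless of scale.

Frankly, it is almost too efficient: right after touching the ground it spreads its legs, pushes off with its tail, and the get-up is done almost instantly.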

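Besides fixing the bugs and oversights of the previous version, this training run

also handles agents that are flung out of the arena right at spawn by the random impulse and the ragdoll rubber-band effect: once the distance exceeds 500, the agent is Reset, so that pointless flight does not waste training time.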
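The out-of-arena check can be sketched as a small pure function. This is a minimal sketch in plain C# using `System.Numerics.Vector3` rather than Unity's type. The post does not say what the distance of 500 is measured from, so measuring it from the stage origin in local space is an assumption, and `OutOfArenaCheck` is a hypothetical name.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: decides whether a flying agent should be Reset.
public static class OutOfArenaCheck
{
    public const float MaxDistance = 500f; // threshold from the post

    // True when the agent is farther than MaxDistance from the reference point.
    // Assumption: localPosition is relative to the stage origin.
    public static bool ShouldReset(Vector3 localPosition)
    {
        // Compare squared lengths to avoid the square root.
        return localPosition.LengthSquared() > MaxDistance * MaxDistance;
    }
}
```

In the agent's per-step logic this would sit next to the existing `life <= 0f` and `localPosition.y < -10f` branches and end the same way, with `judge.Reset()` and an early return.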

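However, the results show that training was not any faster, although the get-up efficiency did improve.

My guess is that the time spent flying also consumed training steps in the previous version; those steps were simply meaningless for learning to get up, so the total time did not shrink.

Instead, more steps were spent where they matter, so the get-up ability became stronger.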
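The reasoning can be put into numbers: with a fixed step budget, wall-clock time depends on throughput and not on how the steps are spent. A minimal sketch in plain C#, using the step count and elapsed time reported above; `TrainingBudget` is a hypothetical name, and the `wastedFraction` parameter is purely illustrative because the post does not measure how many steps were lost to flying.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope training budget arithmetic.
public static class TrainingBudget
{
    // Average steps per second of wall-clock time.
    public static double StepsPerSecond(double totalSteps, double elapsedSeconds)
        => totalSteps / elapsedSeconds;

    // Converts seconds to hours.
    public static double Hours(double elapsedSeconds)
        => elapsedSeconds / 3600.0;

    // Steps left for useful learning when a fraction of the fixed budget
    // is wasted (for example on agents flying out of the arena).
    public static double UsefulSteps(double totalSteps, double wastedFraction)
        => totalSteps * (1.0 - wastedFraction);
}
```

`TrainingBudget.StepsPerSecond(5e7, 49066)` is about 1019 steps per second and `TrainingBudget.Hours(49066)` is about 13.6. Lowering `wastedFraction` leaves both figures unchanged and only raises `UsefulSteps`, which is consistent with the same training time but a better result.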