
Doyle Ukemi (Breakfall) 16

夏洛爾 | 2022-10-29 13:30:10

While working on the chase research, I noticed a few things.

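1. The original coordinate system and observation terms once again overlooked a scale problem. Doyle may have trouble matching the scale of the observed coordinates against the scale of the observed velocities, which hurts learning.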

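2. In Unity's Joint system, a parent and child joint do not collide with each other unless enable collision is turned on; grandparent and grandchild are not covered by this rule. That means parent and child colliders can in fact overlap. I had assumed overlap would cause problems and always avoided it, which ended up preventing the sword's trajectory from being drawn the way I wanted.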

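3. The vision system turns out to be completely unused right now, because for the time being I will not place any environment that needs line of sight, and adding vision later would require retraining anyway. I decided to remove the vision system for now and attach it again when training in vision environments, so the current research can move forward more easily.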

4. About Take Actions Between Decisions: all rewards are currently written in FixedUpdate, so I suspect that with it disabled and Decision Interval > 1, the score may not be computed correctly.
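For point 2, here is a minimal sketch of toggling the parent-child collision setting from code. `Joint.enableCollision` is Unity's real property; the component name `JointCollisionSetup` and its field are hypothetical and not taken from the project.

```csharp
using UnityEngine;

// Hypothetical helper: the class name and field are illustrative only.
public class JointCollisionSetup : MonoBehaviour
{
    // False is Unity's default: the two bodies linked by a joint ignore each other,
    // so their colliders may overlap without pushing apart.
    [SerializeField] private bool collideWithConnectedBody = false;

    private void Awake()
    {
        // Joint is the base class of ConfigurableJoint, HingeJoint, and the others,
        // so this covers every joint under this object.
        foreach (Joint joint in GetComponentsInChildren<Joint>())
        {
            joint.enableCollision = collideWithConnectedBody;
        }
    }
}
```

Bodies that are not directly linked by a joint, such as a grandparent and grandchild, still collide normally whatever this flag is set to.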

Doyle GetUp v16

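Experiment goals:

1. Penalty-based scoring

2. Reach the "standing moment" quickly

3. The standing moment does not constrain gaze direction (gaze aiming will be handled by the standing behavior)

4. Enable Take Actions Between Decisions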

Experiment design:

1. Weak point touches the ground

AddReward(-0.0001f * doyleRootBody.damageCoef);
life -= 0.005f * doyleRootBody.damageCoef;

// This was actually a mistake: neither the sword nor the tail had weak points enabled

2.

if (life <= 0f)
{
    if (!inferenceMode)
    {
        // Out of life: penalize the unused episode time, then reset.
        float survivedTime = Time.fixedTime - arrivedMoment;
        if (survivedTime < judge.episodeLength)
        {
            AddReward((survivedTime - judge.episodeLength) * 0.1f);
        }
        judge.outLife++;
        judge.Reset();
        return;
    }
}
else if (doyleRoot.localPosition.y < -10f)
{
    if (!inferenceMode)
    {
        // Fell below the stage: same time-based penalty, then reset.
        float survivedTime = Time.fixedTime - arrivedMoment;
        if (survivedTime < judge.episodeLength)
        {
            AddReward((survivedTime - judge.episodeLength) * 0.1f);
        }
        judge.outY++;
        judge.Reset();
        return;
    }
}

targetSmoothPosition = targetPositionBuffer.GetSmoothVal();
headDir = targetSmoothPosition - stageBase.InverseTransformPoint(doyleHeadRb.position);
spineDir = targetSmoothPosition - stageBase.InverseTransformPoint(doyleSpine.position);
rootDir = targetSmoothPosition - stageBase.InverseTransformPoint(doyleRootRb.position);

// Each angle is mapped to [0, 1], where 1 is the desired orientation.
lookAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(doyleHead.up, headDir));
upAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(doyleHead.right * -1f, Vector3.up));
spineLookAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(doyleSpine.up, spineDir));
spineUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(doyleSpine.right * -1f, Vector3.up));
rootLookAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(doyleRoot.forward, rootDir));
rootUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(doyleRoot.up, Vector3.up));
leftThighAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(doyleLeftThigh.right, Vector3.up));
leftCalfAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(doyleLeftCalf.right, Vector3.up));
rightThighAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(doyleRightThigh.right, Vector3.up));
rightCalfAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(doyleRightCalf.right, Vector3.up));

avgVelocity = velocityBuffer.GetSmoothVal();
flatVelocity = avgVelocity;
flatVelocity.y = 0f;
velocityCoef = Mathf.InverseLerp(0f, 10f, flatVelocity.magnitude);

// Reward -1 + angles
lastReward = (upAngle + spineUpAngle + rootUpAngle) * 0.00033f
    + (lookAngle + spineLookAngle + rootLookAngle) * 0.000133f
    + (leftThighAngle + leftCalfAngle + rightThighAngle + rightCalfAngle) * 0.0001f
    + (1f - velocityCoef) * 0.00018f
    + (1f - exertionRatio) * 0.00002f
    - 0.002f;
totalReward += lastReward;
AddReward(lastReward);

if (hasLanding && !weaknessOnGround && velocityCoef < 0.2f
    && upAngle > 0.9f && spineUpAngle > 0.9f && rootUpAngle > 0.9f
    && leftThighAngle > 0.9f && leftCalfAngle > 0.9f
    && rightThighAngle > 0.9f && rightCalfAngle > 0.9f)
{
    //===Train Get Up===
    AddReward(1f);
    judge.survived++;
    judge.Reset();
    return;
}

3. Enable Take Actions Between Decisions
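This option is normally ticked in the Inspector on the ML-Agents `DecisionRequester` component. As a hedged sketch, the same setting can be applied from code as below. `DecisionPeriod` and `TakeActionsBetweenDecisions` are the component's real public fields; the wrapper class `DecisionRequesterSetup` and the period value of 5 are assumptions for illustration, not values from this experiment.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical setup component: the name and the period value are illustrative only.
[RequireComponent(typeof(DecisionRequester))]
public class DecisionRequesterSetup : MonoBehaviour
{
    [SerializeField] private int decisionPeriod = 5; // assumed value, not from the post

    private void Awake()
    {
        var requester = GetComponent<DecisionRequester>();

        // Request a new decision every decisionPeriod Academy steps.
        requester.DecisionPeriod = decisionPeriod;

        // On the steps in between, keep applying the most recent action.
        requester.TakeActionsBetweenDecisions = true;
    }
}
```

The flag has no effect when the decision period is 1, since every step already requests a decision.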

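// Roughly speaking

--1. The "upward angle" of the head, chest, abdomen, and the left and right thighs and calves is the main signal guiding Doyle toward the standing moment

--2. Guidance is penalty-based: a weak-point limb touching the ground is penalized and drains life; when life reaches zero, a penalty inversely related to the survival time is applied and the episode ends; reaching the standing moment earns a reward and ends the episode

--3. The coefficient on the survival-time penalty is larger than the maximum penalty from limb angles and ground contact, which discourages quick suicide

--4. The standing moment used to require both feet on the ground; it now requires horizontal speed below 2 m/s (vertical speed is not limited, out of concern that doing so would suppress getting up)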
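Point 3 can be checked roughly from the coefficients in the reward code above. This is only a sketch under assumptions the post does not state: `exertionRatio` lies in [0, 1], and the physics step is Unity's default of 0.02 s. Here u, l, and a stand for the up, look, and leg angle terms, v for `velocityCoef`, e for `exertionRatio`, d for `damageCoef`, and Δt for the fixed timestep.

```latex
\begin{align*}
r_{\text{step}} &= 0.00033\,(u_h + u_s + u_r) + 0.000133\,(l_h + l_s + l_r)
                  + 0.0001\,(a_1 + a_2 + a_3 + a_4) \\
                &\quad + 0.00018\,(1 - v) + 0.00002\,(1 - e) - 0.002 \\
\max r_{\text{step}} &= 0.00099 + 0.000399 + 0.0004 + 0.00018 + 0.00002 - 0.002 = -0.000011 \\
\min r_{\text{step}} &= -0.002 \\
\text{worst posture cost per second} &= \frac{0.002}{\Delta t} = \frac{0.002}{0.02} = 0.1 \\
\text{early-termination penalty} &= 0.1\,(T_{\text{episode}} - t_{\text{survived}}) \\
\frac{\text{contact penalty}}{\text{life lost}} &= \frac{0.0001\,d}{0.005\,d} = 0.02
\end{align*}
```

Under these assumptions the step reward is always negative, so a faster get-up scores higher. Each remaining second costs 0.1 on early termination, which is at least as much as the worst posture cost over that same second, so running out of life early does not pay off on that term. The contact penalty is tied to life at 0.02 per unit of life lost, whatever the value of `damageCoef`.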

Experiment time:

Step: 5e7

Time Elapsed: 31027s (8.61hr)

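Experiment results:

The experiment was a success: Doyle breaks its fall efficiently and reaches the "standing moment", and does so with extreme efficiency.

Forgetting to mark the tail and sword as weak points was a mistake, but surprisingly there was no sign of it being exploited.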