Doyle Breakfall 16

夏洛爾 | 2022-10-29 13:30:10


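While working on the chase study, I noticed several things:
1. The original coordinate system and observation items once again overlooked a Scale issue. This can make the scales of the observed positions and velocities hard for Doyle to reconcile, which hurts learning.
2. In Unity's Joint system, a parent and child joint do not collide unless Enable Collision is turned on; grandparent and grandchild bodies are not covered by this exemption. This means the colliders of a parent and child joint can in fact overlap. I used to assume overlap would cause problems and avoided it everywhere, which is why the sword's trajectory could not be drawn the way I wanted.
3. The line-of-sight system turns out to be completely unused at the moment, because for now I will not set up any environment that needs vision, and any environment that does need it would require retraining anyway. I have therefore decided to remove the line-of-sight system for now and add it back when training the vision environments, so the current research is easier to push forward.
4. Regarding Take Actions Between Decisions: all rewards are currently written in FixedUpdate, so I suspect that if it is not enabled, the score may not be computed correctly when the Decision Period is greater than 1.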

Doyle GetUp v16
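Experiment goals:
1. Penalty-based scoring
2. Reach the standing moment quickly
3. The standing moment does not constrain gaze direction (gaze aiming will be handled by the standing stage)
4. Enable Take Actions Between Decisions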

Experiment design:
1. Weak point touches the ground
AddReward(-0.0001f * doyleRootBody.damageCoef);
life -= 0.005f * doyleRootBody.damageCoef;

// This was actually a mistake: neither the sword nor the tail had the weak-point flag enabled
2.
// Out of life: penalize the unused episode time, then reset (training only)
if (life <= 0f)
{
    if (!inferenceMode)
    {
        float survivedTime = Time.fixedTime - arrivedMoment;
        if (survivedTime < judge.episodeLength)
        {
            AddReward((survivedTime - judge.episodeLength) * 0.1f);
        }
        judge.outLife++;
        judge.Reset();
        return;
    }
}
// Fell off the stage: same penalty, then reset (training only)
else if (doyleRoot.localPosition.y < -10f)
{
    if (!inferenceMode)
    {
        float survivedTime = Time.fixedTime - arrivedMoment;
        if (survivedTime < judge.episodeLength)
        {
            AddReward((survivedTime - judge.episodeLength) * 0.1f);
        }
        judge.outY++;
        judge.Reset();
        return;
    }
}

// Directions from head, spine and root to the smoothed target, in stage space
targetSmoothPosition = targetPositionBuffer.GetSmoothVal();
headDir = targetSmoothPosition - stageBase.InverseTransformPoint(doyleHeadRb.position);
spineDir = targetSmoothPosition - stageBase.InverseTransformPoint(doyleSpine.position);
rootDir = targetSmoothPosition - stageBase.InverseTransformPoint(doyleRootRb.position);

// Angle scores in [0, 1]: 0 at 180 degrees, 1 at or below the second argument
lookAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(doyleHead.up, headDir));
upAngle = Mathf.InverseLerp(180f, 0f, Vector3.Angle(doyleHead.right * -1f, Vector3.up));
spineLookAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(doyleSpine.up, spineDir));
spineUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(doyleSpine.right * -1f, Vector3.up));
rootLookAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(doyleRoot.forward, rootDir));
rootUpAngle = Mathf.InverseLerp(180f, 30f, Vector3.Angle(doyleRoot.up, Vector3.up));
leftThighAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(doyleLeftThigh.right, Vector3.up));
leftCalfAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(doyleLeftCalf.right, Vector3.up));
rightThighAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(doyleRightThigh.right, Vector3.up));
rightCalfAngle = Mathf.InverseLerp(180f, 45f, Vector3.Angle(doyleRightCalf.right, Vector3.up));

// Horizontal speed mapped to [0, 1] over 0..10 m/s
avgVelocity = velocityBuffer.GetSmoothVal();
flatVelocity = avgVelocity;
flatVelocity.y = 0f;
velocityCoef = Mathf.InverseLerp(0f, 10f, flatVelocity.magnitude);

//Reward -1 + angles
lastReward = (upAngle + spineUpAngle + rootUpAngle) * 0.00033f
    + (lookAngle + spineLookAngle + rootLookAngle) * 0.000133f
    + (leftThighAngle + leftCalfAngle + rightThighAngle + rightCalfAngle) * 0.0001f
    + (1f - velocityCoef) * 0.00018f
    + (1f - exertionRatio) * 0.00002f
    - 0.002f;
totalReward += lastReward;
AddReward(lastReward);

if (hasLanding && !weaknessOnGround && velocityCoef < 0.2f
    && upAngle > 0.9f && spineUpAngle > 0.9f && rootUpAngle > 0.9f
    && leftThighAngle > 0.9f && leftCalfAngle > 0.9f
    && rightThighAngle > 0.9f && rightCalfAngle > 0.9f)
{
    //===Train Get Up===
    AddReward(1f);
    judge.survived++;
    judge.Reset();
    return;
}
3. Enable Take Actions Between Decisions
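
To make the 0.9 thresholds in the success check easier to read, here is a minimal sketch of the angle-to-score mapping used above. It runs outside Unity, so `InverseLerp` is a stand-in written to match the clamped behavior of `Mathf.InverseLerp`. The class and helper names (`AngleScoreSketch`, `AngleScore`, `AngleAtScore`) are hypothetical and not part of the project.

```csharp
using System;

// Stand-alone sketch; InverseLerp imitates UnityEngine.Mathf.InverseLerp
// (result clamped to [0, 1]). All names here are illustrative assumptions.
public static class AngleScoreSketch
{
    public static float InverseLerp(float a, float b, float value)
    {
        if (a == b) return 0f;
        float t = (value - a) / (b - a);
        return Math.Max(0f, Math.Min(1f, t));
    }

    // Score is 0 at 180 degrees and saturates at 1 once the angle
    // is at or below fullScoreAngle (0, 30 or 45 in the post's code).
    public static float AngleScore(float angleDeg, float fullScoreAngle)
    {
        return InverseLerp(180f, fullScoreAngle, angleDeg);
    }

    // Angle at which the score equals the given value (inverse of the
    // linear part of AngleScore); smaller angles score higher.
    public static float AngleAtScore(float score, float fullScoreAngle)
    {
        return 180f + score * (fullScoreAngle - 180f);
    }

    public static void Main()
    {
        // Angle limits implied by "score > 0.9" for each body group.
        Console.WriteLine("head (180..0):        " + AngleAtScore(0.9f, 0f));
        Console.WriteLine("spine/root (180..30): " + AngleAtScore(0.9f, 30f));
        Console.WriteLine("legs (180..45):       " + AngleAtScore(0.9f, 45f));
    }
}
```

Under this mapping, a score above 0.9 means the measured angle is below about 18 degrees for the head, 45 degrees for the spine and root, and 58.5 degrees for the thighs and calves, so the legs are given the loosest tolerance in the standing check.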

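// In summary
--1. Doyle is guided into the standing moment mainly by the upward angles of the head, chest, abdomen, and the left and right thighs and calves
--2. Guidance is penalty-based: a weak-point limb touching the ground is penalized and drains life. When life reaches zero, a penalty that grows the shorter Doyle survived (proportional to the remaining episode time) is applied and the episode ends. Entering the standing moment earns a reward and ends the episode
--3. The coefficient of the survival-time penalty is larger than the maximum penalty from limb angles and ground contact, which discourages ending the episode quickly on purpose
--4. The standing moment previously required both feet on the ground; it now requires horizontal speed below 2 m/s (vertical speed is not limited, out of concern that doing so would suppress getting up)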
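
As a rough check of points 2 and 3, the per-step reward in the code above can be bounded. This assumes every angle score, velocityCoef and exertionRatio lies in [0, 1] (the last is an assumption, since its definition is not shown). The symbols S_up, S_look, S_leg, v, e, T and t are shorthand introduced here for the three angle sums, velocityCoef, exertionRatio, judge.episodeLength and survivedTime.

```latex
\begin{aligned}
r_t &= 0.00033\,S_{\text{up}} + 0.000133\,S_{\text{look}} + 0.0001\,S_{\text{leg}}
      + 0.00018\,(1 - v) + 0.00002\,(1 - e) - 0.002,\\
&\quad S_{\text{up}},\, S_{\text{look}} \in [0, 3],\quad S_{\text{leg}} \in [0, 4],\quad v,\, e \in [0, 1],\\
\max r_t &= 0.00099 + 0.000399 + 0.0004 + 0.00018 + 0.00002 - 0.002 = -0.000011,\\
\min r_t &= -0.002,\\
R_{\text{early}} &= -0.1\,(T - t) \quad \text{when the episode ends at } t < T .
\end{aligned}
```

Every step is therefore a small net penalty, which matches the penalty-based design. If the reward is added once per physics step of Δt seconds, the worst-case posture penalty over the remaining time is 0.002 (T - t) / Δt. With Unity's default fixed timestep of 0.02 s (the project's actual setting is not stated), that worst case equals 0.1 (T - t), so any posture credit at all makes surviving cheaper than ending early, not counting the weak-point contact penalty.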

Experiment time:
Step: 5e7
Time Elapsed: 31027s (8.62hr)

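Experiment results:
The experiment succeeded: Doyle breakfalls efficiently and enters the "standing moment", and does so with extreme efficiency.

Forgetting to mark the tail and sword as weak points was a mistake, but surprisingly there was no sign of this being exploited.
Instead, Doyle almost always gets up by pushing with the arms, which is the motion I like.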


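The next experiment will be Doyle standing still. The planned design is:
1. Reward the aiming direction
2. Reward suppressing whole-body velocity and angular velocity
3. Reward both feet touching the ground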
